Master's Thesis (Examensarbete)

Tensor Rank

Elias Erdtman, Carl Jönsson

LiTH-MAT-EX--2012/06--SE

Applied Mathematics, Linköpings Universitet

Examensarbete: 30 hp
Level: A
Supervisor: Göran Bergqvist, Applied Mathematics, Linköpings Universitet
Examiner: Milagros Izquierdo Barrios, Applied Mathematics, Linköpings Universitet
Linköping, June 2012
Abstract
This master's thesis addresses numerical methods for computing the typical ranks of tensors over the real numbers and explores some properties of tensors over finite fields. We present three numerical methods to compute typical tensor rank. Two of these have already been published and can be used to calculate the lowest typical ranks of tensors and an approximate percentage of how many tensors have the lowest typical ranks (for some tensor formats), respectively. The third method was developed by the authors with the intent to be able to discern if there is more than one typical rank. Some results from the method are presented but are inconclusive. In the area of tensors over finite fields some new results are shown, namely that there are eight GL_q(2) × GL_q(2) × GL_q(2)-orbits of 2 × 2 × 2 tensors over any finite field and that some tensors over F_q have lower rank when considered as tensors over F_{q^2}. Furthermore, it is shown that some symmetric tensors over F_2 do not have a symmetric rank and that there are tensors over some other finite fields which have a larger symmetric rank than rank.
Keywords: generic rank, symmetric tensor, tensor rank, tensors over finite fields, typical rank.

URL for electronic version:
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-78449
Preface
“Tensors? Richard had no idea what a tensor was, but he had noticed that when math geeks started throwing the word around, it meant that they were headed in the general direction of actually getting something done.” - Neal Stephenson, Reamde (2011).
This text is a master's thesis, written by Elias Erdtman and Carl Jönsson at Linköpings Universitet, with Göran Bergqvist as supervisor and Milagros Izquierdo Barrios as examiner, in 2012.
Background
The study of tensors of order greater than two has recently had an upswing, both from a theoretical point of view and in applications, and there are many unanswered questions in both areas. Questions of interest include: what does a generic tensor look like, which tensor decompositions are useful and how can one compute them, and how can one find equations for sets of tensors? Basically, one wants a theory of tensors as well-developed and easy to use as the theory of matrices.
Purpose
In this thesis we aim to show some basic results on tensor rank and to investigate methods for discerning generic and typical ranks of tensors, i.e., to search for an answer to the question: which ranks are the most "common"?
Chapter outline
Chapter 1. Introduction In the first chapter we present theory relevant to tensors. It is divided into four major parts: the first part is about multilinear algebra, the second part is a short introduction to the CP decomposition, and the third part gives the reader the background in algebraic geometry necessary to understand the results in chapter 2. The fourth and last part of the chapter gives an example of an application of tensor decomposition, more specifically the multiplication tensor for 2 × 2 matrices and Strassen's algorithm for matrix multiplication.
Chapter 2. Tensor rank In the second chapter we introduce different notions of rank: tensor rank, multilinear rank, Kruskal rank, etc. We show some basic results on tensors using algebraic geometry, among them some results on generic ranks over C and typical ranks over R.

Chapter 3. Numerical methods and results Numerical results for determining typical ranks are presented in chapter three. We present an algorithm which can calculate the generic rank for any format of tensor spaces, and another algorithm from which one can infer if there is more than one typical rank over R for some tensor space formats. A method developed by the authors is also presented, along with results giving an indication that the method does not seem to work.

Chapter 4. Tensors over finite fields This chapter contains some results on finite fields. We present a classification and the sizes of the eight GL_q(2) × GL_q(2) × GL_q(2)-orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2 and show that the elements of one of the orbits have lower rank when considered as tensors over F_{q^2}. Finally we show that there are symmetric tensors over F_2 which do not have a symmetric rank, and that over some other finite fields a symmetric tensor can have a symmetric rank which is greater than its rank.

Chapter 5. Summary and future work The results of the thesis are summarized and some directions of future work are indicated.
Appendix A. Programs Program code for Mathematica or MATLAB used to produce the results in the thesis is given in this appendix.
Distribution of work
Since this is a master's thesis, we give an account of who has done what in the table below.

Section   Author
1.1       CJ/EE
1.2       EE
1.3       CJ
1.4       CJ/EE
2.1       CJ/EE
2.2-2.4   CJ
3.1-3.2   EE/CJ
3.3       EE
4         CJ
5         CJ & EE

Nomenclature
Most of the reoccurring abbreviations and symbols are described here.
Symbols
• F is a field.
• F_q is the finite field of q elements.
• I(V) is the ideal of an algebraic set V.
• V(I) is the algebraic set of zeros of an ideal I.
• Seg is the Segre mapping.
• σ_r(X) is the r:th secant variety of X.
• S_d is the symmetric group on d elements.
• ⊗ is the tensor product.
• ⊠ is the matrix Kronecker product.
• X̂ is the affine cone of a set X ⊂ PV.
• ⌈x⌉ is the number x rounded up to the nearest integer.
Contents
1 Introduction
  1.1 Multilinear algebra
    1.1.1 Tensor products and multilinear maps
    1.1.2 Symmetric and skew-symmetric tensors
    1.1.3 GL(V_1) × ··· × GL(V_k) acts on V_1 ⊗ ··· ⊗ V_k
  1.2 Tensor decomposition
  1.3 Algebraic geometry
    1.3.1 Basic definitions
    1.3.2 Varieties and ideals
    1.3.3 Projective spaces and varieties
    1.3.4 Dimension of an algebraic set
    1.3.5 Cones, joins, and secant varieties
    1.3.6 Real algebraic geometry
  1.4 Application to matrix multiplication

2 Tensor rank
  2.1 Different notions of rank
    2.1.1 Results on tensor rank
    2.1.2 Symmetric tensor rank
    2.1.3 Kruskal rank
    2.1.4 Multilinear rank
  2.2 Varieties of matrices over C
  2.3 Varieties of tensors over C
    2.3.1 Equations for the variety of tensors of rank one
    2.3.2 Varieties of higher ranks
  2.4 Real tensors

3 Numerical methods and results
  3.1 Comon, Ten Berge, Lathauwer and Castaing's method
    3.1.1 Numerical results
    3.1.2 Discussion
  3.2 Choulakian's method
    3.2.1 Numerical results
    3.2.2 Discussion
  3.3 Surjectivity check
    3.3.1 Results
    3.3.2 Discussion

4 Tensors over finite fields
  4.1 Finite fields and linear algebra
  4.2 GL_q(2) × GL_q(2) × GL_q(2)-orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2
    4.2.1 Rank zero and rank one orbits
    4.2.2 Rank two orbits
    4.2.3 Rank three orbits
    4.2.4 Main result
  4.3 Lower rank over field extensions
  4.4 Symmetric rank

5 Summary and future work
  5.1 Summary
  5.2 Future work

A Programs
  A.1 Numerical methods
    A.1.1 Comon, Ten Berge, Lathauwer and Castaing's method
    A.1.2 Choulakian's method
    A.1.3 Surjectivity check
  A.2 Tensors over finite fields
    A.2.1 Rank partitioning
    A.2.2 Orbit partitioning

Bibliography

List of Tables
3.1 Known typical ranks for 2 × N_2 × N_3 arrays over R.
3.2 Known typical ranks for 3 × N_2 × N_3 arrays over R.
3.3 Known typical ranks for 4 × N_2 × N_3 arrays over R.
3.4 Known typical ranks for 5 × N_2 × N_3 arrays over R.
3.5 Known typical ranks for N^{×d} arrays over R.
3.6 Number of real solutions to (3.7) for 10 000 random 5 × 3 × 3 tensors.
3.7 Number of real solutions to (3.7) for 10 000 random 7 × 4 × 3 tensors.
3.8 Number of real solutions to (3.7) for 10 000 random 9 × 5 × 3 tensors.
3.9 Number of real solutions to (3.7) for 10 000 random 10 × 4 × 4 tensors.
3.10 Number of real solutions to (3.7) for 10 000 random 11 × 6 × 3 tensors.
3.11 Approximate probability that a random I × J × K tensor has rank I.
3.12 Euclidean distances depending on the fraction of the area on the n-sphere.
3.13 Number of points from φ_2 close to some control points for the 2 × 2 × 2 tensor.
3.14 Number of points from φ_3 close to some control points for the 2 × 2 × 3 tensor.
3.15 Number of points from φ_3 close to some control points for the 2 × 3 × 3 tensor.
3.16 Number of points from φ_5 close to some control points for the 3 × 3 × 4 tensor.
4.1 Orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2 under the action of GL_q(2) × GL_q(2) × GL_q(2) for q = 2, 3.
4.2 Orbits of F_q^2 ⊗ F_q^2 ⊗ F_q^2 under the action of GL_q(2) × GL_q(2) × GL_q(2).
4.3 Number of symmetric 2 × 2 × 2-tensors generated by symmetric rank one tensors over some small finite fields.
4.4 Number of N × N × N symmetric tensors generated by symmetric rank one tensors over F_2.
List of Figures
1.1 The image of t ↦ (t, t², t³) for −1 ≤ t ≤ 1.
1.2 The intersection of the surfaces defined by y − x² = 0 and z − x³ = 0, namely the twisted cubic, for (−1, 0, −1) ≤ (x, y, z) ≤ (1, 1, 1).
1.3 The cuspidal cubic.
1.4 An example of a semi-algebraic set.
3.1 Connection between Euclidean distance and an angle on a 2-dimensional intersection of a sphere.
Chapter 1
Introduction
This first chapter will introduce basic notions, definitions and results concerning multilinear algebra, tensor decomposition, tensor rank and algebraic geometry. A general reference for this chapter is [25].

The simplest way to look at tensors is as a generalization of matrices; they are objects in which one can arrange multidimensional data in a natural way. For instance, if one wants to analyze a sequence of images with small differences in some property, e.g. lighting or facial expression, one can use matrix decomposition algorithms, but then one has to vectorize the images and lose their natural structure. If one could use tensors, one could keep the natural structure of the pictures, which would be a significant advantage. However, the problem then becomes that one needs new results and algorithms for tensor decomposition.

The study of decomposition of higher order tensors has its origins in articles by Hitchcock from 1927 [19, 20]. Tensor decomposition was introduced in psychometrics by Tucker in the 1960's [41], and in chemometrics by Appellof and Davidson in the 1980's [2]. Strassen published his algorithm for matrix multiplication in 1969 [37], and since then tensor decomposition has received attention in the area of algebraic complexity theory. An overview of the subject, its literature and applications can be found in [1, 24].

Tensor rank, as introduced later in this chapter, is a natural generalization of matrix rank. Kruskal [23] states that it is so natural that it was introduced independently at least three times before he introduced it himself in 1976. Tensors have recently been studied from the viewpoint of algebraic geometry, yielding results on typical ranks, which are the ranks a random tensor takes with non-zero probability. The recent book [25] summarizes the results in the field. Results often concern the typical ranks of certain formats of tensors, methods for discerning the rank of a tensor, or algorithms for computing tensor decompositions.
Algorithms for tensor decompositions are often of interest in application areas, where one wants to find structures and patterns in data. In some cases, just finding a decomposition is not enough; one wants the decomposition to be essentially unique. In these cases one wants an algorithm to find a decomposition of a tensor and some way of determining if it is unique. In other fields of application, one wants to find decompositions of important tensors, since this will yield better performing algorithms in the field, e.g. Strassen's algorithm. Of course, an algorithm for finding a decomposition would be of high interest also in this case, but uniqueness is not important. However, in this case, just knowing that a tensor has a certain rank tells one that a better algorithm exists; if the decomposition itself is the important part, just knowing the rank is of little help. We take a look at efficient matrix multiplication and Strassen's algorithm as an example application at the end of the chapter.

There are other examples of applications of tensor decomposition and rank, e.g. face recognition in the area of pattern recognition, modeling fluorescence excitation-emission data in chemistry, blind deconvolution of DS-CDMA signals in wireless communications, Bayesian networks in algebraic statistics, tensor network states in quantum information theory [25], and, in neuroscience, the study of effects of new drugs on brain activity [1, 24]. Efficient matrix multiplication is a special case of efficient evaluation of bilinear forms, see [22] and [21, section 4.6.4, pp. 506-524], which, among other things, is studied in algebraic complexity theory [9; 25, chapter 13]. Historically, tensors over R and C have been investigated. In chapter 4, we investigate tensors over finite fields and show some new results.
1.1 Multilinear algebra
In this section we introduce the basics of multilinear algebra, which is an exten- sion of linear algebra by expanding the domain from one vector space to several. For an easy introduction to tensor products of vector spaces see [42].
1.1.1 Tensor products and multilinear maps

Definition 1.1.1 (Dual space, dual basis). For a vector space V over the field F, the dual space V^* of V is the vector space of all linear maps V → F. If {v_1, v_2, ..., v_n} is a basis for V, the dual basis {α_1, α_2, ..., α_n} in V^* is defined by

    α_i(v_j) = 1 if i = j,  α_i(v_j) = 0 if i ≠ j,

and extending linearly.

Theorem 1.1.2. If V is of finite dimension, the dual basis is a basis of V^*. Furthermore, V^* is isomorphic to V, and the dual of the dual, (V^*)^*, is naturally isomorphic to V.

Definition 1.1.3 (Tensor product). For vector spaces V, W we define the tensor product V ⊗ W to be the vector space of all expressions of the form

    v_1 ⊗ w_1 + ··· + v_k ⊗ w_k

where v_i ∈ V, w_i ∈ W, and the following equalities hold for the operator ⊗:

• λ(v ⊗ w) = (λv) ⊗ w = v ⊗ (λw),
• (v_1 + v_2) ⊗ w = v_1 ⊗ w + v_2 ⊗ w,
• v ⊗ (w_1 + w_2) = v ⊗ w_1 + v ⊗ w_2,

i.e., (· ⊗ ·) is linear in both arguments.
Since V ⊗ W is a vector space, we can iteratively form tensor products V1 ⊗ V2 ⊗ · · · ⊗ Vk of an arbitrary number of vector spaces V1,V2,...,Vk. An element of V1 ⊗ V2 ⊗ · · · ⊗ Vk is said to be a tensor of order k.
Theorem 1.1.4. If {v_i}_{i=1}^{n_V} and {w_j}_{j=1}^{n_W} are bases for V and W respectively, then {v_i ⊗ w_j}_{i=1,j=1}^{n_V,n_W} is a basis for V ⊗ W and dim(V ⊗ W) = dim(V) dim(W).

Proof. Any T ∈ V ⊗ W can be written

    T = Σ_{k=1}^{n} a_k ⊗ b_k

for a_k ∈ V, b_k ∈ W. Since {v_i} and {w_j} are bases, we can write

    a_k = Σ_{i=1}^{n_V} a_{ki} v_i,  b_k = Σ_{j=1}^{n_W} b_{kj} w_j,

and thus

    T = Σ_{k=1}^{n} (Σ_{i=1}^{n_V} a_{ki} v_i) ⊗ (Σ_{j=1}^{n_W} b_{kj} w_j)
      = Σ_{k=1}^{n} Σ_{i=1}^{n_V} Σ_{j=1}^{n_W} a_{ki} b_{kj} v_i ⊗ w_j
      = Σ_{i=1}^{n_V} Σ_{j=1}^{n_W} (Σ_{k=1}^{n} a_{ki} b_{kj}) v_i ⊗ w_j,

so it follows that {v_i ⊗ w_j}_{i=1,j=1}^{n_V,n_W} is a basis, and this in turn implies dim(V ⊗ W) = dim(V) dim(W).
If {v_j^{(i)}}_{j=1}^{n_i} is a basis for V_i, this implies that {v_{j_1}^{(1)} ⊗ v_{j_2}^{(2)} ⊗ ··· ⊗ v_{j_k}^{(k)}}_{j_1=1,...,j_k=1}^{n_1,...,n_k} is a basis for V_1 ⊗ V_2 ⊗ ··· ⊗ V_k. Furthermore, if we have chosen a basis for each V_i, we can identify a tensor T ∈ V_1 ⊗ V_2 ⊗ ··· ⊗ V_k with a k-dimensional array of size dim V_1 × dim V_2 × ··· × dim V_k, where the element in position (j_1, j_2, ..., j_k) is the coefficient of v_{j_1}^{(1)} ⊗ v_{j_2}^{(2)} ⊗ ··· ⊗ v_{j_k}^{(k)} in the expansion of T in the induced basis for V_1 ⊗ V_2 ⊗ ··· ⊗ V_k. If k = 2, one gets matrices.

If one describes a third order tensor as a three-dimensional array, one can describe the tensor as a tuple of matrices. For example, say the I × J × K tensor T has the entries t_{ijk} in its array. Then T can be described as the tuple (T_1, T_2, ..., T_I) where T_i = (t_{ijk})_{j=1,k=1}^{J,K}, but it can also be described as the tuples (T'_1, T'_2, ..., T'_J) or (T''_1, T''_2, ..., T''_K), where T'_j = (t_{ijk})_{i=1,k=1}^{I,K} and T''_k = (t_{ijk})_{i=1,j=1}^{I,J}. The matrices in the tuples are called the slices of the array. Sometimes the adjectives frontal, horizontal and lateral are used to distinguish the different kinds of slices.
Example 1.1.5 (Arrays). Let {e_1, e_2} be a basis for R². Then e_1 ⊗ e_1 + 2e_1 ⊗ e_2 + 3e_2 ⊗ e_1 ∈ R² ⊗ R² can be expressed as the matrix

    [ 1 2 ]
    [ 3 0 ]

The third order tensor e_1 ⊗ e_1 ⊗ e_1 + 2e_1 ⊗ e_2 ⊗ e_2 + 3e_2 ⊗ e_1 ⊗ e_2 + 4e_2 ⊗ e_2 ⊗ e_2 ∈ R² ⊗ R² ⊗ R² can be expressed as a 3-dimensional array, and the slices of the array are

    [ 1 0 ]   [ 0 3 ]
    [ 0 2 ] , [ 0 4 ]

    [ 1 0 ]   [ 0 2 ]
    [ 0 3 ] , [ 0 4 ]

    [ 1 0 ]   [ 0 2 ]
    [ 0 0 ] , [ 3 4 ]

where each pair arises from a different way of cutting the tensor: fixing the first, second and third index, respectively.
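The three ways of slicing in example 1.1.5 can be reproduced with a short script; a minimal pure-Python sketch using nested lists (the variable names are ours, not the thesis's):

```python
# The 2x2x2 tensor from Example 1.1.5:
# T = e1⊗e1⊗e1 + 2 e1⊗e2⊗e2 + 3 e2⊗e1⊗e2 + 4 e2⊗e2⊗e2,
# stored as t[i][j][k] with 0-based indices.
t = [[[0, 0], [0, 0]] for _ in range(2)]
for (i, j, k), c in {(0, 0, 0): 1, (0, 1, 1): 2, (1, 0, 1): 3, (1, 1, 1): 4}.items():
    t[i][j][k] = c

# Three ways of slicing: fix the first, second or third index respectively.
slices_i = [[[t[i][j][k] for k in range(2)] for j in range(2)] for i in range(2)]
slices_j = [[[t[i][j][k] for k in range(2)] for i in range(2)] for j in range(2)]
slices_k = [[[t[i][j][k] for j in range(2)] for i in range(2)] for k in range(2)]

assert slices_i == [[[1, 0], [0, 2]], [[0, 3], [0, 4]]]
assert slices_j == [[[1, 0], [0, 3]], [[0, 2], [0, 4]]]
assert slices_k == [[[1, 0], [0, 0]], [[0, 2], [3, 4]]]
```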
Definition 1.1.6 (Tensor rank). The smallest R for which T ∈ V_1 ⊗ ··· ⊗ V_k can be written

    T = Σ_{r=1}^{R} v_r^{(1)} ⊗ ··· ⊗ v_r^{(k)},    (1.1)

for arbitrary vectors v_r^{(i)} ∈ V_i, is called the tensor rank of T.
Definition 1.1.7 (Multilinear map). Let V_1, ..., V_k be vector spaces over F. A map f : V_1 × ··· × V_k → F is a multilinear map if f is linear in each factor V_i.
Theorem 1.1.8. The set of all multilinear maps V_1 × ··· × V_k → F can be identified with V_1^* ⊗ ··· ⊗ V_k^*.

Proof. Let V_i have dimension n_i and basis {v_1^{(i)}, ..., v_{n_i}^{(i)}}, and let the dual basis be {α_1^{(i)}, ..., α_{n_i}^{(i)}}. Then f ∈ V_1^* ⊗ ··· ⊗ V_k^* can be written

    f = Σ_{i_1,...,i_k} β_{i_1,...,i_k} α_{i_1}^{(1)} ⊗ ··· ⊗ α_{i_k}^{(k)}

and acts as a multilinear mapping on (u_1, ..., u_k) ∈ V_1 × ··· × V_k by:

    f(u_1, ..., u_k) = Σ_{i_1,...,i_k} β_{i_1,...,i_k} α_{i_1}^{(1)}(u_1) ··· α_{i_k}^{(k)}(u_k).

Conversely, let f : V_1 × ··· × V_k → F be a multilinear mapping. Pick a basis {v_1^{(i)}, ..., v_{n_i}^{(i)}} for each V_i and let the dual basis be {α_1^{(i)}, ..., α_{n_i}^{(i)}}. Define

    β_{i_1,...,i_k} = f(v_{i_1}^{(1)}, ..., v_{i_k}^{(k)});

then

    Σ_{i_1,...,i_k} β_{i_1,...,i_k} α_{i_1}^{(1)} ⊗ ··· ⊗ α_{i_k}^{(k)} ∈ V_1^* ⊗ ··· ⊗ V_k^*

acts as the multilinear map f by the description above.
A multilinear mapping (V_1 × ··· × V_k) × W^* → F can be seen as an element of V_1^* ⊗ ··· ⊗ V_k^* ⊗ W, and can also be seen as a map V_1 ⊗ ··· ⊗ V_k → W. Explicitly, if f : (V_1 × ··· × V_k) × W^* → F is written f = Σ_i α_i^{(1)} ⊗ ··· ⊗ α_i^{(k)} ⊗ w_i, it acts on an element of V_1 × ··· × V_k × W^* by

    f(v_1, ..., v_k, β) = Σ_i α_i^{(1)}(v_1) ··· α_i^{(k)}(v_k) β(w_i) ∈ F,

but it can also act on an element of V_1 × ··· × V_k by

    f(v_1, ..., v_k) = Σ_i α_i^{(1)}(v_1) ··· α_i^{(k)}(v_k) w_i ∈ W.

Example 1.1.9 (Linear maps). Given two vector spaces V, W, the set of all linear maps V → W can be identified with V^* ⊗ W. If f = Σ_{i=1}^{n} α_i ⊗ w_i, f acts as a linear map V → W by

    f(v) = Σ_{i=1}^{n} α_i(v) w_i

or, going in the other direction, if f is a linear map f : V → W, we can describe it as a member of V^* ⊗ W by taking a basis {v_1, v_2, ..., v_n} for V and its dual basis {α_1, α_2, ..., α_n} and setting w_i = f(v_i), so we get

    f = Σ_{i=1}^{n} α_i ⊗ w_i.
1.1.2 Symmetric and skew-symmetric tensors

Two important subspaces of the second order tensors V ⊗ V are the symmetric tensors and the skew-symmetric tensors. First, define the map τ : V ⊗ V → V ⊗ V by τ(v_1 ⊗ v_2) = v_2 ⊗ v_1 and extending linearly (τ can be interpreted as the non-trivial permutation on two elements). The spaces of symmetric tensors, S²V, and skew-symmetric tensors, Λ²V, can then be defined as:

    S²V := span{v ⊗ v | v ∈ V} = {T ∈ V ⊗ V | τ(T) = T},
    Λ²V := span{v ⊗ w − w ⊗ v | v, w ∈ V} = {T ∈ V ⊗ V | τ(T) = −T}.

Let us define two operators that give the symmetric and anti-symmetric part of a second order tensor. For v_1, v_2 ∈ V, define the symmetric part of v_1 ⊗ v_2 to be v_1 v_2 = (1/2)(v_1 ⊗ v_2 + v_2 ⊗ v_1) ∈ S²V and the anti-symmetric part of v_1 ⊗ v_2 to be v_1 ∧ v_2 = (1/2)(v_1 ⊗ v_2 − v_2 ⊗ v_1) ∈ Λ²V, so that v_1 ⊗ v_2 = v_1 v_2 + v_1 ∧ v_2.

To extend the definition of symmetric and skew-symmetric tensors, over R and C, to higher order we need to generalize these operators. Denote the tensor product of the same vector space k times by V^{⊗k}. For the symmetric case the map π_S : V^{⊗k} → V^{⊗k} is defined on rank-one tensors by

    π_S(v_1 ⊗ ··· ⊗ v_k) = (1/k!) Σ_{τ ∈ S_k} v_{τ(1)} ⊗ ··· ⊗ v_{τ(k)} = v_1 v_2 ··· v_k,

where S_k is the symmetric group on k elements. For the skew-symmetric tensors the map π_Λ : V^{⊗k} → V^{⊗k} is defined on rank-one elements by

    π_Λ(v_1 ⊗ ··· ⊗ v_k) = (1/k!) Σ_{τ ∈ S_k} sgn(τ) v_{τ(1)} ⊗ ··· ⊗ v_{τ(k)} = v_1 ∧ ··· ∧ v_k.

π_S and π_Λ are then extended linearly to act on the entire space.
Definition 1.1.10 (S^kV, Λ^kV). Let V be a vector space. The space of symmetric tensors S^kV is defined as

    S^kV = π_S(V^{⊗k}) = {X ∈ V^{⊗k} | π_S(X) = X}.

The space of skew-symmetric tensors, or alternating tensors, is defined as

    Λ^kV = π_Λ(V^{⊗k}) = {X ∈ V^{⊗k} | π_Λ(X) = X}.
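To illustrate how π_S and π_Λ act in coordinates, the following pure-Python sketch (helper names are ours; exact rational arithmetic via `fractions` so the projection property holds exactly) applies both maps to the coefficient array of an order-3 tensor:

```python
import itertools
from fractions import Fraction

def symmetrize(t, k=3):
    """pi_S in coordinates: average the coefficient dict t over all
    permutations of its k indices."""
    out = {}
    perms = list(itertools.permutations(range(k)))
    for idx, c in t.items():
        for p in perms:
            pidx = tuple(idx[p[m]] for m in range(k))
            out[pidx] = out.get(pidx, 0) + c / len(perms)
    return out

def alternate(t, k=3):
    """pi_Lambda in coordinates: signed average over permutations."""
    def sign(p):
        s = 1
        for a in range(len(p)):
            for b in range(a + 1, len(p)):
                if p[a] > p[b]:
                    s = -s
        return s
    out = {}
    perms = list(itertools.permutations(range(k)))
    for idx, c in t.items():
        for p in perms:
            pidx = tuple(idx[p[m]] for m in range(k))
            out[pidx] = out.get(pidx, 0) + sign(p) * c / len(perms)
    return out

# e1 ⊗ e2 ⊗ e1 as a coefficient dict on a 2-dimensional V:
t = {(0, 1, 0): Fraction(1)}
s = symmetrize(t)
assert s == {(0, 1, 0): Fraction(1, 3), (0, 0, 1): Fraction(1, 3),
             (1, 0, 0): Fraction(1, 3)}
assert symmetrize(s) == s                    # pi_S is a projection
assert all(v == 0 for v in alternate(t).values())   # e1 ∧ e2 ∧ e1 = 0
```

The last assertion reflects that the wedge of vectors with a repeated factor vanishes.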
The space SkV ∗ can be seen as the space of symmetric k-linear forms on V , but also as the space of homogeneous polynomials of degree k on V , so we can identify homogeneous polynomials of degree k with symmetric k-linear forms. We do this through a process called polarization.
Theorem 1.1.11 (Polarization identity). Let f be a homogeneous polynomial of degree k. Then

    f̄(x_1, x_2, ..., x_k) = (1/k!) Σ_{I ⊆ [k], I ≠ ∅} (−1)^{k−|I|} f(Σ_{i ∈ I} x_i)

is a symmetric k-linear form. Here [k] = {1, 2, ..., k}.
Example 1.1.12. Let P(s, t, u) be a homogeneous cubic polynomial in three variables. Plugging this into the polarization identity yields the following multilinear form:
    P̄((s_1, t_1, u_1), (s_2, t_2, u_2), (s_3, t_3, u_3))
      = (1/3!) [P(s_1 + s_2 + s_3, t_1 + t_2 + t_3, u_1 + u_2 + u_3)
        − P(s_1 + s_2, t_1 + t_2, u_1 + u_2) − P(s_1 + s_3, t_1 + t_3, u_1 + u_3)
        − P(s_2 + s_3, t_2 + t_3, u_2 + u_3)
        + P(s_1, t_1, u_1) + P(s_2, t_2, u_2) + P(s_3, t_3, u_3)].

For example, if P(s, t, u) = stu one gets

    P̄ = (1/6)(s_1 t_2 u_3 + s_1 t_3 u_2 + s_2 t_1 u_3 + s_2 t_3 u_1 + s_3 t_1 u_2 + s_3 t_2 u_1).
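The identity above can be checked mechanically. A pure-Python sketch (function names are ours) implements the polarization formula for an arbitrary homogeneous polynomial and compares it, for P(s, t, u) = stu, against the symmetrized product:

```python
from fractions import Fraction
from itertools import combinations, permutations

def polarize(f, k):
    """Polarization identity: f_bar(x1,...,xk) = (1/k!) * sum over
    nonempty I ⊆ [k] of (-1)^(k-|I|) f(sum of x_i, i in I).
    Each x is a tuple of coordinates."""
    kfact = 1
    for m in range(2, k + 1):
        kfact *= m
    def f_bar(*xs):
        n = len(xs[0])
        total = Fraction(0)
        for r in range(1, k + 1):
            for I in combinations(range(k), r):
                point = tuple(sum(xs[i][c] for i in I) for c in range(n))
                total += Fraction((-1) ** (k - r)) * f(point)
        return total / kfact
    return f_bar

P = lambda x: x[0] * x[1] * x[2]       # P(s, t, u) = stu
P_bar = polarize(P, 3)

xs = ((1, 2, 3), (4, 5, 6), (7, 8, 9))
# (1/6) * sum over permutations of s_{p1} t_{p2} u_{p3}:
direct = Fraction(sum(xs[p[0]][0] * xs[p[1]][1] * xs[p[2]][2]
                      for p in permutations(range(3))), 6)
assert P_bar(*xs) == direct
assert P_bar(xs[0], xs[0], xs[0]) == P(xs[0])   # diagonal recovers P
```

The last line checks the defining property that the polarization restricted to the diagonal gives back the polynomial.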
1.1.3 GL(V1) × · · · × GL(Vk) acts on V1 ⊗ · · · ⊗ Vk
GL(V) is the group of invertible linear maps V → V. An element (g_1, g_2, ..., g_k) ∈ GL(V_1) × ··· × GL(V_k) acts on an element v_1 ⊗ v_2 ⊗ ··· ⊗ v_k ∈ V_1 ⊗ ··· ⊗ V_k by

    (g_1, g_2, ..., g_k) · (v_1 ⊗ ··· ⊗ v_k) = g_1(v_1) ⊗ ··· ⊗ g_k(v_k)

and on the whole space V_1 ⊗ ··· ⊗ V_k by extending linearly.

If one picks a basis for each V_1, ..., V_k, say {v_j^{(i)}}_{j=1}^{n_i} is a basis for V_i, one can write

    g_i(v_j^{(i)}) = Σ_{l=1}^{n_i} α_{j,l}^{(i)} v_l^{(i)},    (1.2)

and if T ∈ V_1 ⊗ ··· ⊗ V_k,

    T = Σ_{j_1,...,j_k} β_{j_1,...,j_k} v_{j_1}^{(1)} ⊗ ··· ⊗ v_{j_k}^{(k)}.    (1.3)

Thus, if g = (g_1, ..., g_k),

    g · T = Σ_{j_1,...,j_k} β_{j_1,...,j_k} g_1(v_{j_1}^{(1)}) ⊗ ··· ⊗ g_k(v_{j_k}^{(k)})
          = Σ_{l_1,...,l_k} Σ_{j_1,...,j_k} β_{j_1,...,j_k} α_{j_1,l_1}^{(1)} ··· α_{j_k,l_k}^{(k)} v_{l_1}^{(1)} ⊗ ··· ⊗ v_{l_k}^{(k)}.    (1.4)

One can note that the α's in (1.2) give the matrix of g_i, and that the β's in (1.3) give the tensor T as a k-dimensional array. Thus the scalars

    Σ_{j_1,...,j_k} β_{j_1,...,j_k} α_{j_1,l_1}^{(1)} ··· α_{j_k,l_k}^{(k)}

in (1.4) give the coefficients in the k-dimensional array representing g · T.
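The coordinate formula (1.4) can be sanity-checked on a rank-one tensor, where the action must agree with g_1(v_1) ⊗ g_2(v_2) ⊗ g_3(v_3) computed directly. A pure-Python sketch for the 2 × 2 × 2 case (helper names and the sample matrices are ours):

```python
# alpha[i][j][l] holds the coefficient alpha^{(i)}_{j,l} in
# g_i(v_j) = sum_l alpha^{(i)}_{j,l} v_l; beta is the coefficient array of T.
def act(alpha, beta):
    """Coefficients of g·T via formula (1.4), by brute-force summation."""
    n = len(beta)
    out = [[[0] * n for _ in range(n)] for _ in range(n)]
    for l1 in range(n):
        for l2 in range(n):
            for l3 in range(n):
                s = 0
                for j1 in range(n):
                    for j2 in range(n):
                        for j3 in range(n):
                            s += (beta[j1][j2][j3] * alpha[0][j1][l1]
                                  * alpha[1][j2][l2] * alpha[2][j3][l3])
                out[l1][l2][l3] = s
    return out

# Rank-one tensor a ⊗ b ⊗ c and three invertible 2x2 matrices:
a, b, c = [1, 2], [3, 4], [5, 6]
alpha = [[[1, 2], [0, 1]], [[1, 0], [1, 1]], [[2, 1], [1, 3]]]
beta = [[[a[j1] * b[j2] * c[j3] for j3 in range(2)]
         for j2 in range(2)] for j1 in range(2)]

# Direct action: g1(a) ⊗ g2(b) ⊗ g3(c).
ga = [sum(a[j] * alpha[0][j][l] for j in range(2)) for l in range(2)]
gb = [sum(b[j] * alpha[1][j][l] for j in range(2)) for l in range(2)]
gc = [sum(c[j] * alpha[2][j][l] for j in range(2)) for l in range(2)]
expected = [[[ga[l1] * gb[l2] * gc[l3] for l3 in range(2)]
             for l2 in range(2)] for l1 in range(2)]
assert act(alpha, beta) == expected
```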
1.2 Tensor decomposition
Let us first consider how factorization and decomposition work for tensors of order two, in other words matrices. Depending on the application and the resources for calculation, different decompositions are used. A very important decomposition is the singular value decomposition (SVD). It decomposes a matrix M into a sum of outer products (tensor products) of vectors as

    M = Σ_{r=1}^{R} σ_r u_r v_r^T = Σ_{r=1}^{R} σ_r u_r ⊗ v_r.

Here the u_r and v_r are pairwise orthonormal vectors, the σ_r are the singular values, and R is the rank of the matrix M; these conditions make the decomposition essentially unique. The rank of M is the number of non-zero singular values, and the best low-rank approximations of M are given by truncating the sum.
For tensors of order greater than two the situation is different. A decomposition that generalizes the SVD, but not all of its properties, is called CANDECOMP (canonical decomposition), PARAFAC (parallel factor analysis) or CP decomposition [24]. It is also a sum of tensor products of vectors:

    T = Σ_{r=1}^{R} v_r^{(1)} ⊗ ··· ⊗ v_r^{(k)},

where the V_j are vector spaces and v_r^{(j)} ∈ V_j. As one can see, the CP decomposition is what is used to define the rank of a tensor: R is the rank of T if R is the smallest possible number such that equality holds (definition 1.1.6).

A big issue with higher order tensors is that there is no method or algorithm to calculate the CP decomposition exactly, which would also give the rank of a tensor. A common algorithm to calculate an approximate CP decomposition is the alternating least squares (ALS) algorithm. It can be summarized as a least squares method where we let the values from one vector space change while the others are fixed. Then the same is done for the next vector space, and so forth for all vector spaces. If the difference between the approximation and the given tensor is too large, the whole procedure is repeated until the difference is small enough. The algorithm is described in algorithm 1, where T is a tensor of size d_1 × ··· × d_N. The norm used is the Frobenius norm, defined as

    ||T||² = Σ_{i_1=1,...,i_N=1}^{d_1,...,d_N} |T_{i_1,...,i_N}|²,    (1.5)

where T_{i_1,...,i_N} denotes the (i_1, ..., i_N) component of T. One thing to notice is that the rank is needed as a parameter for the calculations, so if the rank is not known it needs to be approximated before the algorithm can start.
Algorithm 1 ALS algorithm to calculate the CP decomposition
Require: T, R
  Initialize a_r^{(n)} ∈ R^{d_n} for n = 1, ..., N and r = 1, ..., R.
  repeat
    for n = 1, ..., N do
      Solve min over a_r^{(n)}, r = 1, ..., R of || T − Σ_{r=1}^{R} a_r^{(1)} ⊗ ··· ⊗ a_r^{(N)} ||².
      Update a_r^{(n)} to its newly calculated value, for r = 1, ..., R.
    end for
  until || T − Σ_{r=1}^{R} a_r^{(1)} ⊗ ··· ⊗ a_r^{(N)} ||² < threshold, or the maximum number of iterations is reached
  return a_r^{(1)}, ..., a_r^{(N)} for r = 1, ..., R.
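Algorithm 1 can be sketched in runnable code. The following pure-Python implementation for third-order tensors is our own minimal version (helper names `unfold`, `khatri_rao`, `als_cp` are ours; in practice one would use an optimized library such as TensorLy): each per-mode least-squares problem is solved through its normal equations, with the matricized tensor and the Khatri-Rao product of the other two factor matrices.

```python
import random

def unfold(T, mode, dims):
    """Mode-n matricization of a nested-list tensor T of shape dims."""
    d1, d2, d3 = dims
    if mode == 0:
        return [[T[i][j][k] for k in range(d3) for j in range(d2)] for i in range(d1)]
    if mode == 1:
        return [[T[i][j][k] for k in range(d3) for i in range(d1)] for j in range(d2)]
    return [[T[i][j][k] for j in range(d2) for i in range(d1)] for k in range(d3)]

def khatri_rao(X, Y):
    """Column-wise Kronecker product of two factor matrices."""
    R = len(X[0])
    return [[X[p][r] * Y[q][r] for r in range(R)]
            for p in range(len(X)) for q in range(len(Y))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def solve(M, B):
    """Solve M W = B (multiple right-hand sides) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [M[i][:] + B[i][:] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [[aug[i][n + j] / aug[i][i] for j in range(len(B[0]))] for i in range(n)]

def als_cp(T, dims, R, sweeps=20, seed=0):
    """Fixed number of ALS sweeps; one least-squares update per mode."""
    rng = random.Random(seed)
    F = [[[rng.random() for _ in range(R)] for _ in range(d)] for d in dims]
    for _ in range(sweeps):
        for mode in range(3):
            other = [F[m] for m in (2, 1, 0) if m != mode]  # e.g. mode 0: (C, B)
            Z = khatri_rao(other[0], other[1])
            # Normal equations for min || T_(n) - F[mode] Z^T ||:
            W = solve(matmul(transpose(Z), Z),
                      matmul(transpose(Z), transpose(unfold(T, mode, dims))))
            F[mode] = transpose(W)
    return F

# Fit a rank-one tensor T = a ⊗ b ⊗ c with R = 1; the fit should be exact.
a, b, c = [1.0, -2.0], [0.5, 1.0, 2.0], [3.0, 1.0]
dims = (2, 3, 2)
T = [[[a[i] * b[j] * c[k] for k in range(2)] for j in range(3)] for i in range(2)]
A, B, C = als_cp(T, dims, R=1)
err = sum((T[i][j][k] - A[i][0] * B[j][0] * C[k][0]) ** 2
          for i in range(2) for j in range(3) for k in range(2))
assert err < 1e-12
```

Note that, as in Algorithm 1, the target rank R must be supplied up front; the sketch stops after a fixed number of sweeps rather than testing a threshold, purely to keep it short.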
This is actually a way to determine the rank of a tensor, but the method has a few problems. First of all there is the issue of border rank (see section 2.1), which makes it possible to approximate some tensors arbitrarily well by tensors of lower rank (see example 2.1.1). Furthermore, the algorithm is not guaranteed to converge to a global optimum, and even if it does converge, it might need a large number of iterations [24].
1.3 Algebraic geometry
In this section we introduce basic notions of algebraic geometry, which is the study of objects defined by polynomial equations. References for this section are [13, 17, 25, 31], and for section 1.3.6, [6].
1.3.1 Basic definitions
Definition 1.3.1 (Monomial). A monomial in variables x_1, x_2, ..., x_n is a product of variables

    x_1^{α_1} x_2^{α_2} ··· x_n^{α_n}

where α_i ∈ N = {0, 1, 2, ...}. Another notation for this is x^α, where x = (x_1, x_2, ..., x_n) and α = (α_1, α_2, ..., α_n) ∈ N^n. α is called a multi-index.
Definition 1.3.2 (Polynomial). Given a field F, a polynomial is a finite linear combination of monomials with coefficients in F, i.e. if f is a polynomial over F it can be written

    f = Σ_{α ∈ A} c_α x^α

for some finite set A and c_α ∈ F. A homogeneous polynomial is a polynomial in which all the multi-indices α ∈ A sum to the same integer; in other words, all the monomials have the same degree.
The set F[x_1, x_2, ..., x_n] of all polynomials over the field F in variables x_1, x_2, ..., x_n forms a commutative ring. Since it will be important in the sequel, we recall some important definitions and results from ring theory.
Definition 1.3.3 (Ideal). If R is a commutative ring (e.g. F[x1, x2, . . . , xn]), an ideal in R is a set I for which the following holds:
• If x, y ∈ I, we have x + y ∈ I (I is a subgroup of (R, +).)
• If x ∈ I and r ∈ R we have rx ∈ I.
If f_1, f_2, ..., f_k ∈ R, the ideal generated by f_1, f_2, ..., f_k, denoted ⟨f_1, f_2, ..., f_k⟩, is defined as:

    ⟨f_1, f_2, ..., f_k⟩ = { Σ_{i=1}^{k} q_i f_i | q_i ∈ R }.
The next theorem is a special case of Hilbert’s basis theorem.
Theorem 1.3.4. Every ideal in the polynomial ring F[x_1, x_2, ..., x_n] is finitely generated, i.e. for every ideal I there exist polynomials f_1, f_2, ..., f_k such that I = ⟨f_1, f_2, ..., f_k⟩.
1.3.2 Varieties and ideals

Definition 1.3.5 (Affine algebraic set). An affine algebraic set is the set X ⊂ F^n of solutions to a system of polynomial equations

    f_1 = 0, f_2 = 0, ..., f_k = 0

for a given set {f_1, f_2, ..., f_k} of polynomials in n variables. We write X = V(f_1, f_2, ..., f_k) for this affine algebraic set. An algebraic set X is called irreducible, or a variety, if it cannot be written as X = X_1 ∪ X_2 for proper algebraic subsets X_1, X_2 ⊊ X.
Definition 1.3.6 (Ideal of an affine algebraic set). For an algebraic set X ⊂ F^n, the ideal of X, denoted I(X), is the set of polynomials f ∈ F[x_1, x_2, ..., x_n] such that f(a_1, a_2, ..., a_n) = 0 for every (a_1, a_2, ..., a_n) ∈ X.

When one works with algebraic sets one wants to find equations for the set, and this can mean different things. A set of polynomials P = {p_1, p_2, ..., p_k} is said to cut out the algebraic set X set-theoretically if the set of common zeros of p_1, p_2, ..., p_k is X. P is said to cut out X ideal-theoretically if P is a generating set for I(X).
Example 1.3.7 (Twisted cubic). The twisted cubic is a curve in R³ which can be given as the image of R under the mapping t ↦ (t, t², t³), fig. 1.1. However, the twisted cubic can also be viewed as an algebraic set, namely V(y − x², z − x³), fig. 1.2.
Figure 1.1: The image of t ↦ (t, t², t³) for −1 ≤ t ≤ 1.
Figure 1.2: The intersection of the surfaces defined by y − x² = 0 and z − x³ = 0, namely the twisted cubic, for (−1, 0, −1) ≤ (x, y, z) ≤ (1, 1, 1).
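That the two descriptions of the twisted cubic agree on the parametrized points is easy to check mechanically; a small pure-Python sketch (the helper name is ours):

```python
from fractions import Fraction

# Every point of the parametrized curve t -> (t, t^2, t^3) lies on
# the algebraic set V(y - x^2, z - x^3).
def on_twisted_cubic(x, y, z):
    return y - x**2 == 0 and z - x**3 == 0

ts = [Fraction(n, 10) for n in range(-10, 11)]   # exact sample points in [-1, 1]
assert all(on_twisted_cubic(t, t**2, t**3) for t in ts)
assert not on_twisted_cubic(1, 2, 3)             # (1, 2, 3) is not on the curve
```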
Example 1.3.8 (Matrices of rank r). Given vector spaces V, W of dimensions n and m and bases {v_i}_{i=1}^{n} and {w_j}_{j=1}^{m} respectively, V^* ⊗ W can be identified with the set of m × n matrices. The set of matrices of rank at most r is a variety in this space, namely the variety defined as the zero set of all (r + 1) × (r + 1) minors, since a matrix has rank less than or equal to r if and only if all of its (r + 1) × (r + 1) minors are zero. For example, if n = 4 and m = 3, a matrix defining a map between V and W can be written

    [ x_11 x_12 x_13 x_14 ]
    [ x_21 x_22 x_23 x_24 ]
    [ x_31 x_32 x_33 x_34 ]

and the variety of matrices of rank 2 or less consists of the matrices satisfying

    det [ x_11 x_12 x_13 ; x_21 x_22 x_23 ; x_31 x_32 x_33 ] = 0,
    det [ x_11 x_12 x_14 ; x_21 x_22 x_24 ; x_31 x_32 x_34 ] = 0,
    det [ x_11 x_13 x_14 ; x_21 x_23 x_24 ; x_31 x_33 x_34 ] = 0,
    det [ x_12 x_13 x_14 ; x_22 x_23 x_24 ; x_32 x_33 x_34 ] = 0.

That these equations cut out the set of 3 × 4 matrices of rank 2 or less set-theoretically is easy to prove. They also generate the ideal for the variety, but this is harder to prove.
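The set-theoretic claim can be verified numerically on sample matrices; a pure-Python sketch (helper names and the sample factors are ours) checks that a 3 × 4 matrix built as a sum of two outer products, hence of rank at most 2, kills all four 3 × 3 minors, while a rank-3 matrix does not:

```python
from itertools import combinations

def det3(M):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def minors3(M):
    """All 3x3 minors of a 3x4 matrix: one per choice of 3 of the 4 columns."""
    return [det3([[M[i][j] for j in cols] for i in range(3)])
            for cols in combinations(range(4), 3)]

# A rank-(at most)-2 matrix: u v^T + w z^T.
u, v = [1, 2, 3], [1, 0, 2, 1]
w, z = [0, 1, 1], [2, 1, 0, 3]
M = [[u[i] * v[j] + w[i] * z[j] for j in range(4)] for i in range(3)]
assert all(m == 0 for m in minors3(M))

# A rank-3 matrix has at least one non-vanishing 3x3 minor.
M3 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
assert any(m != 0 for m in minors3(M3))
```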
1.3.3 Projective spaces and varieties Definition 1.3.9 (Projective space). The n-dimensional projective space over F, denoted Pn(F), is the set Fn+1 \{0} modulo the equivalence relation ∼ where x ∼ y if and only if x = λy for some λ ∈ F \{0}. For a vector space V we write PV for the projectivization of V , and if v ∈ V , we write [v] for the equivalence 12 Chapter 1. Introduction class to which v belongs, i.e. [v] is the element in PV corresponding to the line λv in V . For a subset X ⊆ PV we will write Xˆ for the affine cone of X in V , i.e. Xˆ = {v ∈ V :[v] ∈ X}. We will now define what is meant by a projective algebraic set. Note that the zero locus of a polynomial is not defined in projective space, since in general f(x) 6= f(λx) for a polynomial f, but x = λx in projective space. However, for a polynomial F which is homogeneous of degree d the zero locus is well defined, since F (λx) = λdF (x). Note that even though the zero locus of a homogeneous polynomial is well defined on projective space, the homogeneous polynomials are not functions on projective space. Definition 1.3.10 (Projective algebraic set). A projetive algebraic set X ⊂ Pn(F) is the solution set to a system of polynomial equations
\[
F_1(x) = 0, \quad F_2(x) = 0, \quad \ldots, \quad F_k(x) = 0
\]
for a set $\{F_1, F_2, \ldots, F_k\}$ of homogeneous polynomials in $n+1$ variables. A projective algebraic set is called irreducible, or a projective variety, if it is not the union of two projective algebraic sets properly contained in it.

Definition 1.3.11 (Ideal of a projective algebraic set). If $X \subset \mathbb{P}^n(\mathbb{F})$ is an algebraic set, its ideal $I(X)$ is the set of all homogeneous polynomials which vanish on $X$, i.e. $I(X)$ consists of all polynomials $F$ such that
\[
F(a_1, a_2, \ldots, a_{n+1}) = 0 \quad \text{for all } (a_1, a_2, \ldots, a_{n+1}) \in X.
\]

Definition 1.3.12 (Zariski topology). The Zariski topology on $\mathbb{P}^n(\mathbb{F})$ (or $\mathbb{F}^n$) is defined by its closed sets, which are taken to be all the sets $X$ for which there exists a set $S$ of homogeneous polynomials (or arbitrary polynomials in the case of $\mathbb{F}^n$) such that $X = \{\alpha : f(\alpha) = 0 \ \forall f \in S\}$. The Zariski closure of a set $X$ is the set $V(I(X))$.
1.3.4 Dimension of an algebraic set

Definition 1.3.13 (Tangent space). Let $M$ be a subset of a vector space $V$ over $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$ and let $x \in M$. The tangent space $\hat{T}_x M \subset V$ is the span of all vectors which arise as derivatives $\alpha'(0)$ of smooth curves $\alpha : \mathbb{F} \to M$ with $\alpha(0) = x$. For a projective algebraic set $X \subset \mathbb{P}V$, the affine tangent space to $X$ at $[x] \in X$ is $\hat{T}_{[x]} X := \hat{T}_x \hat{X}$.
Definition 1.3.14 (Smooth and singular points). If $\dim \hat{T}_x X$ is constant at and near $x$, then $x$ is called a smooth point of $X$. If $x$ is not smooth, it is called a singular point. For a variety $X$, let $X_{\mathrm{smooth}}$ and $X_{\mathrm{sing}}$ denote the smooth and singular points of $X$ respectively.
Definition 1.3.15 (Dimension of a variety). For an affine algebraic set $X$, define the dimension of $X$ as $\dim(X) := \dim(\hat{T}_x X)$ for $x \in X_{\mathrm{smooth}}$. For a projective algebraic set $X$, define the dimension of $X$ as $\dim(X) := \dim(\hat{T}_x X) - 1$ for $x \in X_{\mathrm{smooth}}$.

Example 1.3.16 (Cuspidal cubic). The variety $X$ in $\mathbb{R}^2$ given by $X = V(y^2 - x^3)$ is called the cuspidal cubic, see fig. 1.3. It has exactly one singular point, namely $(0,0)$. Both the unit vector in the $x$-direction and the unit vector in the $y$-direction are tangent vectors to the variety at $(0,0)$, so $\dim \hat{T}_{(0,0)} X = 2$, but for all $x \neq (0,0)$ on the cuspidal cubic we have $\dim \hat{T}_x X = 1$. Hence $(0,0)$ is a singular point, all other points are smooth, and the dimension of the cuspidal cubic is one.
Figure 1.3: The cuspidal cubic.
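Example 1.3.16 can also be verified symbolically: a point of $V(f)$ is singular precisely where the gradient of $f$ vanishes. A minimal sketch using sympy (not from the thesis):

```python
# Finding the singular points of the cuspidal cubic y^2 - x^3 = 0:
# solve f = 0 together with grad(f) = 0.
import sympy as sp

x, y = sp.symbols('x y')
f = y**2 - x**3

grad = [sp.diff(f, v) for v in (x, y)]              # [-3*x**2, 2*y]
singular = sp.solve([f] + grad, [x, y], dict=True)
print(singular)   # [{x: 0, y: 0}] -- the cusp at the origin
```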
Example 1.3.17 (Matrices of rank r). Going back to the example of the matrices of size $m \times n$ with rank $r$ or less, these can also be seen as a projective variety. We form the projective space $\mathbb{P}^{mn-1}(\mathbb{F})$ (i.e. the space of matrices where matrices $A$ and $B$ are identified iff $A = \lambda B$ for some $\lambda \neq 0$; note that if $A$ and $B$ are identified they have the same rank). The equations are still the same: the minors of size $(r+1) \times (r+1)$, which are homogeneous of degree $r+1$.

Example 1.3.18 (Segre variety). This variety will be very important in the sequel. Let $V_1, V_2, \ldots$ be complex vector spaces. The two-factor Segre variety is the variety defined as the image of the map
\[
\mathrm{Seg} : \mathbb{P}V_1 \times \mathbb{P}V_2 \to \mathbb{P}(V_1 \otimes V_2), \qquad \mathrm{Seg}([v_1], [v_2]) = [v_1 \otimes v_2],
\]
and it can be seen that the image of this map is the projectivization of the set of rank one tensors in $V_1 \otimes V_2$. We can in a similar fashion define the $n$-factor Segre variety as the image of
\[
\mathrm{Seg} : \mathbb{P}V_1 \times \cdots \times \mathbb{P}V_n \to \mathbb{P}(V_1 \otimes \cdots \otimes V_n), \qquad \mathrm{Seg}([v_1], \ldots, [v_n]) = [v_1 \otimes \cdots \otimes v_n],
\]
and the image is once again the projectivization of the set of rank one tensors in $V_1 \otimes \cdots \otimes V_n$. That the 2-factor Segre variety is an algebraic set follows from the fact that the $2 \times 2$ minors furnish equations for the variety. In the next chapter we will work with the 3-factor Segre variety, for which equations are provided in section 2.3.1. For a general proof for the $n$-factor Segre, see [25, page 103].

Any curve in $\mathrm{Seg}(\mathbb{P}V_1 \times \mathbb{P}V_2)$ is of the form $v_1(t) \otimes v_2(t)$, and its derivative at $t = 0$ is $v_1'(0) \otimes v_2(0) + v_1(0) \otimes v_2'(0)$. Thus
\[
\hat{T}_{[v_1 \otimes v_2]} \mathrm{Seg}(\mathbb{P}V_1 \times \mathbb{P}V_2) = V_1 \otimes v_2 + v_1 \otimes V_2,
\]
and the intersection of $V_1 \otimes v_2$ and $v_1 \otimes V_2$ is the one-dimensional space spanned by $v_1 \otimes v_2$. Therefore the dimension of the Segre variety is $n_1 + n_2 - 2$, where $n_1, n_2$ are the dimensions of $V_1, V_2$ respectively.
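The tangent-space computation can be checked numerically. The following sketch (not from the thesis; the dimensions $n_1 = 4$, $n_2 = 3$ and the random vectors are arbitrary choices) verifies that $\dim(V_1 \otimes v_2 + v_1 \otimes V_2) = n_1 + n_2 - 1$, i.e. the affine cone over the two-factor Segre variety has dimension $n_1 + n_2 - 1$ at a generic point:

```python
# The span of V1 (x) v2 and v1 (x) V2, realized as vectorized matrices,
# has dimension n1 + n2 - 1 (the two subspaces meet in span(v1 (x) v2)).
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 4, 3
v1, v2 = rng.normal(size=n1), rng.normal(size=n2)

basis = [np.outer(e, v2).ravel() for e in np.eye(n1)]   # V1 (x) v2
basis += [np.outer(v1, f).ravel() for f in np.eye(n2)]  # v1 (x) V2
dim = np.linalg.matrix_rank(np.array(basis))
print(dim, n1 + n2 - 1)   # 6 6
```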
1.3.5 Cones, joins, and secant varieties
Definition 1.3.19 (Cone). Let $X \subset \mathbb{P}^n(\mathbb{F})$ be a projective variety and $p \in \mathbb{P}^n(\mathbb{F})$ a point. The cone over $X$ with vertex $p$, denoted $J(p, X)$, is the Zariski closure of the union of all the lines $pq$ joining $p$ with a point $q \in X$, i.e.
\[
J(p, X) = \overline{\bigcup_{q \in X} pq}.
\]
Definition 1.3.20 (Join of varieties). Let $X_1, X_2 \subset \mathbb{P}^n(\mathbb{F})$ be two varieties. The join of $X_1$ and $X_2$ is the set
\[
J(X_1, X_2) = \overline{\bigcup_{p_1 \in X_1,\ p_2 \in X_2,\ p_1 \neq p_2} p_1 p_2},
\]
which can be interpreted as the Zariski closure of the union of all cones over $X_2$ with a vertex in $X_1$, or vice versa. The join of several varieties $X_1, X_2, \ldots, X_k$ is defined inductively:
\[
J(X_1, X_2, \ldots, X_k) = J(X_1, J(X_2, \ldots, X_k)).
\]
Definition 1.3.21 (Secant variety). Let $X$ be a variety. The $r$:th secant variety of $X$ is the set
\[
\sigma_r(X) = \underbrace{J(X, \ldots, X)}_{r \text{ copies}}.
\]

Lemma 1.3.22 (Secant varieties are varieties). Secant varieties of irreducible algebraic sets are irreducible, i.e. they are varieties.

Proof. See [17, p. 144, prop. 11.24].
Let $X \subset \mathbb{P}^n(\mathbb{F})$ be an algebraic set of dimension $k$. The expected dimension of $\sigma_r(X)$ is $\min\{rk + r - 1, n\}$: choosing $r$ points on $X$ gives $rk$ parameters, and their projective span contributes $r - 1$ more. However, the dimension is not always the expected one.
Definition 1.3.23 (Degenerate secant variety). Let $X \subset \mathbb{P}^n(\mathbb{F})$ be a projective variety with $\dim(X) = k$. If $\dim \sigma_r(X) < \min\{rk + r - 1, n\}$, then $\sigma_r(X)$ is called degenerate with defect $\delta_r(X) = rk + r - 1 - \dim \sigma_r(X)$.
Definition 1.3.24 (X-rank). If $V$ is a vector space over $\mathbb{C}$, $X \subset \mathbb{P}V$ is a projective variety and $p \in \mathbb{P}V$ is a point, the $X$-rank of $p$ is the smallest number $r$ of points in $X$ such that $p$ lies in their linear span. The $X$-border rank of $p$ is the least number $r$ such that $p$ lies in $\sigma_r(X)$, the $r$:th secant variety of $X$. The generic $X$-rank is the smallest $r$ such that $\sigma_r(X) = \mathbb{P}V$. These notions of $X$-rank and $X$-border rank coincide with the notions of tensor rank and tensor border rank (see section 2.1) when $X$ is taken to be the Segre variety.
Lemma 1.3.25 (Terracini's lemma). Let $x_i$, for $i = 1, \ldots, r$, be general points of $\hat{X}_i$, where the $X_i$ are projective varieties in $\mathbb{P}V$ for a complex vector space $V$, and let $[u] = [x_1 + \cdots + x_r] \in J(X_1, \ldots, X_r)$. Then
\[
\hat{T}_{[u]} J(X_1, \ldots, X_r) = \hat{T}_{[x_1]} X_1 + \cdots + \hat{T}_{[x_r]} X_r.
\]
Proof. It is enough to consider the case $u = x_1 + x_2$ for $x_1 \in \hat{X}_1$, $x_2 \in \hat{X}_2$, where $X_1, X_2 \subset \mathbb{P}V$ are varieties, and to derive the expression for $\hat{T}_{[u]} J(X_1, X_2)$. Define the addition map $a : V \times V \to V$ by $a(v_1, v_2) = v_1 + v_2$. Then
\[
\hat{J}(X_1, X_2) = a(\hat{X}_1 \times \hat{X}_2)
\]
and so, for general points $x_1, x_2$, the space $\hat{T}_{[u]} J(X_1, X_2)$ is obtained by differentiating curves $x_1(t) \in \hat{X}_1$, $x_2(t) \in \hat{X}_2$ with $x_1(0) = x_1$, $x_2(0) = x_2$. Thus the tangent space at $x_1 + x_2$ in $J(X_1, X_2)$ is the sum of the tangent spaces at $x_1$ in $X_1$ and at $x_2$ in $X_2$.
1.3.6 Real algebraic geometry

In section 2.4 we will need the following definition.

Definition 1.3.26 (Affine semi-algebraic set). An affine semi-algebraic set is a subset of $\mathbb{R}^n$ of the form
\[
\bigcup_{i=1}^{s} \bigcap_{j=1}^{r_i} \{x \in \mathbb{R}^n \mid f_{i,j}(x) \bowtie_{i,j} 0\},
\]
where $f_{i,j} \in \mathbb{R}[x_1, \ldots, x_n]$ and each relation $\bowtie_{i,j}$ is $<$ or $=$.

Example 1.3.27 (Semi-algebraic set). Consider the semi-algebraic set given by
\[
\begin{aligned}
f_{1,1} &= x^2 + y^2 - 2, & f_{1,2} &= x - \tfrac{3}{2} y, & f_{1,3} &= -y, \\
f_{2,1} &= x^2 + y^2 - 2, & f_{2,2} &= x + \tfrac{3}{2} y, & f_{2,3} &= y, \\
f_{3,1} &= (x - 2)^2 + y^2 - \tfrac{1}{4}, & f_{4,1} &= \bigl(x - \tfrac{7}{2}\bigr)^2 + y^2 - \tfrac{1}{4},
\end{aligned}
\]
with all relations $\bowtie_{i,j}$ being $<$. The set is visualized in figure 1.4.
Figure 1.4: An example of a semi-algebraic set.
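The union-of-intersections structure in Definition 1.3.26 can be made concrete with a small membership test. The following sketch (not from the thesis) encodes the clauses of Example 1.3.27, all relations being strict inequalities:

```python
# A point lies in the semi-algebraic set if it satisfies ALL conditions
# of SOME clause (a union of intersections, every relation here being '<').
clauses = [
    [lambda x, y: x**2 + y**2 - 2, lambda x, y: x - 1.5 * y, lambda x, y: -y],
    [lambda x, y: x**2 + y**2 - 2, lambda x, y: x + 1.5 * y, lambda x, y: y],
    [lambda x, y: (x - 2)**2 + y**2 - 0.25],
    [lambda x, y: (x - 3.5)**2 + y**2 - 0.25],
]

def member(x, y):
    return any(all(f(x, y) < 0 for f in clause) for clause in clauses)

print(member(2.0, 0.1))   # True  (inside the small circle around (2, 0))
print(member(0.0, 0.0))   # False (strict inequalities fail on y = 0)
```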
1.4 Application to matrix multiplication
We take a look at the problem of efficient computation of the product of $2 \times 2$ matrices. Let $A, B, C$ be copies of the space of $n \times n$ matrices, and let the multiplication mapping $m_n : A \times B \to C$ be given by $m_n(M_1, M_2) = M_1 M_2$. To compute the matrix $M_3 = m_2(M_1, M_2) = M_1 M_2$ one can naively use eight multiplications and four additions with the standard method for matrix multiplication. Explicitly, if
\[
M_1 = \begin{pmatrix} a_1^1 & a_2^1 \\ a_1^2 & a_2^2 \end{pmatrix}, \qquad
M_2 = \begin{pmatrix} b_1^1 & b_2^1 \\ b_1^2 & b_2^2 \end{pmatrix},
\]
one can compute $M_3 = M_1 M_2$ by
\[
\begin{aligned}
c_1^1 &= a_1^1 b_1^1 + a_2^1 b_1^2, & c_2^1 &= a_1^1 b_2^1 + a_2^1 b_2^2, \\
c_1^2 &= a_1^2 b_1^1 + a_2^2 b_1^2, & c_2^2 &= a_1^2 b_2^1 + a_2^2 b_2^2.
\end{aligned}
\]
However, this is not optimal. Strassen [37] showed that one can calculate M3 = M1M2 using only seven multiplications. First, one calculates
\[
\begin{aligned}
k_1 &= (a_1^1 + a_2^2)(b_1^1 + b_2^2) \\
k_2 &= (a_1^2 + a_2^2)\, b_1^1 \\
k_3 &= a_1^1 (b_2^1 - b_2^2) \\
k_4 &= a_2^2 (-b_1^1 + b_1^2) \\
k_5 &= (a_1^1 + a_2^1)\, b_2^2 \\
k_6 &= (-a_1^1 + a_1^2)(b_1^1 + b_2^1) \\
k_7 &= (a_2^1 - a_2^2)(b_1^2 + b_2^2)
\end{aligned}
\]
and the coefficients of $M_3 = M_1 M_2$ can then be calculated as
\[
\begin{aligned}
c_1^1 &= k_1 + k_4 - k_5 + k_7, \\
c_1^2 &= k_2 + k_4, \\
c_2^1 &= k_3 + k_5, \\
c_2^2 &= k_1 + k_3 - k_2 + k_6.
\end{aligned}
\]
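Strassen's seven-multiplication scheme can be transcribed directly and checked against ordinary matrix multiplication. The following sketch (not from the thesis) uses the convention that $a^i_j$ is the entry in row $i$, column $j$:

```python
# Strassen's scheme: seven scalar multiplications k1..k7 suffice
# to multiply two 2x2 matrices.
import numpy as np

def strassen_2x2(A, B):
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    k1 = (a11 + a22) * (b11 + b22)
    k2 = (a21 + a22) * b11
    k3 = a11 * (b12 - b22)
    k4 = a22 * (-b11 + b21)
    k5 = (a11 + a12) * b22
    k6 = (-a11 + a21) * (b11 + b12)
    k7 = (a12 - a22) * (b21 + b22)
    return np.array([[k1 + k4 - k5 + k7, k3 + k5],
                     [k2 + k4,           k1 + k3 - k2 + k6]])

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
print(np.allclose(strassen_2x2(A, B), A @ B))   # True
```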
Now, the map $m_n : A \times B \to C$ is obviously a bilinear map and as such can be expressed as a tensor. Let us take a look at $m_2$. Equip $A$, $B$, $C$ with the same basis
\[
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.
\]
For clarity, let $m_2 : A \times B \to C$ and let the bases be $\{a_i^j\}_{i,j=1}^{2}$, $\{b_i^j\}_{i,j=1}^{2}$, $\{c_i^j\}_{i,j=1}^{2}$. Let the dual bases of $A$, $B$ be $\{\alpha_i^j\}_{i,j=1}^{2}$, $\{\beta_i^j\}_{i,j=1}^{2}$ respectively. Thus $m_2 \in A^* \otimes B^* \otimes C$, and the standard algorithm for matrix multiplication corresponds to the following rank eight decomposition of $m_2$:
\[
\begin{aligned}
m_2 = {} & (\alpha_1^1 \otimes \beta_1^1 + \alpha_2^1 \otimes \beta_1^2) \otimes c_1^1 + (\alpha_1^1 \otimes \beta_2^1 + \alpha_2^1 \otimes \beta_2^2) \otimes c_2^1 \\
& + (\alpha_1^2 \otimes \beta_1^1 + \alpha_2^2 \otimes \beta_1^2) \otimes c_1^2 + (\alpha_1^2 \otimes \beta_2^1 + \alpha_2^2 \otimes \beta_2^2) \otimes c_2^2,
\end{aligned}
\]
whereas Strassen's algorithm corresponds to a rank seven decomposition of $m_2$:
\[
\begin{aligned}
m_2 = {} & (\alpha_1^1 + \alpha_2^2) \otimes (\beta_1^1 + \beta_2^2) \otimes (c_1^1 + c_2^2) + (\alpha_1^2 + \alpha_2^2) \otimes \beta_1^1 \otimes (c_1^2 - c_2^2) \\
& + \alpha_1^1 \otimes (\beta_2^1 - \beta_2^2) \otimes (c_2^1 + c_2^2) + \alpha_2^2 \otimes (-\beta_1^1 + \beta_1^2) \otimes (c_1^1 + c_1^2) \\
& + (\alpha_1^1 + \alpha_2^1) \otimes \beta_2^2 \otimes (-c_1^1 + c_2^1) + (-\alpha_1^1 + \alpha_1^2) \otimes (\beta_1^1 + \beta_2^1) \otimes c_2^2 \\
& + (\alpha_2^1 - \alpha_2^2) \otimes (\beta_1^2 + \beta_2^2) \otimes c_1^1.
\end{aligned}
\]
It has been proven that both the rank and the border rank of $m_2$ are seven [26]. This can be seen from the fact that $\sigma_7(\mathrm{Seg}(\mathbb{P}A \times \mathbb{P}B \times \mathbb{P}C)) = \mathbb{P}(A \otimes B \otimes C)$. However, the rank of $m_n$ for $n \geq 3$ is still unknown. Even for $m_3$, all that is known is that the rank lies between 19 and 23 [25, chapter 11]. It is interesting to note that this is lower than the generic rank for $9 \times 9 \times 9$ tensors, which is 30 (theorem 2.3.8). The rank of $m_2$ is, however, the generic seven.

Chapter 2
Tensor rank
In this chapter we present some results on tensor rank, mainly from the view of algebraic geometry. We introduce different types of rank of a tensor and show some basic results concerning these different types of ranks. We derive equations for the Segre variety and show some basic results on secant defects of the Segre variety and generic ranks. A general reference for this chapter is [25].
2.1 Different notions of rank
If $T : U \to V$ is a linear operator and $U, V$ are vector spaces, the rank of $T$ is the dimension of the image $T(U)$. If one considers $T$ as an element of $U^* \otimes V$, the rank of $T$ coincides with the smallest integer $R$ such that $T$ can be written
\[
T = \sum_{i=1}^{R} \alpha_i \otimes v_i.
\]
However, a tensor $T \in V_1 \otimes V_2 \otimes \cdots \otimes V_k$ can be viewed as a linear operator $V_i^* \to V_1 \otimes \cdots \otimes V_{i-1} \otimes V_{i+1} \otimes \cdots \otimes V_k$ for any $1 \leq i \leq k$, so $T$ can be viewed as a linear operator in these $k$ different ways, and each way gives a different rank. The $k$-tuple $(\dim T(V_1^*), \ldots, \dim T(V_k^*))$ is known as the multilinear rank of $T$. The smallest integer $R$ such that $T$ can be written
\[
T = \sum_{i=1}^{R} v_i^{(1)} \otimes \cdots \otimes v_i^{(k)}
\]
is known as the rank of $T$ (sometimes called the outer product rank). If $T$ is a tensor, let $R(T)$ denote the rank of $T$.

The idea of tensor rank gets more complicated still. If a tensor $T$ has rank $R$, it is possible that there exist tensors of rank $\tilde{R} < R$ such that $T$ is the limit of these tensors, in which case $T$ is said to have border rank $\tilde{R}$. Let $\underline{R}(T)$ denote the border rank of the tensor $T$.
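The multilinear rank can be computed directly: its $i$-th component is the ordinary matrix rank of the $i$-th unfolding (flattening) of $T$. A minimal sketch, not from the thesis:

```python
# Multilinear rank of a 3-tensor: matrix rank of each unfolding.
import numpy as np

def multilinear_rank(T):
    return tuple(
        np.linalg.matrix_rank(np.moveaxis(T, i, 0).reshape(T.shape[i], -1))
        for i in range(T.ndim)
    )

# A rank-one tensor has multilinear rank (1, 1, 1).
a, b, c = np.array([1., 2.]), np.array([1., 0., 1.]), np.array([2., 3.])
T = np.einsum('i,j,k->ijk', a, b, c)
print(multilinear_rank(T))   # (1, 1, 1)
```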
Example 2.1.1 (Border rank). Consider the numerically given tensor
\[
T = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 2 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} -1 \\ 1 \end{pmatrix}
= \left(\begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 4 & 1 & 6 & 1 \end{array}\right).
\]
One can show that $T$ has rank 3, for instance with a method for $p \times p \times 2$ tensors used in [36]. Now consider the rank-two tensor $T(\varepsilon)$:
\[
T(\varepsilon) = \frac{\varepsilon - 1}{\varepsilon} \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 1 \end{pmatrix}
+ \frac{1}{\varepsilon} \left( \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \varepsilon \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right) \otimes \left( \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \varepsilon \begin{pmatrix} 2 \\ 1 \end{pmatrix} \right) \otimes \left( \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \varepsilon \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right).
\]
Calculating $T(\varepsilon)$ for a few values of $\varepsilon$ gives the following results:
\[
\begin{aligned}
T(1) &= \left(\begin{array}{cc|cc} 0 & 0 & 6 & 2 \\ 0 & 0 & 18 & 6 \end{array}\right), &
T(10^{-1}) &= \left(\begin{array}{cc|cc} 1.0800 & 0.0900 & 1.3200 & 0.1100 \\ 3.9600 & 1.0800 & 6.8400 & 1.3200 \end{array}\right), \\
T(10^{-3}) &= \left(\begin{array}{cc|cc} 1.0010 & 0.0010 & 1.0030 & 0.0010 \\ 4.0000 & 1.0010 & 6.0080 & 1.0030 \end{array}\right), &
T(10^{-5}) &= \left(\begin{array}{cc|cc} 1.0000 & 0.0000 & 1.0000 & 0.0000 \\ 4.0000 & 1.0000 & 6.0001 & 1.0000 \end{array}\right),
\end{aligned}
\]
which gives us an indication that $T(\varepsilon) \to T$ when $\varepsilon \to 0$.
The above tensor is a special case of tensors of the form
\[
T = a_1 \otimes b_1 \otimes c_1 + a_2 \otimes b_1 \otimes c_1 + a_1 \otimes b_2 \otimes c_1 + a_1 \otimes b_1 \otimes c_2,
\]
and even in this general case one can show that $T$ has rank three, while there are tensors of rank two arbitrarily close to it:
\[
\begin{aligned}
T(\varepsilon) &= \frac{1}{\varepsilon} \bigl( (\varepsilon - 1)\, a_1 \otimes b_1 \otimes c_1 + (a_1 + \varepsilon a_2) \otimes (b_1 + \varepsilon b_2) \otimes (c_1 + \varepsilon c_2) \bigr) \\
&= \frac{1}{\varepsilon} \bigl( \varepsilon\, a_1 \otimes b_1 \otimes c_1 + \varepsilon\, a_2 \otimes b_1 \otimes c_1 + \varepsilon\, a_1 \otimes b_2 \otimes c_1 + \varepsilon\, a_1 \otimes b_1 \otimes c_2 + O(\varepsilon^2) \bigr) \to T
\end{aligned}
\]
when $\varepsilon \to 0$.

There is a well-known result for matrices which states that if one fills an $n \times m$ matrix with random entries, the matrix will have maximal rank, $\min\{n, m\}$, with probability one. In the case of square matrices, a random matrix will be invertible with probability one. For tensors over $\mathbb{C}$ the situation is similar: a random tensor will have a certain rank with probability one; this rank is called the generic rank. Over $\mathbb{R}$, however, there can be multiple ranks, called typical ranks, which a random tensor takes with non-zero probability; see more in section 2.4. For now, we recall definition 1.3.24 and the fact that the generic rank is the smallest $r$ such that the $r$:th secant variety of the Segre variety is the whole space. Compare these observations and definitions with the fact that $GL(n, \mathbb{C})$ is an $n^2$-dimensional manifold in the $n^2$-dimensional space of $n \times n$ matrices, and a random matrix in this space is invertible with probability one.
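The limit computation above can also be observed numerically. The following sketch (not from the thesis; the vectors are random choices) builds $T$ and $T(\varepsilon)$ from the general formula and shows the approximation error shrinking linearly in $\varepsilon$:

```python
# A rank-3 tensor as the limit of rank-2 tensors:
# T(eps) = ((eps-1)/eps) a1(x)b1(x)c1
#        + (1/eps) (a1+eps a2)(x)(b1+eps b2)(x)(c1+eps c2)  ->  T.
import numpy as np

def outer3(u, v, w):
    return np.einsum('i,j,k->ijk', u, v, w)

rng = np.random.default_rng(1)
a1, a2 = rng.normal(size=2), rng.normal(size=2)
b1, b2 = rng.normal(size=2), rng.normal(size=2)
c1, c2 = rng.normal(size=2), rng.normal(size=2)

# The rank-three limit tensor.
T = (outer3(a1, b1, c1) + outer3(a2, b1, c1)
     + outer3(a1, b2, c1) + outer3(a1, b1, c2))

for eps in (1e-1, 1e-3, 1e-5):
    T_eps = ((eps - 1) / eps) * outer3(a1, b1, c1) \
        + (1 / eps) * outer3(a1 + eps * a2, b1 + eps * b2, c1 + eps * c2)
    print(eps, np.linalg.norm(T_eps - T))   # error shrinks like eps
```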
2.1.1 Results on tensor rank

Theorem 2.1.2. Given an $I \times J \times K$ tensor $T$, $R(T)$ is the minimal number $p$ of rank one $J \times K$ matrices $S_1, \ldots, S_p$ such that $T_i \in \mathrm{span}(S_1, \ldots, S_p)$ for all slices $T_i$ of $T$.

Proof. For a tensor $T$ one can write
\[
T = \sum_{k=1}^{R(T)} a_k \otimes b_k \otimes c_k
\]
and thus, if $a_k = (a_k^1, \ldots, a_k^I)^T$, we have
\[
T_i = \sum_{k=1}^{R(T)} a_k^i\, b_k \otimes c_k,
\]
so $T_i \in \mathrm{span}(b_1 \otimes c_1, \ldots, b_{R(T)} \otimes c_{R(T)})$ for $i = 1, \ldots, I$, which proves $R(T) \geq p$. Conversely, if $T_i \in \mathrm{span}(S_1, \ldots, S_p)$ with $\mathrm{rank}(S_j) = 1$ for $i = 1, \ldots, I$, we can write
\[
T_i = \sum_{k=1}^{p} x_k^i S_k = \sum_{k=1}^{p} x_k^i\, y_k \otimes z_k
\]
and thus with $x_k = (x_k^1, \ldots, x_k^I)$ we get
\[
T = \sum_{k=1}^{p} x_k \otimes y_k \otimes z_k,
\]
which proves $R(T) \leq p$, resulting in $R(T) = p$.

Corollary 2.1.3. For an $I \times J \times K$ tensor $T$, $R(T) \leq \min\{IJ, IK, JK\}$.

Proof. By theorem 2.1.2 one can slice $T$ along any of the three directions, so one can pick the direction which results in the smallest slice matrices, say $m \times n$. The space of $m \times n$ matrices is spanned by the $mn$ rank one matrices $M_{kl}$, $1 \leq k \leq m$, $1 \leq l \leq n$, where $M_{kl}$ has a one in position $(k, l)$ and zeros elsewhere. Thus at most $mn$ rank one matrices are needed to contain all the slices in their linear span.
2.1.2 Symmetric tensor rank

Definition 2.1.4 (Symmetric rank). Given a tensor $T \in S^d V$, the symmetric rank of $T$, denoted $R_S(T)$, is defined as the smallest $R$ such that
\[
T = \sum_{r=1}^{R} \underbrace{v_r \otimes \cdots \otimes v_r}_{d \text{ factors}}
\]
for $v_r \in V$. The symmetric border rank of $T$ is defined as the smallest $R$ such that $T$ is the limit of symmetric tensors of symmetric rank $R$.
Since we, over $\mathbb{R}$ and $\mathbb{C}$, can put symmetric tensors of order $d$ in bijective correspondence with homogeneous polynomials of degree $d$, and vectors in bijective correspondence with linear forms, the symmetric rank of a given symmetric tensor can be translated to the number $R$ of linear forms needed to express a given homogeneous polynomial of degree $d$ as a sum of $d$:th powers of linear forms. That is, if $P$ is a homogeneous polynomial of degree $d$ over $\mathbb{C}$, what is the least $R$ such that
\[
P = l_1^d + \cdots + l_R^d
\]
for linear forms $l_i$? Over $\mathbb{C}$, the following theorem answers this question in the generic case.
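As a small worked instance (not from the thesis), the monomial $xy$, viewed as a symmetric $2 \times 2$ tensor, requires exactly two powers of linear forms:

```latex
% Worked instance: over the reals,
\[
  xy \;=\; \Bigl(\frac{x+y}{2}\Bigr)^{2} - \Bigl(\frac{x-y}{2}\Bigr)^{2},
\]
% and over C the minus sign can be absorbed into the form,
% xy = ((x+y)/2)^2 + (i(x-y)/2)^2, so R_S(xy) = 2: the corresponding
% symmetric matrix ((0, 1/2), (1/2, 0)) has rank 2, and no single
% square of a linear form equals xy.
```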
Theorem 2.1.5 (Alexander-Hirschowitz). The generic symmetric rank in $S^d\mathbb{C}^n$ is