Tensors: a Brief Introduction Pierre Comon
Pierre Comon. Tensors: a Brief Introduction. IEEE Signal Processing Magazine, Institute of Electrical and Electronics Engineers, 2014, 31 (3), pp. 44-53. 10.1109/MSP.2014.2298533. hal-00923279v4

HAL Id: hal-00923279
https://hal.archives-ouvertes.fr/hal-00923279v4
Submitted on 31 Jul 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Tensors: a Brief Introduction
Pierre Comon⋆, Fellow, IEEE

⋆ Corresponding author: P. Comon, GIPSA-Lab, CNRS UMR5216, Grenoble Campus, BP.46, F-38402 St Martin d'Heres cedex, France. pierre.comon@grenoble-inp.fr, tel: +33-4 7682 6271, fax: +33-4 7657 4790. This work has been funded by the ERC Grant, Decoda project, agreement no. 320594.

Abstract—Tensor decompositions are at the core of many Blind Source Separation (BSS) algorithms, either explicitly or implicitly. In particular, the Canonical Polyadic (CP) tensor decomposition plays a central role in the identification of underdetermined mixtures. Despite some similarities, CP and the Singular Value Decomposition (SVD) are quite different. More generally, tensors and matrices enjoy different properties, as pointed out in this brief introduction.

I. MOTIVATION

Originally, Blind Source Separation (BSS) exploited the mutual statistical independence between sources [20]. Among the possible approaches based on the sole hypothesis of source statistical independence, several use cumulants. In fact, when random variables are independent, their cumulant tensor is diagonal [57]. When the source mixture is linear, the decomposition of the data cumulant tensor into a sum of outer products yields the columns of the mixing matrix. This is the first instance of tensor decomposition applied to BSS, even if it is not always explicit. In that case, the tensor is actually symmetric. In the presence of noise, the extraction of the sources themselves needs another procedure, based for instance on a spatial matched filter (SMF) [20].

BSS has since been addressed in different manners. A quite interesting class of approaches consists of exploiting an additional diversity [74]. More precisely, measurements are usually made in two dimensions, generally space and time. But if they are made as a function of three (or more) dimensions, e.g. frequency, polarization, time repetition, etc., the data are stored in a multi-way array. By treating this array as a matrix, information is lost. Yet, in some real-world applications, it is meaningful to assume a multilinear model for this multi-way array, which justifies considering it as a tensor. The decomposition of the latter into a sum of outer products yields not only the columns of the mixing matrix, but also an estimate of the sources. So, contrary to the first generation of BSS algorithms, there is no need to resort to an extracting filter. In addition, no statistics are to be estimated, so that the performance is expected to be better for short samples or correlated sources.

Besides numerous books dedicated to applications in physics, there already exist some surveys that can be used in Signal Processing. To start with, some background is presented in [46], i.e. basic engineering tools and a good panel of applications; a more signal-processing-oriented tensor overview may be found in [16]. A quite complete digest, more theoretical and oriented towards algebraic geometry, can be found in [49]. The present survey aims at motivating the Signal Processing readership to dive into the promising world of tensors.

II. THE WORLD OF TENSORS

Tensors were introduced at the end of the nineteenth century with the development of differential calculus. They have since been omnipresent in physics, where they express laws independently of coordinate systems. Yet a tensor is essentially a mapping from one linear space to another, whose coordinates transform multilinearly under a change of bases, as subsequently detailed. For easier reading, we shall resort to arrays of coordinates whenever this eases the presentation; interested readers may refer to [23], [49] for a more advanced, coordinate-free presentation.

[Portraits: E. Waring (1736-1798), J. J. Sylvester (1814-1897), A. Clebsch (1833-1872)]

A. Linearity

Linearity expresses the property of a map µ from a vector space S to another vector space S′ built over the same field K (as far as we are concerned, K will be either the field of real numbers R or of complex numbers C): µ(αx + βy) = αµ(x) + βµ(y), ∀x, y ∈ S, ∀α, β ∈ K. If S and S′ are of finite dimension, then this map can be represented by a matrix of coordinates, once the bases of S and S′ have been fixed. We see that every linear map can be associated with a matrix, say A, so that µ(x) = Ax. On the other hand, a matrix does not uniquely define a map: a matrix A could for instance also define a bilinear form from S × S′ onto K, i.e. f(x1, x2) = x1^T A x2. Hence, the correspondence between maps and arrays of coordinates is not one-to-one.

B. Bilinearity

Let's start with a simple example.

Example 1: Consider two multi-dimensional zero-mean random variables z1 and z2, and denote the cross-covariance matrix by G = E{z1 z2^T}. We see that the covariance is linear with respect to both z1 and z2, which is referred to as bilinearity. Now suppose that z1 and z2 represent two phenomena that are measured in a given coordinate system.
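The bilinearity just observed can be checked numerically. The following is a minimal NumPy sketch (not part of the original article; the dimensions, sample size, and the helper cross_cov are illustrative): since the empirical cross-covariance is bilinear in the samples, re-expressing the same data in new coordinates A z1 and B z2 multiplies it by A on the left and B^T on the right.

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical cross-covariance G = E{z1 z2^T} of zero-mean samples;
# columns of Z1, Z2 are N joint observations of z1 and z2.
def cross_cov(Z1, Z2):
    N = Z1.shape[1]
    return (Z1 @ Z2.T) / N

N = 10_000
Z1 = rng.standard_normal((3, N))
Z2 = 0.5 * Z1[:2, :] + rng.standard_normal((2, N))  # correlated with z1

A = rng.standard_normal((3, 3))  # change of basis for z1
B = rng.standard_normal((2, 2))  # change of basis for z2

G = cross_cov(Z1, Z2)
G_new = cross_cov(A @ Z1, B @ Z2)  # same phenomena, new coordinates

# Bilinearity: the new array of coordinates is A G B^T
assert np.allclose(G_new, A @ G @ B.T)
```

Note that the identity holds exactly for the empirical covariance of the same samples, not merely in expectation: it is an algebraic consequence of bilinearity, independent of the statistics of z1 and z2.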
G gives an indication of their correlation. If we change the coordinate system, the covariance matrix changes. More precisely, if z1′ = Az1 and z2′ = Bz2, then G′ = E{z1′ z2′^T} can be written G′ = A G B^T. We see that G′ ≠ G whereas the phenomena remain the same. So we must distinguish between the physical phenomena, which are coordinate-free, and the arrays of measurements we make. And because of bilinearity, we know how to go from one matrix representation to another. We may say that the covariance object is a tensor of order 2, which can be represented by a matrix in any given coordinate system.

What we just saw in Example 1 can be put in more formal terms. Now assume a linear change of coordinates is made in the spaces S and S′, defined by matrices {A, B}, so that the new coordinates express as x1′ = Ax1 and x2′ = Bx2. A tensor represented in the original basis by an array G will be represented in the new basis (as in Example 1) by the new array G′ whose coordinates are:

G′_ij = Σ_{p,q} A_ip B_jq G_pq

This can be compactly denoted by G′ = (A, B) · G. This will now be extended to orders higher than 2.

C. Multilinearity

Now assume the S_d are D vector spaces, 1 ≤ d ≤ D, and suppose f is a map from S_1 × ··· × S_D onto K. The map f is said to be multilinear if f(x_1, ..., x_D) is linear with respect to every variable x_d, 1 ≤ d ≤ D. In other words, f(x_1, ..., αx_d + βy_d, ..., x_D) = α f(x_1, ..., x_d, ..., x_D) + β f(x_1, ..., y_d, ..., x_D), ∀d, ∀α, β ∈ K. This map is actually a multilinear form. As in the previous section, the map f can be represented by an array T of coordinates, once the bases of the S_d have been fixed, 1 ≤ d ≤ D, and this array needs D indices.

Example 2: Let x1 ∈ S1, x2 ∈ S2 and x3 ∈ S3. The tensors 6x1 ⊗ x2 ⊗ x3 and x1 ⊗ 2x2 ⊗ 3x3 are the same, but in S1 × S2 × S3 the vectors (6x1, x2, x3) and (x1, 2x2, 3x3) are different.

If a linear change of basis is made in the space S1 (resp. S2 and S3), as x′ = Ax (resp. y′ = By and z′ = Cz), then the array T′ defining the multilinear form f in the new coordinate system expresses as a function of T. For so-called contravariant tensors, the relationship is

T′_ijk = Σ_{p,q,r} A_ip B_jq C_kr T_pqr    (1)

as in Example 1, or in compact form: T′ = (A, B, C) · T. On the other hand, there also exist covariant tensors, for which the inverses of the above matrices are instead involved (cf. Example 4), and even mixed tensors that are partly covariant and partly contravariant [71], [23]. However, we shall concentrate only on contravariant tensors in this paper, which follow (1) under a multilinear transformation. Note that (1) holds true for contravariant tensors even if the linear transforms (A, B, C) are not invertible; they can even be rectangular matrices. This property is crucial in BSS when mixtures are underdetermined [20], [83].

Example 3: Consider three multi-dimensional random variables x, y and z. Then the 3rd-order moment tensor M is represented by the 3rd-order array M_ijk = E{x_i y_j z_k}. As in the case of 2nd-order moments, it is a contravariant tensor. In fact, if x′ = Ax, y′ = By and z′ = Cz, then M′ = (A, B, C) · M as in (1). It turns out that cumulants may also be seen as tensors, as pointed out in [57]. Because cross-cumulants of independent random variables are null at any order, they have been extensively used in BSS. For instance, the cumulant tensor of order 2 is nothing else but the
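The contravariant transformation rule (1) can be sketched directly with NumPy's einsum (this snippet is not from the article; the dimensions are arbitrary, and A and C are deliberately rectangular to illustrate that no invertibility is required):

```python
import numpy as np

rng = np.random.default_rng(1)

def multilinear_transform(T, A, B, C):
    """Contravariant rule (1), T' = (A, B, C) . T:
    T'_ijk = sum_{p,q,r} A_ip B_jq C_kr T_pqr."""
    return np.einsum('ip,jq,kr,pqr->ijk', A, B, C, T)

# A 3rd-order array and three (possibly rectangular) transforms
T = rng.standard_normal((2, 3, 4))
A = rng.standard_normal((5, 2))  # rectangular: no inverse exists
B = rng.standard_normal((3, 3))
C = rng.standard_normal((2, 4))  # rectangular as well

T_new = multilinear_transform(T, A, B, C)
print(T_new.shape)  # (5, 3, 2)

# Sanity check: on a rank-1 term, (A, B, C) . (x ⊗ y ⊗ z) = Ax ⊗ By ⊗ Cz
x, y, z = rng.standard_normal(2), rng.standard_normal(3), rng.standard_normal(4)
rank1 = np.einsum('i,j,k->ijk', x, y, z)
assert np.allclose(multilinear_transform(rank1, A, B, C),
                   np.einsum('i,j,k->ijk', A @ x, B @ y, C @ z))
```

The rank-1 check mirrors Examples 2 and 3: the transformation acts on each mode separately, which is exactly what makes (1) compatible with decompositions into sums of outer products.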