SINGULAR VALUES AND EIGENVALUES OF TENSORS: A VARIATIONAL APPROACH

Lek-Heng Lim

Stanford University, Institute for Computational and Mathematical Engineering, Gates Building 2B, Room 286, Stanford, CA 94305

[This work appeared in: Proceedings of the IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP '05), 1 (2005), pp. 129–132.]

ABSTRACT

We propose a theory of eigenvalues, eigenvectors, singular values, and singular vectors for tensors based on a constrained variational approach, much like the Rayleigh quotient for symmetric matrix eigenvalues. These notions are particularly useful in generalizing certain areas where the spectral theory of matrices has traditionally played an important role. For illustration, we will discuss a multilinear generalization of the Perron-Frobenius theorem.

1. INTRODUCTION

It is well known that the eigenvalues and eigenvectors of a symmetric matrix $A$ are the critical values and critical points of its Rayleigh quotient, $x^\top A x / \|x\|_2^2$, or equivalently, the critical values and critical points of the quadratic form $x^\top A x$ constrained to vectors of unit $l^2$-norm, $\{x \mid \|x\|_2 = 1\}$. If $L : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ is the associated Lagrangian with Lagrange multiplier $\lambda$,

$$L(x, \lambda) = x^\top A x - \lambda(\|x\|_2^2 - 1),$$

then the vanishing of $\nabla L$ at a critical point $(x_c, \lambda_c) \in \mathbb{R}^n \times \mathbb{R}$ yields the familiar defining condition for eigenpairs,

$$A x_c = \lambda_c x_c. \quad (1)$$

Note that this approach does not work if $A$ is nonsymmetric — the critical points of $L$ would in general be different from the solutions of (1).

A little less widely known is an analogous variational approach to the singular values and singular vectors of a matrix $A \in \mathbb{R}^{m \times n}$, with $x^\top A y / \|x\|_2 \|y\|_2$ assuming the role of the Rayleigh quotient. The associated Lagrangian function $L : \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ is now

$$L(x, y, \sigma) = x^\top A y - \sigma(\|x\|_2 \|y\|_2 - 1).$$

$L$ is continuously differentiable for nonzero $x, y$. The first-order condition yields

$$A y_c / \|y_c\|_2 = \sigma_c x_c / \|x_c\|_2, \qquad A^\top x_c / \|x_c\|_2 = \sigma_c y_c / \|y_c\|_2,$$

at a critical point $(x_c, y_c, \sigma_c) \in \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R}$. Writing $u_c = x_c / \|x_c\|_2$ and $v_c = y_c / \|y_c\|_2$, we get the familiar

$$A v_c = \sigma_c u_c, \qquad A^\top u_c = \sigma_c v_c. \quad (2)$$

Although it is not immediately clear how the usual definitions of eigenvalues and singular values via (1) and (2) may be generalized to tensors of order $k \geq 3$ (a matrix is regarded as an order-2 tensor), the constrained variational approach generalizes in a straightforward manner — one simply replaces the bilinear functional $x^\top A y$ (resp. the quadratic form $x^\top A x$) by the multilinear functional (resp. homogeneous polynomial) associated with a tensor (resp. symmetric tensor) of order $k$. The constrained critical values/points then yield a notion of singular values/vectors (resp. eigenvalues/vectors) for order-$k$ tensors.

An important point of distinction between the order-2 and order-$k$ cases is in the choice of norm for the constraints. At first glance, it may appear that we should retain the $l^2$-norm. However, the criticality conditions so obtained are no longer scale invariant (i.e. the property that $x_c$ in (1) or $(u_c, v_c)$ in (2) may be replaced by $\alpha x_c$ or $(\alpha u_c, \alpha v_c)$ without affecting the validity of the equations). To preserve the scale invariance of eigenvectors and singular vectors for tensors of order $k \geq 3$, the $l^2$-norm must be replaced by the $l^k$-norm (where $k$ is the order of the tensor),

$$\|x\|_k = (|x_1|^k + \cdots + |x_n|^k)^{1/k}.$$

The consideration of eigenvalues and singular values with respect to $l^p$-norms where $p \neq 2$ is prompted by recent works [1, 2] of Choulakian, who studied such notions for matrices.

Nevertheless, we shall not insist on having scale invariance. Instead, we will define eigenpairs and singular pairs of tensors with respect to any $l^p$-norm ($p > 1$), as they can be interesting even when $p \neq k$. For example, when $p = 2$, our defining equations for singular values/vectors (6) become the equations obtained in the best rank-1 approximations of tensors studied by Comon [3] and de Lathauwer et al. [4]. For the special case of symmetric tensors, our equations for eigenvalues/vectors for $p = 2$ and $p = k$ define, respectively, the Z-eigenvalues/vectors and the H-eigenvalues/vectors in the soon-to-appear paper [5] of Qi. For simplicity, we will restrict our study to integer-valued $p$ in this paper.

We thank Gunnar Carlsson, Pierre Comon, Lieven de Lathauwer, Vin de Silva, and Gene Golub for helpful discussions. We would also like to thank Liqun Qi for sending us an advance copy of his very relevant preprint.

2. TENSORS AND MULTILINEAR FUNCTIONALS

A $k$-array of real numbers representing an order-$k$ tensor will be denoted by $A = [\![ a_{j_1 \cdots j_k} ]\!] \in \mathbb{R}^{d_1 \times \cdots \times d_k}$. Just as an order-2 tensor (i.e. a matrix) may be multiplied on the left and right by a pair of matrices (of consistent dimensions), an order-$k$ tensor may be 'multiplied on $k$ sides' by $k$ matrices. The covariant multilinear matrix multiplication of $A$ by matrices $M_1 = [m^{(1)}_{j_1 i_1}] \in \mathbb{R}^{d_1 \times s_1}, \ldots, M_k = [m^{(k)}_{j_k i_k}] \in \mathbb{R}^{d_k \times s_k}$ is defined by

$$A(M_1, \ldots, M_k) := \Bigl[\!\Bigl[ \sum\nolimits_{j_1=1}^{d_1} \cdots \sum\nolimits_{j_k=1}^{d_k} a_{j_1 \cdots j_k} m^{(1)}_{j_1 i_1} \cdots m^{(k)}_{j_k i_k} \Bigr]\!\Bigr] \in \mathbb{R}^{s_1 \times \cdots \times s_k}.$$

This operation arises from the way a multilinear functional transforms under compositions with linear maps. In particular, the multilinear functional associated with a tensor $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and its gradient may be succinctly expressed via covariant multilinear multiplication:

$$A(x_1, \ldots, x_k) = \sum\nolimits_{j_1=1}^{d_1} \cdots \sum\nolimits_{j_k=1}^{d_k} a_{j_1 \cdots j_k} x^{(1)}_{j_1} \cdots x^{(k)}_{j_k}, \quad (3)$$

$$\nabla_{x_i} A(x_1, \ldots, x_k) = A(x_1, \ldots, x_{i-1}, I_{d_i}, x_{i+1}, \ldots, x_k).$$

Note that we have slightly abused notation by using $A$ to denote both the tensor and its associated multilinear functional.

An order-$k$ tensor $[\![ a_{j_1 \cdots j_k} ]\!] \in \mathbb{R}^{n \times \cdots \times n}$ is called symmetric if $a_{j_{\sigma(1)} \cdots j_{\sigma(k)}} = a_{j_1 \cdots j_k}$ for any permutation $\sigma \in S_k$. The homogeneous polynomial associated with a symmetric tensor $A = [\![ a_{j_1 \cdots j_k} ]\!]$ and its gradient can again be conveniently expressed as

$$A(x, \ldots, x) = \sum\nolimits_{j_1=1}^{n} \cdots \sum\nolimits_{j_k=1}^{n} a_{j_1 \cdots j_k} x_{j_1} \cdots x_{j_k}, \quad (4)$$

$$\nabla A(x, \ldots, x) = k\,A(I_n, x, \ldots, x).$$

Observe that for a symmetric tensor $A$,

$$A(I_n, x, x, \ldots, x) = A(x, I_n, x, \ldots, x) = \cdots = A(x, x, \ldots, x, I_n). \quad (5)$$

The preceding discussion is entirely algebraic, but we will now introduce norms on the respective spaces. Let $\|\cdot\|_{\alpha_i}$ be a norm on $\mathbb{R}^{d_i}$, $i = 1, \ldots, k$. Then the norm (cf. [6]) of the multilinear functional $A : \mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_k} \to \mathbb{R}$ induced by $\|\cdot\|_{\alpha_1}, \ldots, \|\cdot\|_{\alpha_k}$ is defined as

$$\|A\|_{\alpha_1, \ldots, \alpha_k} := \sup \frac{|A(x_1, \ldots, x_k)|}{\|x_1\|_{\alpha_1} \cdots \|x_k\|_{\alpha_k}},$$

where the supremum is taken over all nonzero $x_i \in \mathbb{R}^{d_i}$, $i = 1, \ldots, k$. We will be interested in the case where the $\|\cdot\|_{\alpha_i}$'s are $l^p$-norms. Recall that for $1 < p < \infty$, the $l^p$-norm is a continuously differentiable function on $\mathbb{R}^n \setminus \{0\}$. For $x = [x_1, \ldots, x_n]^\top \in \mathbb{R}^n$, we will write

$$x^p := [x_1^p, \ldots, x_n^p]^\top$$

(i.e. taking $p$th powers coordinatewise) and

$$\varphi_p(x) := [\operatorname{sgn}(x_1)|x_1|^p, \ldots, \operatorname{sgn}(x_n)|x_n|^p]^\top,$$

where

$$\operatorname{sgn}(x) = \begin{cases} +1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0. \end{cases}$$

Observe that if $p$ is odd, then $\varphi_p(x) = x^p$. The gradient of the $l^p$-norm is given by

$$\nabla \|x\|_p = \frac{\varphi_{p-1}(x)}{\|x\|_p^{p-1}},$$

or simply $\nabla \|x\|_p = x^{p-1} / \|x\|_p^{p-1}$ when $p$ is even.

3. SINGULAR VALUES AND SINGULAR VECTORS

Let $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$. Then $A$ defines a multilinear functional $A : \mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_k} \to \mathbb{R}$ via (3). Let us equip $\mathbb{R}^{d_i}$ with the $l^{p_i}$-norm, $\|\cdot\|_{p_i}$, $i = 1, \ldots, k$. We will define the singular values and singular vectors of $A$ as the critical values and critical points of $A(x_1, \ldots, x_k)/\|x_1\|_{p_1} \cdots \|x_k\|_{p_k}$, suitably normalized. Taking a constrained variational approach, we let $L : \mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_k} \times \mathbb{R} \to \mathbb{R}$ be

$$L(x_1, \ldots, x_k, \sigma) := A(x_1, \ldots, x_k) - \sigma(\|x_1\|_{p_1} \cdots \|x_k\|_{p_k} - 1).$$

$L$ is continuously differentiable when $x_i \neq 0$, $i = 1, \ldots, k$. The vanishing of the gradient,

$$\nabla L = (\nabla_{x_1} L, \ldots, \nabla_{x_k} L, \nabla_\sigma L) = (0, \ldots, 0, 0),$$

gives

$$A(I_{d_1}, x_2, x_3, \ldots, x_k) = \sigma \varphi_{p_1 - 1}(x_1),$$
$$A(x_1, I_{d_2}, x_3, \ldots, x_k) = \sigma \varphi_{p_2 - 1}(x_2),$$
$$\vdots \quad (6)$$
$$A(x_1, x_2, \ldots, x_{k-1}, I_{d_k}) = \sigma \varphi_{p_k - 1}(x_k),$$

at a critical point $(x_1, \ldots, x_k, \sigma)$. As in the derivation of (2), one also gets the unit norm condition

$$\|x_1\|_{p_1} = \cdots = \|x_k\|_{p_k} = 1.$$

The unit vector $x_i$ and $\sigma$ in (6) will be called a mode-$i$ singular vector, $i = 1, \ldots, k$, and a singular value of $A$, respectively. Note that the mode-$i$ singular vectors are simply the order-$k$ equivalent of left- and right-singular vectors for order 2 (a matrix has two 'sides' or modes while an order-$k$ tensor has $k$).

We will use the name $l^{p_1, \ldots, p_k}$-singular values/vectors if we wish to emphasize the dependence of these notions on $\|\cdot\|_{p_i}$, $i = 1, \ldots, k$. If $p_1 = \cdots = p_k = p$, then we will use the shorter name $l^p$-singular values/vectors. Two particular choices of $p$ will be of interest to us: $p = 2$ and $p = k$ — both of which reduce to the matrix case when $k = 2$ (not so for other choices of $p$). The former yields

$$A(x_1, \ldots, x_{i-1}, I_{d_i}, x_{i+1}, \ldots, x_k) = \sigma x_i, \quad i = 1, \ldots, k,$$

while the latter yields a homogeneous system of equations that is invariant under scaling of $(x_1, \ldots, x_k)$. In fact, when $k$ is even, the $l^k$-singular values/vectors are solutions to

$$A(x_1, \ldots, x_{i-1}, I_{d_i}, x_{i+1}, \ldots, x_k) = \sigma x_i^{k-1}, \quad i = 1, \ldots, k.$$

The following results are easy to show. The first proposition follows from the definition of the induced norm and the observation that a maximizer in an open set must be a critical point. The second proposition follows from the definition of the hyperdeterminant [7]; the conditions on the $d_i$ are necessary and sufficient for the existence of $\Delta$.

Proposition 1. The largest $l^{p_1, \ldots, p_k}$-singular value is equal to the norm of the multilinear functional associated with $A$ induced by the norms $\|\cdot\|_{p_1}, \ldots, \|\cdot\|_{p_k}$, i.e.

$$\sigma_{\max}(A) = \|A\|_{p_1, \ldots, p_k}.$$

Proposition 2. Let $d_1, \ldots, d_k$ be such that

$$d_i - 1 \leq \sum\nolimits_{j \neq i} (d_j - 1) \quad \text{for all } i = 1, \ldots, k,$$

and let $\Delta$ denote the hyperdeterminant on $\mathbb{R}^{d_1 \times \cdots \times d_k}$. Then $0$ is an $l^2$-singular value of $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ if and only if $\Delta(A) = 0$.

4. EIGENVALUES AND EIGENVECTORS OF SYMMETRIC TENSORS

Let $A \in \mathbb{R}^{n \times \cdots \times n}$ be an order-$k$ symmetric tensor. Then $A$ defines a degree-$k$ homogeneous polynomial function $A : \mathbb{R}^n \to \mathbb{R}$ via (4). With a choice of $l^p$-norm on $\mathbb{R}^n$, we may consider the multilinear Rayleigh quotient $A(x, \ldots, x)/\|x\|_p^k$. The Lagrangian $L : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$,

$$L(x, \lambda) := A(x, \ldots, x) - \lambda(\|x\|_p^k - 1),$$

is continuously differentiable when $x \neq 0$, and

$$\nabla L = (\nabla_x L, \nabla_\lambda L) = (0, 0)$$

gives

$$A(I_n, x, \ldots, x) = \lambda \varphi_{p-1}(x) \quad (7)$$

at a critical point $(x, \lambda)$ where $\|x\|_p = 1$. The unit vector $x$ and scalar $\lambda$ will be called an $l^p$-eigenvector and $l^p$-eigenvalue of $A$, respectively. Note that the LHS in (7) satisfies the symmetry in (5).

As in the case of singular values/vectors, the instances $p = 2$ and $p = k$ are of particular interest. The $l^2$-eigenpairs are characterized by

$$A(I_n, x, \ldots, x) = \lambda x$$

where $\|x\|_2 = 1$. When the order $k$ is even, the $l^k$-eigenpairs are characterized by

$$A(I_n, x, \ldots, x) = \lambda x^{k-1} \quad (8)$$

and in this case the unit-norm constraint is superfluous, since (8) is a homogeneous system and $x$ may be scaled by any nonzero scalar $\alpha$.

We refer the reader to [5] for some interesting results on $l^2$-eigenvalues and $l^k$-eigenvalues of symmetric tensors — many of which mirror familiar properties of matrix eigenvalues.

5. EIGENVALUES AND EIGENVECTORS OF NONSYMMETRIC TENSORS

We know that one cannot use the variational approach to characterize eigenvalues/vectors of nonsymmetric matrices. So for a nonsymmetric tensor $A \in \mathbb{R}^{n \times \cdots \times n}$, we will instead define eigenvalues/vectors by (7) — an approach that is consistent with the matrix case. As (5) no longer holds, we now have $k$ different forms of (7):

$$A(I_n, x_1, x_1, \ldots, x_1) = \mu_1 \varphi_{p-1}(x_1),$$
$$A(x_2, I_n, x_2, \ldots, x_2) = \mu_2 \varphi_{p-1}(x_2),$$
$$\vdots \quad (9)$$
$$A(x_k, x_k, \ldots, x_k, I_n) = \mu_k \varphi_{p-1}(x_k).$$

We will call the unit vector $x_i$ a mode-$i$ eigenvector of $A$ corresponding to the mode-$i$ eigenvalue $\mu_i$, $i = 1, \ldots, k$. Note that these are nothing more than the order-$k$ equivalent of left and right eigenvectors.

6. APPLICATIONS

Several distinct generalizations of singular values/vectors and eigenvalues/vectors from matrices to higher-order tensors have been proposed in [3, 8, 4, 9, 5]. As one might expect, there is no single generalization that preserves all properties of matrix singular values/vectors or matrix eigenvalues/vectors. In the absence of a canonical generalization, the validity of a multilinear generalization of a bilinear concept is often measured by the extent to which it may be applied to obtain interesting or useful results.

The proposed notions of $l^2$- and $l^k$-singular/eigenvalues arise naturally in the context of several different applications. We have mentioned the relation between the $l^2$-singular values/vectors and the best rank-1 approximation of a tensor under the Frobenius norm obtained in [3, 4]. Another example is the appearance of $l^2$-eigenvalues/vectors of symmetric tensors in the approximate solution of constraint satisfaction problems [10, 11]. A third example is the use of $l^k$-eigenvalues of order-$k$ symmetric tensors ($k$ even) for characterizing the positive definiteness of homogeneous polynomial forms — a problem that is important in automatic control and array signal processing (see [5, 12] and the references cited therein).

Here we will give an application of $l^k$-eigenvalues and eigenvectors of a nonsymmetric tensor of order $k$. We will show that a multilinear generalization of the Perron-Frobenius theorem [13] may be deduced from the notion of $l^k$-eigenvalues/vectors as defined by (9).

Let $A = [\![ a_{j_1 \cdots j_k} ]\!] \in \mathbb{R}^{n \times \cdots \times n}$. We write $A > 0$ if all $a_{j_1 \cdots j_k} > 0$ (likewise for $A \geq 0$). We write $A > B$ if $A - B > 0$ (likewise for $A \geq B$).

An order-$k$ tensor $A$ is reducible if there exists a permutation $\sigma \in S_n$ such that the permuted tensor

$$[\![ b_{i_1 \cdots i_k} ]\!] = [\![ a_{\sigma(j_1) \cdots \sigma(j_k)} ]\!] \in \mathbb{R}^{n \times \cdots \times n}$$

has the property that for some $m \in \{1, \ldots, n-1\}$, $b_{i_1 \cdots i_k} = 0$ for all $i_1 \in \{1, \ldots, n-m\}$ and all $i_2, \ldots, i_k \in \{1, \ldots, m\}$.

If we allow a few analogous matrix terminologies, then $A$ is reducible if there exists a permutation matrix $P$ so that $B = A(P, \ldots, P) \in \mathbb{R}^{n \times \cdots \times n}$ can be partitioned into $2^k$ subblocks and regarded as a $2 \times \cdots \times 2$ block-tensor with 'square diagonal blocks' $B_{00 \cdots 0} \in \mathbb{R}^{m \times \cdots \times m}$ and $B_{11 \cdots 1} \in \mathbb{R}^{(n-m) \times \cdots \times (n-m)}$, and a zero 'corner block' $B_{10 \cdots 0} \in \mathbb{R}^{(n-m) \times m \times \cdots \times m}$, which we may assume without loss of generality to be in the $(1, 0, \ldots, 0)$-'corner'.

We say that $A$ is irreducible if it is not reducible. In particular, if $A > 0$, then it is irreducible.

Theorem 1. Let $A = [\![ a_{j_1 \cdots j_k} ]\!] \in \mathbb{R}^{n \times \cdots \times n}$ be irreducible and $A \geq 0$. Then $A$ has a positive real $l^k$-eigenvalue with an $l^k$-eigenvector $x_*$ that may be chosen to have all entries non-negative. In fact, $x_*$ is unique and has all entries positive.

Proof. Let $S^n_+ := \{x \in \mathbb{R}^n \mid x \geq 0, \|x\|_k = 1\}$. For any $x \in S^n_+$, we define

$$\mu(x) := \inf\{\mu \in \mathbb{R}_+ \mid A(I_n, x, \ldots, x) \leq \mu x^{k-1}\}.$$

Note that for $x \geq 0$, $\varphi_{k-1}(x) = x^{k-1}$. Since $S^n_+$ is compact, there exists some $x_* \in S^n_+$ such that

$$\mu(x_*) = \inf\{\mu(x) \mid x \in S^n_+\} =: \mu_*.$$

Clearly,

$$A(I_n, x_*, \ldots, x_*) \leq \mu_* x_*^{k-1}. \quad (10)$$

We claim that $x_*$ is a (mode-1) $l^k$-eigenvector of $A$, i.e.

$$A(I_n, x_*, \ldots, x_*) = \mu_* x_*^{k-1}.$$

Suppose not. Then at least one of the relations in (10) must hold with strict inequality. However, not all the relations in (10) can hold with strict inequality, since otherwise

$$A(I_n, x_*, \ldots, x_*) < \mu_* x_*^{k-1} \quad (11)$$

would contradict the definition of $\mu_*$ as an infimum. Without loss of generality, we may assume that the first $m$ relations in (10) are the ones that hold with strict inequality and the remaining $n - m$ relations are the ones that hold with equality. We will write $x_* = [x_0, x_1]^\top$ with $x_0 \in \mathbb{R}^m$, $x_1 \in \mathbb{R}^{n-m}$. By assumption, $A$ may be partitioned into blocks so that

$$A_{00 \cdots 0}(I_m, x_0, \ldots, x_0, x_0) + A_{00 \cdots 1}(I_m, x_0, \ldots, x_0, x_1) + \cdots + A_{01 \cdots 1}(I_m, x_1, \ldots, x_1, x_1) < \mu_* x_0^{k-1}, \quad (12)$$

$$A_{10 \cdots 0}(I_{n-m}, x_0, \ldots, x_0, x_0) + A_{10 \cdots 1}(I_{n-m}, x_0, \ldots, x_0, x_1) + \cdots + A_{11 \cdots 1}(I_{n-m}, x_1, \ldots, x_1, x_1) = \mu_* x_1^{k-1}. \quad (13)$$

Note that $x_0 \neq 0$ since the LHS of (12) is non-negative. We will fix $x_1$ and consider the following (vector-valued) functions of $y$:

$$F(y) := A_{00 \cdots 0}(I_m, y, \ldots, y, y) + A_{00 \cdots 1}(I_m, y, \ldots, y, x_1) + \cdots + A_{01 \cdots 1}(I_m, x_1, \ldots, x_1, x_1) - \mu_* y^{k-1},$$

$$G(y) := A_{10 \cdots 0}(I_{n-m}, y, \ldots, y, y) + A_{10 \cdots 1}(I_{n-m}, y, \ldots, y, x_1) + \cdots + A_{11 \cdots 1}(I_{n-m}, x_1, \ldots, x_1, x_1) - \mu_* x_1^{k-1}.$$

Let $f_1, \ldots, f_m$ and $g_1, \ldots, g_{n-m}$ be the component functions of $F$ and $G$ respectively, i.e. the $f_i$'s and $g_j$'s are real-valued functions of $y \in \mathbb{R}^m$ such that $F(y) = [f_1(y), \ldots, f_m(y)]^\top$ and $G(y) = [g_1(y), \ldots, g_{n-m}(y)]^\top$.

By (12), we get $f_i(x_0) < 0$ for $i = 1, \ldots, m$. Since $f_i$ is continuous, there is a neighborhood $B(x_0, \delta_i) \subseteq \mathbb{R}^m$ such that $f_i(y) < 0$ for all $y \in B(x_0, \delta_i)$. Let $\delta = \min\{\delta_1, \ldots, \delta_m\}$. Then $F(y) < 0$ for all $y \in B(x_0, \delta)$.

By (13), we get $g_j(x_0) = 0$ for $j = 1, \ldots, n - m$. Observe that if $g_j$ is not identically $0$, then $g_j(y) = g_j(y_1, \ldots, y_m)$ is a non-constant multivariate polynomial function in the variables $y_1, \ldots, y_m$. Furthermore, all coefficients of this multivariate polynomial are non-negative since $A \geq 0$. It is easy to see that such a function must be 'strictly monotone' in the following sense: if $0 \leq y \leq z$ and $z \neq y$, then $g_j(y) < g_j(z)$. So for $0 \leq y \leq x_0$ and $y \neq x_0$, we get $g_j(y) < g_j(x_0) = 0$. Since $A$ is irreducible, $A_{10 \cdots 0}$ is non-zero and thus some $g_j$ is not identically $0$.

Let $y_0$ be a point on the line joining $0$ to $x_0$ within a distance $\delta$ of $x_0$, with $y_0 \neq x_0$. Then with $y_0$ in place of $x_0$, the $m$ strict inequalities in (12) are retained while at least one equality in (13) will have become a strict inequality. Note that the homogeneity of (12) and (13) allows us to scale $[y_0, x_1]^\top$ to unit $l^k$-norm without affecting the validity of the inequalities and equalities. Thus we have obtained a solution with at least $m + 1$ relations in (10) being strict inequalities. Repeating the same argument inductively, we can eventually replace all the equalities in (13) with strict inequalities, leaving us with (11), a contradiction. [We defer the proof of uniqueness and positivity of $x_*$ to the full paper.]

A proposal to use the multilinear Perron-Frobenius theorem in the ranking of linked objects may be found in [14]. A symmetric version of this result can be used to study hypergraphs [15].

7. REFERENCES

[1] V. Choulakian, "Transposition invariant principal component analysis in l1 for long tailed data," Statist. Probab. Lett., vol. 71, no. 1, pp. 23–31, 2005.

[2] V. Choulakian, "l1-norm projection pursuit principal component analysis," Comput. Statist. Data Anal., 2005, to appear.

[3] P. Comon, "Tensor decompositions: state of the art and applications," in Inst. Math. Appl. Conf. Ser., no. 71, Oxford Univ. Press, Oxford, UK, 2002, pp. 1–24.

[4] L. de Lathauwer, B. de Moor, and J. Vandewalle, "On the best rank-1 and rank-(r1, ..., rN) approximation of higher-order tensors," SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1324–1342, 2000.

[5] L. Qi, "Eigenvalues of a real supersymmetric tensor," J. Symbolic Comput., 2005, to appear.

[6] A. Defant and K. Floret, Tensor Norms and Operator Ideals, no. 176 in North-Holland Mathematics Studies, North-Holland, Amsterdam, 1993.

[7] I.M. Gelfand, M.M. Kapranov, and A.V. Zelevinsky, "Hyperdeterminants," Adv. Math., vol. 96, no. 2, pp. 226–263, 1992.

[8] L. de Lathauwer, B. de Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1253–1278, 2000.

[9] J.R. Ruiz-Tolosa and E. Castillo, From Vectors to Tensors, Universitext, Springer-Verlag, Berlin, 2005.

[10] N. Alon, W.F. de la Vega, R. Kannan, and M. Karpinski, "Random sampling and approximation of MAX-CSPs," J. Comput. System Sci., vol. 67, no. 2, pp. 212–243, 2003.

[11] W.F. de la Vega, R. Kannan, M. Karpinski, and S. Vempala, "Tensor decomposition and approximation algorithms for constraint satisfaction problems," ACM Press, New York, NY, USA, 2005, pp. 747–754.

[12] L. Qi and K.L. Teo, "Multivariate polynomial minimization and its application in signal processing," J. Global Optim., vol. 26, no. 4, pp. 419–433, 2003.

[13] A. Berman and R.J. Plemmons, Nonnegative Matrices in the Mathematical Sciences, no. 9 in Classics in Applied Mathematics, SIAM, Philadelphia, PA, 1994.

[14] L.-H. Lim, "Multilinear PageRank: measuring higher order connectivity in linked objects," The Internet: Today & Tomorrow, July 2005.

[15] P. Drineas and L.-H. Lim, "A multilinear spectral theory of hypergraphs and expander hypergraphs," work in progress, 2005.
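The covariant multilinear matrix multiplication of Section 2 and the multilinear functional (3) are easy to experiment with numerically. The sketch below is our own illustration, not part of the paper: the function names and the use of NumPy are assumptions. Each mode of A is contracted in turn with the corresponding matrix (or vector); for k = 2 the operation reduces to ordinary two-sided matrix multiplication, A(M1, M2) = M1ᵀ A M2.

```python
import numpy as np

def multilinear_multiply(A, *Ms):
    """Covariant multilinear matrix multiplication A(M_1, ..., M_k):
    contracts mode i of A with the rows of M_i, so the result has
    shape (s_1, ..., s_k) when M_i has shape (d_i, s_i)."""
    out = A
    for i, M in enumerate(Ms):
        # Move mode i to the front, contract it with M, move it back.
        out = np.tensordot(M.T, np.moveaxis(out, i, 0), axes=1)
        out = np.moveaxis(out, 0, i)
    return out

def multilinear_functional(A, *xs):
    """The scalar A(x_1, ..., x_k) of (3): contract every mode with a vector."""
    out = A
    for x in reversed(xs):           # contract the last remaining mode each time
        out = np.tensordot(out, x, axes=([out.ndim - 1], [0]))
    return float(out)
```

For an order-3 tensor, `multilinear_multiply(A, M1, M2, M3)` plays the role of the 'multiplication on k sides' used throughout the paper, and `multilinear_functional(A, x1, x2, x3)` evaluates (3).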
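The p = 2 case of the defining equations (6) suggests a simple alternating scheme for computing a singular value: repeatedly replace each x_i by its normalized mode-i gradient. This is in the spirit of the higher-order power methods used for best rank-1 approximation in [3, 4], but the sketch below is our own hedged implementation, not an algorithm given in the paper; it may converge only to a local critical point for k ≥ 3. For k = 2 it is the classical power method and recovers σ_max of a matrix.

```python
import numpy as np

def mode_i_gradient(A, xs, i):
    """A(x_1, ..., x_{i-1}, I_{d_i}, x_{i+1}, ..., x_k): contract every
    mode of A except mode i with the corresponding vector."""
    out = A
    for j in reversed(range(A.ndim)):   # highest mode first keeps axis numbers valid
        if j != i:
            out = np.tensordot(out, xs[j], axes=([j], [0]))
    return out

def top_singular_pair(A, iters=1000, seed=0):
    """Alternating update x_i <- normalized mode-i gradient.  A fixed point
    satisfies A(x_1, ..., I_{d_i}, ..., x_k) = sigma * x_i, the p = 2 case
    of (6), with all x_i of unit l^2-norm."""
    rng = np.random.default_rng(seed)
    xs = [rng.standard_normal(d) for d in A.shape]
    xs = [x / np.linalg.norm(x) for x in xs]
    for _ in range(iters):
        for i in range(A.ndim):
            g = mode_i_gradient(A, xs, i)
            xs[i] = g / np.linalg.norm(g)
    sigma = float(mode_i_gradient(A, xs, 0) @ xs[0])  # = A(x_1, ..., x_k)
    return abs(sigma), xs
```

For an order-3 tensor the same call returns a critical value of the multilinear Rayleigh quotient, which by Proposition 1 is at most the induced norm of A.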
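Theorem 1 is purely an existence result and the paper gives no algorithm. For a strictly positive tensor, however, the Perron l^k-eigenpair can be computed with a fixed-point iteration modeled on the matrix power method: iterate x ← A(I_n, x, ..., x)^{1/(k−1)} and renormalize to unit l^k-norm. The sketch below is an assumption on our part (convergence for positive tensors comes from later work on nonnegative tensors, not from this paper).

```python
import numpy as np

def apply_tensor(A, x):
    """A(I_n, x, ..., x): contract every mode of A except the first with x."""
    out = A
    while out.ndim > 1:
        out = np.tensordot(out, x, axes=([out.ndim - 1], [0]))
    return out

def perron_pair(A, iters=500):
    """Fixed-point sketch for Theorem 1 with A > 0: at convergence,
    A(I_n, x, ..., x) = mu * x^(k-1) with mu > 0, x > 0, ||x||_k = 1."""
    k, n = A.ndim, A.shape[0]
    x = np.full(n, n ** (-1.0 / k))              # positive start, unit l^k-norm
    for _ in range(iters):
        y = apply_tensor(A, x) ** (1.0 / (k - 1))
        x = y / np.sum(y ** k) ** (1.0 / k)      # renormalize: ||x||_k = 1
    mu = float(np.mean(apply_tensor(A, x) / x ** (k - 1)))
    return mu, x
```

For k = 2 this is exactly the power method for the Perron eigenpair of a positive matrix, which motivates the generalization.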