SINGULAR VALUES AND EIGENVALUES OF TENSORS: A VARIATIONAL APPROACH

Lek-Heng Lim

Stanford University, Institute for Computational and Mathematical Engineering, Gates Building 2B, Room 286, Stanford, CA 94305

[This work appeared in: Proceedings of the IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP '05), 1 (2005), pp. 129–132.]

ABSTRACT

We propose a theory of eigenvalues, eigenvectors, singular values, and singular vectors for tensors based on a constrained variational approach, much like the Rayleigh quotient for symmetric matrix eigenvalues. These notions are particularly useful in generalizing certain areas where the spectral theory of matrices has traditionally played an important role. For illustration, we will discuss a multilinear generalization of the Perron-Frobenius theorem.

1. INTRODUCTION

It is well known that the eigenvalues and eigenvectors of a symmetric matrix $A$ are the critical values and critical points of its Rayleigh quotient, $x^\top A x / \|x\|_2^2$, or equivalently, the critical values and critical points of the quadratic form $x^\top A x$ constrained to vectors of unit $l^2$-norm, $\{x \mid \|x\|_2 = 1\}$. If $L : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ is the associated Lagrangian with Lagrange multiplier $\lambda$,

$$L(x, \lambda) = x^\top A x - \lambda(\|x\|_2^2 - 1),$$

then the vanishing of $\nabla L$ at a critical point $(x_c, \lambda_c) \in \mathbb{R}^n \times \mathbb{R}$ yields the familiar defining condition for eigenpairs,

$$A x_c = \lambda_c x_c. \quad (1)$$

Note that this approach does not work if $A$ is nonsymmetric — the critical points of $L$ would in general be different from the solutions of (1).

A little less widely known is an analogous variational approach to the singular values and singular vectors of a matrix $A \in \mathbb{R}^{m \times n}$, with $x^\top A y / \|x\|_2 \|y\|_2$ assuming the role of the Rayleigh quotient. The associated Lagrangian function $L : \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ is now

$$L(x, y, \sigma) = x^\top A y - \sigma(\|x\|_2 \|y\|_2 - 1).$$

$L$ is continuously differentiable for nonzero $x, y$. The first-order condition yields

$$A y_c / \|y_c\|_2 = \sigma_c x_c / \|x_c\|_2, \qquad A^\top x_c / \|x_c\|_2 = \sigma_c y_c / \|y_c\|_2,$$

at a critical point $(x_c, y_c, \sigma_c) \in \mathbb{R}^m \times \mathbb{R}^n \times \mathbb{R}$. Writing $u_c = x_c / \|x_c\|_2$ and $v_c = y_c / \|y_c\|_2$, we get the familiar

$$A v_c = \sigma_c u_c, \qquad A^\top u_c = \sigma_c v_c. \quad (2)$$

Although it is not immediately clear how the usual definitions of eigenvalues and singular values via (1) and (2) may be generalized to tensors of order $k \geq 3$ (a matrix is regarded as an order-2 tensor), the constrained variational approach generalizes in a straightforward manner — one simply replaces the bilinear functional $x^\top A y$ (resp. the quadratic form $x^\top A x$) by the multilinear functional (resp. homogeneous polynomial) associated with a tensor (resp. symmetric tensor) of order $k$. The constrained critical values/points then yield a notion of singular values/vectors (resp. eigenvalues/vectors) for order-$k$ tensors.

An important point of distinction between the order-2 and order-$k$ cases is in the choice of norm for the constraints. At first glance, it may appear that we should retain the $l^2$-norm. However, the criticality conditions so obtained are no longer scale invariant (i.e. the property that $x_c$ in (1) or $(u_c, v_c)$ in (2) may be replaced by $\alpha x_c$ or $(\alpha u_c, \alpha v_c)$ without affecting the validity of the equations). To preserve the scale invariance of eigenvectors and singular vectors for tensors of order $k \geq 3$, the $l^2$-norm must be replaced by the $l^k$-norm (where $k$ is the order of the tensor),

$$\|x\|_k = (|x_1|^k + \cdots + |x_n|^k)^{1/k}.$$

The consideration of eigenvalues and singular values with respect to $l^p$-norms where $p \neq 2$ is prompted by recent works [1, 2] of Choulakian, who studied such notions for matrices.

Nevertheless, we shall not insist on having scale invariance. Instead, we will define eigenpairs and singular pairs of tensors with respect to any $l^p$-norm ($p > 1$), as they can be interesting even when $p \neq k$. For example, when $p = 2$, our defining equations for singular values/vectors (6) become the equations obtained in the best rank-1 approximations of tensors studied by Comon [3] and de Lathauwer et al. [4]. For the special case of symmetric tensors, our equations for eigenvalues/vectors for $p = 2$ and $p = k$ define, respectively, the Z-eigenvalues/vectors and the H-eigenvalues/vectors in the soon-to-appear paper [5] of Qi. For simplicity, we will restrict our study to integer-valued $p$ in this paper.

We thank Gunnar Carlsson, Pierre Comon, Lieven de Lathauwer, Vin de Silva, and Gene Golub for helpful discussions. We would also like to thank Liqun Qi for sending us an advance copy of his very relevant preprint.

2. TENSORS AND MULTILINEAR FUNCTIONALS

A $k$-array of real numbers representing an order-$k$ tensor will be denoted by $A = [\![ a_{j_1 \cdots j_k} ]\!] \in \mathbb{R}^{d_1 \times \cdots \times d_k}$. Just as an order-2 tensor (i.e. a matrix) may be multiplied on the left and right by a pair of matrices (of consistent dimensions), an order-$k$ tensor may be 'multiplied on $k$ sides' by $k$ matrices. The covariant multilinear matrix multiplication of $A$ by matrices $M_1 = [m^{(1)}_{j_1 i_1}] \in \mathbb{R}^{d_1 \times s_1}, \ldots, M_k = [m^{(k)}_{j_k i_k}] \in \mathbb{R}^{d_k \times s_k}$ is defined by

$$A(M_1, \ldots, M_k) := \Bigl[\!\Bigl[ \sum\nolimits_{j_1=1}^{d_1} \cdots \sum\nolimits_{j_k=1}^{d_k} a_{j_1 \cdots j_k} m^{(1)}_{j_1 i_1} \cdots m^{(k)}_{j_k i_k} \Bigr]\!\Bigr] \in \mathbb{R}^{s_1 \times \cdots \times s_k}.$$

This operation arises from the way a multilinear functional transforms under compositions with linear maps. In particular, the multilinear functional associated with a tensor $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ and its gradient may be succinctly expressed via covariant multilinear multiplication:

$$A(x_1, \ldots, x_k) = \sum\nolimits_{j_1=1}^{d_1} \cdots \sum\nolimits_{j_k=1}^{d_k} a_{j_1 \cdots j_k} x^{(1)}_{j_1} \cdots x^{(k)}_{j_k}, \quad (3)$$

$$\nabla_{x_i} A(x_1, \ldots, x_k) = A(x_1, \ldots, x_{i-1}, I_{d_i}, x_{i+1}, \ldots, x_k).$$

Note that we have slightly abused notation by using $A$ to denote both the tensor and its associated multilinear functional.

An order-$k$ tensor $[\![ a_{j_1 \cdots j_k} ]\!] \in \mathbb{R}^{n \times \cdots \times n}$ is called symmetric if $a_{j_{\sigma(1)} \cdots j_{\sigma(k)}} = a_{j_1 \cdots j_k}$ for any permutation $\sigma \in S_k$. The homogeneous polynomial associated with a symmetric tensor $A = [\![ a_{j_1 \cdots j_k} ]\!]$ and its gradient can again be conveniently expressed as

$$A(x, \ldots, x) = \sum\nolimits_{j_1=1}^{n} \cdots \sum\nolimits_{j_k=1}^{n} a_{j_1 \cdots j_k} x_{j_1} \cdots x_{j_k}, \quad (4)$$

$$\nabla A(x, \ldots, x) = k\,A(I_n, x, \ldots, x).$$

Observe that for a symmetric tensor $A$,

$$A(I_n, x, x, \ldots, x) = A(x, I_n, x, \ldots, x) = \cdots = A(x, x, \ldots, x, I_n). \quad (5)$$

The preceding discussion is entirely algebraic, but we will now introduce norms on the respective spaces. Let $\|\cdot\|_{\alpha_i}$ be a norm on $\mathbb{R}^{d_i}$, $i = 1, \ldots, k$. Then the norm (cf. [6]) of the multilinear functional $A : \mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_k} \to \mathbb{R}$ induced by $\|\cdot\|_{\alpha_1}, \ldots, \|\cdot\|_{\alpha_k}$ is defined as

$$\|A\|_{\alpha_1, \ldots, \alpha_k} := \sup \frac{|A(x_1, \ldots, x_k)|}{\|x_1\|_{\alpha_1} \cdots \|x_k\|_{\alpha_k}},$$

where the supremum is taken over all nonzero $x_i \in \mathbb{R}^{d_i}$, $i = 1, \ldots, k$. We will be interested in the case where the $\|\cdot\|_{\alpha_i}$'s are $l^p$-norms. Recall that for $1 < p < \infty$, the $l^p$-norm is a continuously differentiable function on $\mathbb{R}^n \setminus \{0\}$. For $x = [x_1, \ldots, x_n]^\top \in \mathbb{R}^n$, we will write

$$x^p := [x_1^p, \ldots, x_n^p]^\top$$

(i.e. taking $p$th powers coordinatewise) and

$$\varphi_p(x) := [\operatorname{sgn}(x_1)|x_1|^p, \ldots, \operatorname{sgn}(x_n)|x_n|^p]^\top,$$

where

$$\operatorname{sgn}(x) = \begin{cases} +1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0. \end{cases}$$

Observe that if $p$ is odd, then $\varphi_p(x) = x^p$. The gradient of the $l^p$-norm is given by

$$\nabla \|x\|_p = \frac{\varphi_{p-1}(x)}{\|x\|_p^{p-1}},$$

or simply $\nabla \|x\|_p = x^{p-1} / \|x\|_p^{p-1}$ when $p$ is even.

3. SINGULAR VALUES AND SINGULAR VECTORS

Let $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$. Then $A$ defines a multilinear functional $A : \mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_k} \to \mathbb{R}$ via (3). Let us equip $\mathbb{R}^{d_i}$ with the $l^{p_i}$-norm, $\|\cdot\|_{p_i}$, $i = 1, \ldots, k$. We will define the singular values and singular vectors of $A$ as the critical values and critical points of $A(x_1, \ldots, x_k)/\|x_1\|_{p_1} \cdots \|x_k\|_{p_k}$, suitably normalized. Taking a constrained variational approach, we let $L : \mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_k} \times \mathbb{R} \to \mathbb{R}$ be

$$L(x_1, \ldots, x_k, \sigma) := A(x_1, \ldots, x_k) - \sigma(\|x_1\|_{p_1} \cdots \|x_k\|_{p_k} - 1).$$

$L$ is continuously differentiable when $x_i \neq 0$, $i = 1, \ldots, k$. The vanishing of the gradient,

$$\nabla L = (\nabla_{x_1} L, \ldots, \nabla_{x_k} L, \nabla_\sigma L) = (0, \ldots, 0, 0),$$

gives

$$A(I_{d_1}, x_2, x_3, \ldots, x_k) = \sigma \varphi_{p_1 - 1}(x_1),$$
$$A(x_1, I_{d_2}, x_3, \ldots, x_k) = \sigma \varphi_{p_2 - 1}(x_2),$$
$$\vdots \quad (6)$$
$$A(x_1, x_2, \ldots, x_{k-1}, I_{d_k}) = \sigma \varphi_{p_k - 1}(x_k),$$

at a critical point $(x_1, \ldots, x_k, \sigma)$. As in the derivation of (2), one also gets the unit norm condition

$$\|x_1\|_{p_1} = \cdots = \|x_k\|_{p_k} = 1.$$

The unit vector $x_i$ and $\sigma$ in (6) will be called a mode-$i$ singular vector, $i = 1, \ldots, k$, and a singular value of $A$, respectively. Note that the mode-$i$ singular vectors are simply the order-$k$ equivalent of left- and right-singular vectors for order 2 (a matrix has two 'sides' or modes while an order-$k$ tensor has $k$).

We will use the name $l^{p_1, \ldots, p_k}$-singular values/vectors if we wish to emphasize the dependence of these notions on $\|\cdot\|_{p_i}$, $i = 1, \ldots, k$. If $p_1 = \cdots = p_k = p$, then we will use the shorter name $l^p$-singular values/vectors. Two particular choices of $p$ will be of interest to us: $p = 2$ and $p = k$ — both of which reduce to the matrix case when $k = 2$ (not so for other choices of $p$). The former yields

$$A(x_1, \ldots, x_{i-1}, I_{d_i}, x_{i+1}, \ldots, x_k) = \sigma x_i, \quad i = 1, \ldots, k,$$

while the latter yields a homogeneous system of equations that is invariant under scaling of $(x_1, \ldots, x_k)$. In fact, when $k$ is even, the $l^k$-singular values/vectors are solutions to

$$A(x_1, \ldots, x_{i-1}, I_{d_i}, x_{i+1}, \ldots, x_k) = \sigma x_i^{k-1}, \quad i = 1, \ldots, k.$$

The following results are easy to show. The first proposition follows from the definition of the induced norm and the observation that a maximizer in an open set must be a critical point. The second proposition follows from the definition of the hyperdeterminant [7]; the conditions on the $d_i$ are necessary and sufficient for the existence of $\Delta$.

Proposition 1. The largest $l^{p_1, \ldots, p_k}$-singular value is equal to the norm of the multilinear functional associated with $A$ induced by the norms $\|\cdot\|_{p_1}, \ldots, \|\cdot\|_{p_k}$, i.e.

$$\sigma_{\max}(A) = \|A\|_{p_1, \ldots, p_k}.$$

Proposition 2. Let $d_1, \ldots, d_k$ be such that

$$d_i - 1 \leq \sum\nolimits_{j \neq i} (d_j - 1) \quad \text{for all } i = 1, \ldots, k,$$

and let $\Delta$ denote the hyperdeterminant on $\mathbb{R}^{d_1 \times \cdots \times d_k}$. Then $0$ is an $l^2$-singular value of $A \in \mathbb{R}^{d_1 \times \cdots \times d_k}$ if and only if $\Delta(A) = 0$.

4. EIGENVALUES AND EIGENVECTORS OF SYMMETRIC TENSORS

Let $A \in \mathbb{R}^{n \times \cdots \times n}$ be an order-$k$ symmetric tensor. Then $A$ defines a degree-$k$ homogeneous polynomial function $A : \mathbb{R}^n \to \mathbb{R}$ via (4). With a choice of $l^p$-norm on $\mathbb{R}^n$, we may consider the multilinear Rayleigh quotient $A(x, \ldots, x)/\|x\|_p^k$. The Lagrangian $L : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$,

$$L(x, \lambda) := A(x, \ldots, x) - \lambda(\|x\|_p^k - 1),$$

is continuously differentiable when $x \neq 0$, and

$$\nabla L = (\nabla_x L, \nabla_\lambda L) = (0, 0)$$

gives

$$A(I_n, x, \ldots, x) = \lambda \varphi_{p-1}(x) \quad (7)$$

at a critical point $(x, \lambda)$ where $\|x\|_p = 1$. The unit vector $x$ and scalar $\lambda$ will be called an $l^p$-eigenvector and $l^p$-eigenvalue of $A$, respectively. Note that the LHS in (7) satisfies the symmetry in (5).

As in the case of singular values/vectors, the instances $p = 2$ and $p = k$ are of particular interest. The $l^2$-eigenpairs are characterized by

$$A(I_n, x, \ldots, x) = \lambda x$$

where $\|x\|_2 = 1$. When the order $k$ is even, the $l^k$-eigenpairs are characterized by

$$A(I_n, x, \ldots, x) = \lambda x^{k-1} \quad (8)$$

and in this case the unit-norm constraint is superfluous, since (8) is a homogeneous system and $x$ may be scaled by any nonzero scalar $\alpha$.

We refer the reader to [5] for some interesting results on $l^2$-eigenvalues and $l^k$-eigenvalues of symmetric tensors — many of which mirror familiar properties of matrix eigenvalues.

5. EIGENVALUES AND EIGENVECTORS OF NONSYMMETRIC TENSORS

We know that one cannot use the variational approach to characterize eigenvalues/vectors of nonsymmetric matrices. So for a nonsymmetric tensor $A \in \mathbb{R}^{n \times \cdots \times n}$, we will instead define eigenvalues/vectors by (7) — an approach that is consistent with the matrix case. As (5) no longer holds, we now have $k$ different forms of (7):

$$A(I_n, x_1, x_1, \ldots, x_1) = \mu_1 \varphi_{p-1}(x_1),$$
$$A(x_2, I_n, x_2, \ldots, x_2) = \mu_2 \varphi_{p-1}(x_2),$$
$$\vdots \quad (9)$$
$$A(x_k, x_k, \ldots, x_k, I_n) = \mu_k \varphi_{p-1}(x_k).$$

We will call the unit vector $x_i$ a mode-$i$ eigenvector of $A$ corresponding to the mode-$i$ eigenvalue $\mu_i$, $i = 1, \ldots, k$. Note that these are nothing more than the order-$k$ equivalent of left and right eigenvectors.

6. APPLICATIONS

Several distinct generalizations of singular values/vectors and eigenvalues/vectors from matrices to higher-order tensors have been proposed in [3, 8, 4, 9, 5]. As one might expect, there is no single generalization that preserves all properties of matrix singular values/vectors or matrix eigenvalues/vectors. In the absence of a canonical generalization, the validity of a multilinear generalization of a bilinear concept is often measured by the extent to which it may be applied to obtain interesting or useful results.

The proposed notions of $l^2$- and $l^k$-singular/eigenvalues arise naturally in the context of several different applications. We have mentioned the relation between the $l^2$-singular values/vectors and the best rank-1 approximation of a tensor under the Frobenius norm obtained in [3, 4]. Another example is the appearance of $l^2$-eigenvalues/vectors of symmetric tensors in the approximate solution of constraint satisfaction problems [10, 11]. A third example is the use of $l^k$-eigenvalues of order-$k$ symmetric tensors ($k$ even) for characterizing the positive definiteness of homogeneous polynomial forms — a problem that is important in automatic control and array signal processing (see [5, 12] and the references cited therein).

Here we will give an application of $l^k$-eigenvalues and eigenvectors of a nonsymmetric tensor of order $k$. We will show that a multilinear generalization of the Perron-Frobenius theorem [13] may be deduced from the notion of $l^k$-eigenvalues/vectors as defined by (9).

Let $A = [\![ a_{j_1 \cdots j_k} ]\!] \in \mathbb{R}^{n \times \cdots \times n}$. We write $A > 0$ if all $a_{j_1 \cdots j_k} > 0$ (likewise for $A \geq 0$). We write $A > B$ if $A - B > 0$ (likewise for $A \geq B$).

An order-$k$ tensor $A$ is reducible if there exists a permutation $\sigma \in S_n$ such that the permuted tensor

$$[\![ b_{i_1 \cdots i_k} ]\!] = [\![ a_{\sigma(j_1) \cdots \sigma(j_k)} ]\!] \in \mathbb{R}^{n \times \cdots \times n}$$

has the property that for some $m \in \{1, \ldots, n-1\}$, $b_{i_1 \cdots i_k} = 0$ for all $i_1 \in \{1, \ldots, n-m\}$ and all $i_2, \ldots, i_k \in \{1, \ldots, m\}$.

If we allow a few analogous matrix terminologies, then $A$ is reducible if there exists a permutation matrix $P$ so that $B = A(P, \ldots, P) \in \mathbb{R}^{n \times \cdots \times n}$ can be partitioned into $2^k$ subblocks and regarded as a $2 \times \cdots \times 2$ block-tensor with 'square diagonal blocks' $B_{00 \cdots 0} \in \mathbb{R}^{m \times \cdots \times m}$ and $B_{11 \cdots 1} \in \mathbb{R}^{(n-m) \times \cdots \times (n-m)}$, and a zero 'corner block' $B_{10 \cdots 0} \in \mathbb{R}^{(n-m) \times m \times \cdots \times m}$, which we may assume without loss of generality to be in the $(1, 0, \ldots, 0)$-'corner'.

We say that $A$ is irreducible if it is not reducible. In particular, if $A > 0$, then it is irreducible.

Theorem 1. Let $A = [\![ a_{j_1 \cdots j_k} ]\!] \in \mathbb{R}^{n \times \cdots \times n}$ be irreducible and $A \geq 0$. Then $A$ has a positive real $l^k$-eigenvalue with an $l^k$-eigenvector $x_*$ that may be chosen to have all entries non-negative. In fact, $x_*$ is unique and has all entries positive.

Proof. Let $S^n_+ := \{x \in \mathbb{R}^n \mid x \geq 0, \|x\|_k = 1\}$. For any $x \in S^n_+$, we define

$$\mu(x) := \inf\{\mu \in \mathbb{R}_+ \mid A(I_n, x, \ldots, x) \leq \mu x^{k-1}\}.$$

Note that for $x \geq 0$, $\varphi_{k-1}(x) = x^{k-1}$. Since $S^n_+$ is compact, there exists some $x_* \in S^n_+$ such that

$$\mu(x_*) = \inf\{\mu(x) \mid x \in S^n_+\} =: \mu_*.$$

Clearly,

$$A(I_n, x_*, \ldots, x_*) \leq \mu_* x_*^{k-1}. \quad (10)$$

We claim that $x_*$ is a (mode-1) $l^k$-eigenvector of $A$, i.e.

$$A(I_n, x_*, \ldots, x_*) = \mu_* x_*^{k-1}.$$

Suppose not. Then at least one of the relations in (10) must hold with strict inequality. However, not all the relations in (10) can hold with strict inequality, since otherwise

$$A(I_n, x_*, \ldots, x_*) < \mu_* x_*^{k-1} \quad (11)$$

would contradict the definition of $\mu_*$ as an infimum. Without loss of generality, we may assume that the first $m$ relations in (10) are the ones that hold with strict inequality and the remaining $n - m$ relations are the ones that hold with equality. We will write $x_* = [x_0, x_1]^\top$ with $x_0 \in \mathbb{R}^m$, $x_1 \in \mathbb{R}^{n-m}$. By assumption, $A$ may be partitioned into blocks so that

$$A_{00 \cdots 0}(I_m, x_0, \ldots, x_0, x_0) + A_{00 \cdots 1}(I_m, x_0, \ldots, x_0, x_1) + \cdots + A_{01 \cdots 1}(I_m, x_1, \ldots, x_1, x_1) < \mu_* x_0^{k-1}, \quad (12)$$

$$A_{10 \cdots 0}(I_{n-m}, x_0, \ldots, x_0, x_0) + A_{10 \cdots 1}(I_{n-m}, x_0, \ldots, x_0, x_1) + \cdots + A_{11 \cdots 1}(I_{n-m}, x_1, \ldots, x_1, x_1) = \mu_* x_1^{k-1}. \quad (13)$$

Note that $x_0 \neq 0$ since the LHS of (12) is non-negative. We will fix $x_1$ and consider the following (vector-valued) functions of $y$:

$$F(y) := A_{00 \cdots 0}(I_m, y, \ldots, y, y) + A_{00 \cdots 1}(I_m, y, \ldots, y, x_1) + \cdots + A_{01 \cdots 1}(I_m, x_1, \ldots, x_1, x_1) - \mu_* y^{k-1},$$

$$G(y) := A_{10 \cdots 0}(I_{n-m}, y, \ldots, y, y) + A_{10 \cdots 1}(I_{n-m}, y, \ldots, y, x_1) + \cdots + A_{11 \cdots 1}(I_{n-m}, x_1, \ldots, x_1, x_1) - \mu_* x_1^{k-1}.$$

Let $f_1, \ldots, f_m$ and $g_1, \ldots, g_{n-m}$ be the component functions of $F$ and $G$ respectively, i.e. the $f_i$'s and $g_j$'s are real-valued functions of $y \in \mathbb{R}^m$ such that $F(y) = [f_1(y), \ldots, f_m(y)]^\top$ and $G(y) = [g_1(y), \ldots, g_{n-m}(y)]^\top$.

By (12), we get $f_i(x_0) < 0$ for $i = 1, \ldots, m$. Since $f_i$ is continuous, there is a neighborhood $B(x_0, \delta_i) \subseteq \mathbb{R}^m$ such that $f_i(y) < 0$ for all $y \in B(x_0, \delta_i)$. Let $\delta = \min\{\delta_1, \ldots, \delta_m\}$. Then $F(y) < 0$ for all $y \in B(x_0, \delta)$.

By (13), we get $g_j(x_0) = 0$ for $j = 1, \ldots, n - m$. Observe that if $g_j$ is not identically $0$, then $g_j(y) = g_j(y_1, \ldots, y_m)$ is a non-constant multivariate polynomial function in the variables $y_1, \ldots, y_m$. Furthermore, all coefficients of this multivariate polynomial are non-negative since $A \geq 0$. It is easy to see that such a function must be 'strictly monotone' in the following sense: if $0 \leq y \leq z$ and $z \neq y$, then $g_j(y) < g_j(z)$. So for $0 \leq y \leq x_0$ and $y \neq x_0$, we get $g_j(y) < g_j(x_0) = 0$. Since $A$ is irreducible, $A_{10 \cdots 0}$ is non-zero and thus some $g_j$ is not identically $0$.

Let $y_0$ be a point on the line joining $0$ to $x_0$ within a distance $\delta$ of $x_0$, with $y_0 \neq x_0$. Then with $y_0$ in place of $x_0$, the $m$ strict inequalities in (12) are retained while at least one equality in (13) will have become a strict inequality. Note that the homogeneity of (12) and (13) allows us to scale $[y_0, x_1]^\top$ to unit $l^k$-norm without affecting the validity of the inequalities and equalities. Thus we have obtained a solution with at least $m + 1$ relations in (10) being strict inequalities. Repeating the same argument inductively, we can eventually replace all the equalities in (13) with strict inequalities, leaving us with (11), a contradiction. [We defer the proof of uniqueness and positivity of $x_*$ to the full paper.]

A proposal to use the multilinear Perron-Frobenius theorem in the ranking of linked objects may be found in [14]. A symmetric version of this result can be used to study hypergraphs [15].

7. REFERENCES

[1] V. Choulakian, "Transposition invariant principal component analysis in l1 for long tailed data," Statist. Probab. Lett., vol. 71, no. 1, pp. 23–31, 2005.

[2] V. Choulakian, "l1-norm projection pursuit principal component analysis," Comput. Statist. Data Anal., 2005, to appear.

[3] P. Comon, "Tensor decompositions: state of the art and applications," in Inst. Math. Appl. Conf. Ser., no. 71, Oxford Univ. Press, Oxford, UK, 2002, pp. 1–24.

[4] L. de Lathauwer, B. de Moor, and J. Vandewalle, "On the best rank-1 and rank-(r1, ..., rN) approximation of higher-order tensors," SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1324–1342, 2000.

[5] L. Qi, "Eigenvalues of a real supersymmetric tensor," J. Symbolic Comput., 2005, to appear.

[6] A. Defant and K. Floret, Tensor Norms and Operator Ideals, no. 176 in North-Holland Mathematics Studies, North-Holland, Amsterdam, 1993.

[7] I.M. Gelfand, M.M. Kapranov, and A.V. Zelevinsky, "Hyperdeterminants," Adv. Math., vol. 96, no. 2, pp. 226–263, 1992.

[8] L. de Lathauwer, B. de Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1253–1278, 2000.

[9] J.R. Ruiz-Tolosa and E. Castillo, From Vectors to Tensors, Universitext, Springer-Verlag, Berlin, 2005.

[10] N. Alon, W.F. de la Vega, R. Kannan, and M. Karpinski, "Random sampling and approximation of MAX-CSPs," J. Comput. System Sci., vol. 67, no. 2, pp. 212–243, 2003.

[11] W.F. de la Vega, R. Kannan, M. Karpinski, and S. Vempala, "Tensor decomposition and approximation algorithms for constraint satisfaction problems," ACM Press, New York, NY, USA, 2005, pp. 747–754.

[12] L. Qi and K.L. Teo, "Multivariate polynomial minimization and its application in signal processing," J. Global Optim., vol. 26, no. 4, pp. 419–433, 2003.

[13] A. Berman and R.J. Plemmons, Nonnegative Matrices in the Mathematical Sciences, no. 9 in Classics in Applied Mathematics, SIAM, Philadelphia, PA, 1994.

[14] L.-H. Lim, "Multilinear PageRank: measuring higher order connectivity in linked objects," The Internet: Today & Tomorrow, July 2005.

[15] P. Drineas and L.-H. Lim, "A multilinear spectral theory of hypergraphs and expander hypergraphs," work in progress, 2005.
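The covariant multilinear matrix multiplication of Section 2 and the multilinear functional (3) are easy to experiment with numerically. The sketch below is our own illustration, not part of the paper: the function names and the use of NumPy are assumptions. Each mode of A is contracted in turn with the corresponding matrix (or vector); for k = 2 the operation reduces to ordinary two-sided matrix multiplication, A(M1, M2) = M1ᵀ A M2.

```python
import numpy as np

def multilinear_multiply(A, *Ms):
    """Covariant multilinear matrix multiplication A(M_1, ..., M_k):
    contracts mode i of A with the rows of M_i, so the result has
    shape (s_1, ..., s_k) when M_i has shape (d_i, s_i)."""
    out = A
    for i, M in enumerate(Ms):
        # Move mode i to the front, contract it with M, move it back.
        out = np.tensordot(M.T, np.moveaxis(out, i, 0), axes=1)
        out = np.moveaxis(out, 0, i)
    return out

def multilinear_functional(A, *xs):
    """The scalar A(x_1, ..., x_k) of (3): contract every mode with a vector."""
    out = A
    for x in reversed(xs):           # contract the last remaining mode each time
        out = np.tensordot(out, x, axes=([out.ndim - 1], [0]))
    return float(out)
```

For an order-3 tensor, `multilinear_multiply(A, M1, M2, M3)` plays the role of the 'multiplication on k sides' used throughout the paper, and `multilinear_functional(A, x1, x2, x3)` evaluates (3).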
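The p = 2 case of the defining equations (6) suggests a simple alternating scheme for computing a singular value: repeatedly replace each x_i by its normalized mode-i gradient. This is in the spirit of the higher-order power methods used for best rank-1 approximation in [3, 4], but the sketch below is our own hedged implementation, not an algorithm given in the paper; it may converge only to a local critical point for k ≥ 3. For k = 2 it is the classical power method and recovers σ_max of a matrix.

```python
import numpy as np

def mode_i_gradient(A, xs, i):
    """A(x_1, ..., x_{i-1}, I_{d_i}, x_{i+1}, ..., x_k): contract every
    mode of A except mode i with the corresponding vector."""
    out = A
    for j in reversed(range(A.ndim)):   # highest mode first keeps axis numbers valid
        if j != i:
            out = np.tensordot(out, xs[j], axes=([j], [0]))
    return out

def top_singular_pair(A, iters=1000, seed=0):
    """Alternating update x_i <- normalized mode-i gradient.  A fixed point
    satisfies A(x_1, ..., I_{d_i}, ..., x_k) = sigma * x_i, the p = 2 case
    of (6), with all x_i of unit l^2-norm."""
    rng = np.random.default_rng(seed)
    xs = [rng.standard_normal(d) for d in A.shape]
    xs = [x / np.linalg.norm(x) for x in xs]
    for _ in range(iters):
        for i in range(A.ndim):
            g = mode_i_gradient(A, xs, i)
            xs[i] = g / np.linalg.norm(g)
    sigma = float(mode_i_gradient(A, xs, 0) @ xs[0])  # = A(x_1, ..., x_k)
    return abs(sigma), xs
```

For an order-3 tensor the same call returns a critical value of the multilinear Rayleigh quotient, which by Proposition 1 is at most the induced norm of A.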
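Theorem 1 is purely an existence result and the paper gives no algorithm. For a strictly positive tensor, however, the Perron l^k-eigenpair can be computed with a fixed-point iteration modeled on the matrix power method: iterate x ← A(I_n, x, ..., x)^{1/(k−1)} and renormalize to unit l^k-norm. The sketch below is an assumption on our part (convergence for positive tensors comes from later work on nonnegative tensors, not from this paper).

```python
import numpy as np

def apply_tensor(A, x):
    """A(I_n, x, ..., x): contract every mode of A except the first with x."""
    out = A
    while out.ndim > 1:
        out = np.tensordot(out, x, axes=([out.ndim - 1], [0]))
    return out

def perron_pair(A, iters=500):
    """Fixed-point sketch for Theorem 1 with A > 0: at convergence,
    A(I_n, x, ..., x) = mu * x^(k-1) with mu > 0, x > 0, ||x||_k = 1."""
    k, n = A.ndim, A.shape[0]
    x = np.full(n, n ** (-1.0 / k))              # positive start, unit l^k-norm
    for _ in range(iters):
        y = apply_tensor(A, x) ** (1.0 / (k - 1))
        x = y / np.sum(y ** k) ** (1.0 / k)      # renormalize: ||x||_k = 1
    mu = float(np.mean(apply_tensor(A, x) / x ** (k - 1)))
    return mu, x
```

For k = 2 this is exactly the power method for the Perron eigenpair of a positive matrix, which motivates the generalization.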