CANDECOMP/PARAFAC Decomposition of High-order Tensors Through Tensor Reshaping
Anh Huy Phan, Petr Tichavský, and Andrzej Cichocki

A. H. Phan and A. Cichocki are with the Lab for Advanced Brain Signal Processing, Brain Science Institute, RIKEN, Wakoshi, Japan, e-mail: (phan,cia)@brain.riken.jp. A. Cichocki is also with the Systems Research Institute, Warsaw, Poland. P. Tichavský is with the Institute of Information Theory and Automation, Prague, Czech Republic, e-mail: [email protected]. The work of P. Tichavský was supported by the Grant Agency of the Czech Republic, project 102/09/1278.

Abstract: In general, algorithms for order-3 CANDECOMP/PARAFAC (CP) decomposition, also known as canonical polyadic decomposition (CPD), are easy to implement and can be extended to higher-order CPD. Unfortunately, these algorithms become computationally demanding and are often not applicable to higher-order, relatively large-scale tensors. In this paper, by exploiting the uniqueness of CPD and the relation between a tensor in Kruskal form and its unfolded tensor, we propose a fast approach to deal with this problem. Instead of directly factorizing the high-order data tensor, the method decomposes an unfolded tensor of lower order, e.g., an order-3 tensor. On the basis of the estimated order-3 tensor, a structured Kruskal tensor with the same dimensions as the data tensor is generated and then decomposed to find the final solution, using fast algorithms for the structured CPD. In addition, strategies for tensor unfolding are suggested and verified in practice.

Index Terms: Tensor factorization, canonical decomposition, PARAFAC, ALS, structured CPD, tensor unfolding, Cramér-Rao induced bound (CRIB), Cramér-Rao lower bound (CRLB)

arXiv:1211.3796v1 [math.NA] 16 Nov 2012

I. Introduction

CANDECOMP/PARAFAC [1], [2], also known as canonical polyadic decomposition (CPD), is a common tensor factorization which has found applications in chemometrics [3]–[5], telecommunication [6], [7], analysis of fMRI data [8], time-varying EEG spectra [9], [10], data mining [11], [12], separated representations for generic functions in quantum mechanics or kinetic-theory descriptions of materials [13], classification and clustering [14], and compression [15]–[17]. Although the original decomposition and its applications were developed for three-way data, the model was later widely extended to higher-order tensors. For example, P. G. Constantine et al. [18] modeled pressure measurements along a combustion chamber as order-6 tensors whose modes correspond to the flight conditions (Mach number, altitude, and angle of attack), the wall temperatures in the combustor, and the turbulence mode. Hackbusch, Khoromskij, and Tyrtyshnikov [19] and Hackbusch and Khoromskij [20] investigated CP approximation of operators and functions in high dimensions. Oseledets and Tyrtyshnikov [21] approximated the Laplace operator and the general second-order operator that appears in the Black-Scholes equation for multi-asset modeling, tackling dimensions up to N = 200. In neuroscience, M. Mørup et al. [9] analyzed order-4 data constructed from EEG signals in the time-frequency domain. Order-5 tensors of the form dictionaries × timeframes × frequency bins × channels × trials-subjects [22], built from EEG signals, were shown to give high performance in BCI based on EEG motor imagery.
In object recognition (digits, faces, natural images), CPD was used to extract features from order-5 Gabor tensors with modes height × width × orientation × scale × images [22].

In general, many CP algorithms for order-3 tensors can be straightforwardly extended to decompose higher-order tensors. There are numerous algorithms for CPD, including the alternating least squares (ALS) algorithm [1], [2] with line-search extrapolation methods [1], [5], [23]–[25], rotation [26], and compression [27]; all-at-once algorithms such as the OPT algorithm [28], the conjugate gradient algorithm for nonnegative CP, the PMF3 and damped Gauss-Newton (dGN) algorithms [5], [29], and fast dGN [30]–[32]; and algorithms based on joint diagonalization [33]–[35]. In practice, however, these algorithms become more complicated and computationally demanding with growing order, and are often not applicable to relatively large-scale tensors. For example, the complexity of the gradients of the cost function with respect to the factor matrices grows linearly with the number of dimensions $N$: for a tensor of size $I_1 \times I_2 \times \cdots \times I_N$, computing them has a cost of order $O\!\left(NR \prod_{n=1}^{N} I_n\right)$. Moreover, the tensor unfoldings $\mathbf{Y}_{(n)}$, $n = 2, 3, \ldots, N-1$, become more time-consuming, because they access non-contiguous blocks of data entries and shuffle their order in memory. In addition, line-search extrapolation methods [1], [4], [5], [23], [24], [36] become more complicated and demand a high computational cost to build and solve $(2N-1)$th-order polynomials. The rotation method [26] needs to estimate $N$ rotation matrices of size $R \times R$, with an overall complexity per iteration of order $O(N^3 R^6)$.
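For concreteness, a minimal NumPy sketch of a mode-$n$ unfolding follows; it is an editorial illustration, not part of the original text. The function name `unfold` and the column ordering of the merged modes are one common convention (orderings differ across references), and the copy forced by `moveaxis` plus `reshape` is precisely the data shuffling referred to above.

```python
import numpy as np

def unfold(Y, n):
    """Mode-n unfolding Y_(n) (n is zero-based here): mode n indexes
    the rows; all remaining modes are merged into the columns,
    ordered by NumPy's C (row-major) convention."""
    # For n > 0 the selected mode is non-contiguous in memory, so the
    # reshape must copy and reorder entries -- the shuffling cost
    # discussed in the text.
    return np.moveaxis(Y, n, 0).reshape(Y.shape[n], -1)

# Example: mode-2 unfolding of a random order-4 tensor.
Y = np.random.randn(3, 4, 5, 6)
print(unfold(Y, 2).shape)  # (5, 72)
```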
Recently, a Cramér-Rao induced bound (CRIB) on the attainable squared angular error of factors in the CP decomposition has been proposed in [37]. The bound is valid under the assumption that the decomposed tensor is corrupted by additive Gaussian noise added independently to each tensor element. In this paper we use the results of [37] to design a tensor unfolding strategy that ensures as little deterioration of accuracy as possible. This strategy is then verified in the simulations.

By exploiting the uniqueness of CPD under mild conditions and the relation between a tensor in the Kruskal form [38] and its unfolded tensor, we propose a fast approach for high-order, relatively large-scale CPD. Instead of directly factorizing the high-order data tensor, the approach decomposes an unfolded tensor of lower order, e.g., an order-3 tensor. A structured Kruskal tensor with the same dimensions as the data tensor is then generated and decomposed to find the desired factor matrices. We also propose a fast ALS algorithm to factorize the structured Kruskal tensor.

The paper is organized as follows. Notation and the CANDECOMP/PARAFAC decomposition are briefly reviewed in Section II. The simplified version of the proposed algorithm is presented in Section III. Loss of accuracy is investigated in Section III-A, and an efficient strategy for tensor unfolding is summarized in Section III-B. For difficult decomposition scenarios, we propose a new algorithm in Section IV. Simulations on random tensors and a real-world dataset are reported in Section V. Section VI concludes the paper.

II. CANDECOMP/PARAFAC (CP) Decomposition

Throughout the paper, we shall denote tensors by bold calligraphic letters, e.g., $\boldsymbol{\mathcal{A}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$; matrices by bold capital letters, e.g., $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_R] \in \mathbb{R}^{I \times R}$; and vectors by bold italic letters, e.g., $\boldsymbol{a}_j$ or $\boldsymbol{I} = [I_1, I_2, \ldots, I_N]$. A vector of integers is denoted by the colon notation $\boldsymbol{k} = i\!:\!j = [i, i+1, \ldots, j-1, j]$; for example, $1\!:\!n = [1, 2, \ldots, n]$. The Kronecker product, the Khatri-Rao (column-wise Kronecker) product, and the (element-wise) Hadamard product are denoted by $\otimes$, $\odot$, $\circledast$, respectively [38], [39].

Definition 2.1 (Kruskal form (tensor) [38], [40]): A tensor $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is in Kruskal form if

$$\boldsymbol{\mathcal{X}} = \sum_{r=1}^{R} \lambda_r \, \boldsymbol{a}_r^{(1)} \circ \boldsymbol{a}_r^{(2)} \circ \cdots \circ \boldsymbol{a}_r^{(N)} \quad (1)$$
$$\;\triangleq \llbracket \boldsymbol{\lambda};\, \mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \ldots, \mathbf{A}^{(N)} \rrbracket, \qquad \boldsymbol{\lambda} = [\lambda_1, \lambda_2, \ldots, \lambda_R], \quad (2)$$

where the symbol "$\circ$" denotes the outer product, $\mathbf{A}^{(n)} = [\boldsymbol{a}_1^{(n)}, \boldsymbol{a}_2^{(n)}, \ldots, \boldsymbol{a}_R^{(n)}] \in \mathbb{R}^{I_n \times R}$ ($n = 1, 2, \ldots, N$) are the factor matrices, $\boldsymbol{a}_r^{(n)T} \boldsymbol{a}_r^{(n)} = 1$ for all $r$ and $n$, and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_R > 0$.

Definition 2.2 (CANDECOMP/PARAFAC (CP) [1], [2], [40], [41]): Approximation of an order-$N$ data tensor $\boldsymbol{\mathcal{Y}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ by a rank-$R$ tensor in the Kruskal form means

$$\boldsymbol{\mathcal{Y}} = \widehat{\boldsymbol{\mathcal{Y}}} + \boldsymbol{\mathcal{E}}, \quad (3)$$

where $\widehat{\boldsymbol{\mathcal{Y}}} = \llbracket \boldsymbol{\lambda};\, \mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \ldots, \mathbf{A}^{(N)} \rrbracket$, such that $\|\boldsymbol{\mathcal{Y}} - \widehat{\boldsymbol{\mathcal{Y}}}\|_F^2$ is minimized.

There are numerous algorithms for CPD, including alternating least squares (ALS), all-at-once optimization algorithms, and algorithms based on joint diagonalization. In general, most CP algorithms that factorize an order-$N$ tensor face a high computational cost due to the computation of gradients and (approximate) Hessians, line search, and rotation. Table I summarizes the complexities of these major computations in popular CPD algorithms; the complexity per iteration of a CP algorithm can be roughly estimated from it. For example, the ALS algorithm with line search has a complexity of order $O(NRJ + 2^N RJ + NR^3) = O(2^N RJ + NR^3)$.

TABLE I
Complexities per iteration of major computations in CPD algorithms. $J = \prod_{n=1}^{N} I_n$, $T = \sum_{n=1}^{N} I_n$.

Computing process                                       Complexity
-----------------------------------------------------   --------------------
Gradient [1], [2]                                       $O(NRJ)$
Fast gradient [42]                                      $O(RJ)$
(Approximate) Hessian and its inverse [5], [29]         $O(R^3 T^3)$
Fast (approximate) Hessian and its inverse [31], [37]   $O(R^2 T + N^3 R^6)$
Exact line search [1], [4], [5]                         $O(2^N RJ)$
Rotation [26]                                           $O(N^3 R^6)$

III. CPD of Unfolded Tensors

In order to deal with the existing problems of high-order, relatively large-scale CPD, the following process is proposed:

1) Reduce the number of dimensions of the tensor $\boldsymbol{\mathcal{Y}}$ to a lower order (e.g., order 3) through a tensor unfolding $\boldsymbol{\mathcal{Y}}_{\sim l}$, defined later in this section.
2) Approximate the unfolded tensor $\boldsymbol{\mathcal{Y}}_{\sim l}$ by an order-3 tensor $\widehat{\boldsymbol{\mathcal{Y}}}_{\sim l}$ in the Kruskal form. Dimensions of $\boldsymbol{\mathcal{Y}}_{\sim l}$ that are considerably larger than the rank $R$ can first be reduced to $R$ by Tucker compression [43]–[46] prior to CPD, although this compression is not lossless; in that case, we only need to decompose an $R \times R \times R$ tensor.
3) Estimate the desired components of the original tensor $\boldsymbol{\mathcal{Y}}$ on the basis of the Kruskal tensor $\widehat{\boldsymbol{\mathcal{Y}}}_{\sim l}$.

The method is based on the observation that unfolding a Kruskal tensor again yields a Kruskal tensor; a small numerical check of this observation is sketched below.
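The following short NumPy sketch (an editorial illustration, not code from the paper) verifies the observation for an order-5 Kruskal tensor: under a row-major (C-order) reshape that merges consecutive modes, the factors of the merged modes combine into Khatri-Rao products. The helper names `khatri_rao` and `kruskal_to_full`, the mode grouping {1}, {2, 3}, {4, 5}, and the C-ordering are assumptions made for illustration; the paper's own unfolding rule and mode-grouping strategy are defined later in Section III.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column r is kron(A[:, r], B[:, r])."""
    R = A.shape[1]
    return np.stack([np.kron(A[:, r], B[:, r]) for r in range(R)], axis=1)

def kruskal_to_full(lam, factors):
    """Build the full tensor sum_r lam_r a_r^(1) o ... o a_r^(N), as in eq. (1)."""
    shape = tuple(A.shape[0] for A in factors)
    Y = np.zeros(shape)
    for r in range(len(lam)):
        outer = lam[r]
        for A in factors:
            outer = np.multiply.outer(outer, A[:, r])
        Y += outer
    return Y

rng = np.random.default_rng(0)
R, dims = 3, (4, 5, 6, 3, 2)
lam = rng.random(R) + 0.5
A = [rng.standard_normal((I, R)) for I in dims]

Y = kruskal_to_full(lam, A)  # order-5 Kruskal tensor

# Group modes {1}, {2,3}, {4,5}: a C-order reshape merges neighbouring
# modes, and each merged factor is the Khatri-Rao product of its group.
Y3 = Y.reshape(dims[0], dims[1] * dims[2], dims[3] * dims[4])
Y3_kruskal = kruskal_to_full(
    lam, [A[0], khatri_rao(A[1], A[2]), khatri_rao(A[3], A[4])])

print(np.allclose(Y3, Y3_kruskal))  # True
```

Recovering the original factor matrices then amounts to undoing these Khatri-Rao products, which is the role of the structured Kruskal tensor generated in step 3.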