Tensor Decomposition with Smoothness
Masaaki Imaizumi 1   Kohei Hayashi 2 3

1 Institute of Statistical Mathematics  2 National Institute of Advanced Industrial Science and Technology  3 RIKEN. Correspondence to: Masaaki Imaizumi
Abstract

Real data tensors are typically high dimensional; however, their intrinsic information is preserved in low-dimensional space, which motivates the use of tensor decompositions such as Tucker decomposition. Frequently, real data tensors are smooth in addition to being low dimensional, meaning that adjacent elements are similar or change continuously; such tensors typically arise as spatial or temporal data. We propose smoothed Tucker decomposition (STD) to incorporate the smoothness property. STD leverages smoothness using the sum of a few basis functions, which reduces the number of parameters. The objective function is formulated as a convex problem, and an algorithm based on the alternating direction method of multipliers is derived to solve it. We theoretically show that, under the smoothness assumption, STD achieves a better error bound. The theoretical result and the performance of STD are numerically verified.

1. Introduction

A tensor (i.e., a multi-way array) is a data structure that generalizes a matrix, and it can represent higher-order relationships. Tensors appear in various applications such as image analysis (Jia et al., 2014), data mining (Kolda & Sun, 2008), and medical analysis (Zhou et al., 2013). For instance, functional magnetic resonance imaging (fMRI) records brain activities in each time period as voxels, which are represented as 4-way tensors (X-axis $\times$ Y-axis $\times$ Z-axis $\times$ time). Frequently, data tensors in the real world contain several missing elements and/or are corrupted by noise, which leads to the tensor completion problem of predicting missing elements and the tensor recovery problem of removing noise.

To solve these problems, the low-rank assumption, i.e., that the given tensor is generated from a small number of latent factors, is widely used. If the number of observed elements is sufficiently larger than the number of latent factors (i.e., the rank) and the noise level, we can estimate the latent factors and reconstruct the entire structure. The methods for estimating latent factors are collectively referred to as tensor decompositions. There are several formulations of tensor decompositions, such as Tucker decomposition (Tucker, 1966) and the CANDECOMP/PARAFAC (CP) decomposition (Harshman, 1970). While these methods were originally formulated as nonconvex problems, several authors have studied their convex relaxations in recent years (Liu et al., 2009; Tomioka et al., 2010; Signoretto et al., 2011; Gandy et al., 2011).

Another important, yet less explored, assumption is the smoothness property. Consider fMRI data as a tensor X. As fMRI data are spatiotemporal, each element of X is expected to be similar to its adjacent elements in every mode, i.e., $x_{i,j,k,t}$ should be close to $x_{i\pm 1,j,k,t}$, $x_{i,j\pm 1,k,t}$, $x_{i,j,k\pm 1,t}$, and $x_{i,j,k,t\pm 1}$. In statistics, this kind of smoothness property has been studied through functional data analysis (Ramsay, 2006; Hsing & Eubank, 2015). Studies show that the smoothness assumption increases sample efficiency, i.e., estimation is more accurate with a small sample size. Another advantage is that the smoothness assumption makes interpolation possible, i.e., we can impute an unobserved value using its adjacent observed values. This interpolation ability is particularly useful for solving a specific tensor completion problem referred to as the tensor interpolation problem, also known as the "cold-start" problem (Gantner et al., 2010). Suppose a case in which the fMRI tensor X is completely missing at time $t = t_0$.
In this case, standard tensor decompositions cannot predict the missing elements because there is no information with which to estimate the latent factor at $t = t_0$. However, using the smoothness property, we can estimate the missing elements from the elements at $t = t_0 - 1$ and $t = t_0 + 1$.

A fundamental challenge of tensor completion and recovery methods is to analyze their performance. Tomioka et al. (2011) derived statistical error bounds for convex low-rank tensor decompositions. On the contrary, the performance of tensor decompositions incorporating smoothness (Yokota et al., 2015b;a; Amini et al., 2013) has never been addressed. The most important barrier is that all such methods are formulated as nonconvex problems, which hinders the use of the tools developed for the convex tensor decompositions (Liu et al., 2009; Signoretto et al., 2010; Tomioka et al., 2010).
Contributions  In this paper, we propose a simple tensor decomposition model incorporating the smoothness property, which we refer to as Smoothed Tucker Decomposition (STD). Following the notions of functional data analysis, STD approximates an observed tensor by a small number of basis functions, such as Fourier series, and decomposes them through Tucker decomposition (Figure 1). STD is formulated as a convex optimization problem that regularizes the ranks and the degree of smoothness. To solve this problem, we derive an algorithm based on the alternating direction method of multipliers (ADMM), which always finds the global optimum.

Figure 1. Comparison of Tucker decomposition and STD. In STD, the mode-wise smoothness of the observed tensor is preserved via basis functions.

Based on the convex formulation, we provide a few theoretical guarantees for STD; namely, we derive error bounds for the tensor recovery and interpolation problems. We show that the error bounds for smooth tensors are improved and better than those of other methods. In addition, to the best of our knowledge, this is the first analysis that establishes an error bound for tensor interpolation. These results are empirically confirmed through experiments using synthetic and real data.

To summarize, STD has the following advantages.

• Sample efficiency: STD achieves the same error with a smaller sample size.
• Interpolation ability: STD can solve the tensor interpolation problem.
• Convex formulation: STD ensures that a global solution is obtained.

Related Works  A few authors have investigated the smoothness property for tensor decompositions. Amini et al. (2013) proposed a kernel method, and Yokota et al. (2015a;b) developed a smooth decomposition method for matrices and tensors using basis functions. These studies demonstrated that the smoothness assumption significantly improves the performance of tensor decompositions in actual applications such as noisy image reconstruction (Yokota et al., 2015b). However, these performance gains were confirmed only in an empirical manner.

Several authors have addressed the tensor interpolation problem by extending tensor decompositions; however, instead of smoothness, these methods utilize additional information such as network structures (Hu et al., 2015) or side information (Gantner et al., 2010; Narita et al., 2011). Moreover, the performance on the tensor interpolation problem has never been analyzed theoretically.

2. Preliminaries

Given $K \in \mathbb{N}$ and natural numbers $I_1, \ldots, I_K \in \mathbb{N}$, let $\mathcal{X} \subset \mathbb{R}^{I_1 \times \cdots \times I_K}$ be the space of K-way tensors and $X \in \mathcal{X}$ be a K-way tensor. For practical use, we define $I_{\setminus k} := \prod_{k' \neq k} I_{k'}$. Each way of a tensor is referred to as a mode; $I_k$ is the dimensionality of the k-th mode for $k = 1, \ldots, K$. For a vector $Y \in \mathbb{R}^d$, $[Y]_j$ denotes its j-th element. Similarly, $[X]_{j_1 j_2 \ldots j_K}$ denotes the $(j_1, j_2, \ldots, j_K)$-th element of X. The inner product on $\mathcal{X}$ is defined as $\langle X, X' \rangle = \sum_{j_1, \ldots, j_K = 1}^{I_1, \ldots, I_K} [X]_{j_1, \ldots, j_K} [X']_{j_1, \ldots, j_K}$ for $X, X' \in \mathcal{X}$. This induces the Frobenius norm $|||X|||_F = \sqrt{\langle X, X \rangle}$. For a vector $Z \in \mathbb{R}^d$, let $\|Z\| = \sqrt{Z^T Z}$ denote the norm. In addition, we introduce the $L_2$ norm for functions as $\|f\|_2^2 = \int_I f(t)^2 \, dt$ for a function $f : I \to \mathbb{R}$ with some domain $I \subset \mathbb{R}$. $\mathcal{C}^{\alpha}(I)$ denotes the set of $\alpha$-times differentiable functions on $I$.

2.1. Tucker Decomposition

With a set of finite positive integers $(R_1, \ldots, R_K)$, the Tucker decomposition of X is defined as

$$X = \sum_{r_1, \ldots, r_K = 1}^{R_1, \ldots, R_K} g_{r_1 \ldots r_K} \, u_{r_1}^{(1)} \otimes u_{r_2}^{(2)} \otimes \cdots \otimes u_{r_K}^{(K)}, \qquad (1)$$

where $g_{r_1 \ldots r_K} \in \mathbb{R}$ is a coefficient for each $(r_1, \ldots, r_K)$, $\otimes$ denotes the tensor product, and $u_{r_k}^{(k)} \in \mathbb{R}^{I_k}$ is a vector for each $r_k$ $(k = 1, \ldots, K)$; within each mode, these vectors are orthogonal to each other for $r_k = 1, \ldots, R_k$. Here, we refer to $(R_1, \ldots, R_K)$ as the Tucker rank, and X is an $(R_1, \ldots, R_K)$-rank tensor. In addition, we let the tensor $G \in \mathbb{R}^{R_1 \times \cdots \times R_K}$ with $[G]_{r_1, \ldots, r_K} = g_{r_1 \ldots r_K}$ be a core tensor, and let the matrix $U_{(k)} = (u_1^{(k)} \cdots u_{R_k}^{(k)}) \in \mathbb{R}^{I_k \times R_k}$ collect the vectors for each $k = 1, \ldots, K$. Using this notation, the Tucker decomposition (1) can be written as

$$X = G \times_1 U_{(1)} \times_2 U_{(2)} \cdots \times_K U_{(K)}, \qquad (2)$$

where $\times_k$ denotes the k-mode matrix product (see Kolda & Bader (2009) for more details).
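For concreteness, the k-mode product and the reconstruction in (2) can be sketched in a few lines of numpy. This is only an illustrative sketch; the function names are ours rather than those of any standard tensor library.

```python
import numpy as np

def mode_product(X, U, k):
    """k-mode product X x_k U: contracts mode k of X with the rows of U.

    X has shape (I_1, ..., I_K); U has shape (J, I_k); the result
    replaces mode k of X by a mode of size J.
    """
    Xk = np.moveaxis(X, k, 0)              # bring mode k to the front
    Yk = np.tensordot(U, Xk, axes=(1, 0))  # contract over mode k
    return np.moveaxis(Yk, 0, k)           # put the new mode back

def tucker_reconstruct(G, Us):
    """Rebuild X = G x_1 U_(1) x_2 ... x_K U_(K) from core and factors, as in (2)."""
    X = G
    for k, U in enumerate(Us):
        X = mode_product(X, U, k)
    return X

# Toy example: a (2,3,4)-rank core expanded to a 5x6x7 tensor with
# orthonormal factor matrices (via reduced QR).
rng = np.random.default_rng(0)
G = rng.normal(size=(2, 3, 4))
Us = [np.linalg.qr(rng.normal(size=(I, R)))[0]
      for I, R in zip((5, 6, 7), (2, 3, 4))]
X = tucker_reconstruct(G, Us)
print(X.shape)  # (5, 6, 7)
```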
2.2. Application Problems for Tensors

Let $S \subset \{(j_1, j_2, \ldots, j_K)\}_{j_1, j_2, \ldots, j_K = 1}^{I_1, I_2, \ldots, I_K}$ be an index set and $n := |S|$. Let $j(i)$ be the i-th element of S for $i = 1, \ldots, n$. Then, we consider the following observation model:

$$y_i = [X^*]_{j(i)} + \epsilon_i, \qquad (3)$$

where $X^* \in \mathcal{X}$ is an unobserved true tensor, $\{y_i\}_{i=1}^n$ is the set of observed values, and $\epsilon_i$ is noise with mean zero and variance $\sigma^2$. We define an observation vector $Y := (y_1 \cdots y_n)^T \in \mathbb{R}^n$ and a noise vector $\mathcal{E} := (\epsilon_1 \cdots \epsilon_n)^T \in \mathbb{R}^n$. Additionally, we define a rearranging operator $\mathfrak{X} : \mathcal{X} \to \mathbb{R}^n$ via $[\mathfrak{X}(X)]_i = [X]_{j(i)}$. Using this notation, observation model (3) is written as

$$Y = \mathfrak{X}(X^*) + \mathcal{E}. \qquad (4)$$

When all the elements are observed, i.e., $n = \prod_k I_k$, the problem of estimating $X^*$ is referred to as the tensor recovery problem. When a few elements of X are missing, i.e., $n < \prod_k I_k$, the problem is referred to as the tensor completion problem. Specifically, for any mode k, if there exists an index $j_0 \in [I_k]$ that S does not contain, we refer to the problem of estimating $\{[X^*]_{j_1 \cdots j_K} \mid j_k = j_0\}$ as the tensor interpolation problem.
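The observation model (3)–(4) is easy to simulate. The following sketch (illustrative only; the names are ours) draws noisy observations of a tensor over an index set S, using the cold-start pattern from the introduction in which one entire slice is unobserved.

```python
import numpy as np

def make_observation(X_true, S, noise_sd, rng):
    """Sample y_i = [X*]_{j(i)} + eps_i for an index set S (models (3)/(4)).

    S is a list of K-tuples; the returned y plays the role of
    Y = X(X*) + E, where the rearranging operator simply reads the
    tensor entries at the observed indices.
    """
    idx = tuple(np.array(S).T)  # one index array per mode
    return X_true[idx] + noise_sd * rng.normal(size=len(S))

rng = np.random.default_rng(1)
X_true = rng.normal(size=(5, 6, 7))
# Observe every entry except those with third index t = 0 -- the
# "tensor interpolation" (cold-start) pattern described above.
S = [(i, j, t) for i in range(5) for j in range(6) for t in range(1, 7)]
y = make_observation(X_true, S, noise_sd=0.1, rng=rng)
print(len(S), y.shape)  # 180 (180,)
```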
[X]j1...jK = fX (gj1 ,...,gjK ), (6) Using observation model (4), we provide an estimator for the unknown true tensor X⇤. The estimator of Xˆ is ob- for jk =1, 2,...with each k =1,...,K. tained by solving the following optimization problem: As the smoothness of the function, we assume that fX is differentiable with respect to all K arguments, which al- 1 min Y X(X) 2 +⌦(X) , (5) lows for the expansion of the basis function to a few use- X ⇥ 2nk k 2 ful basis functions (for example, Tsybakov (2008)) and the decomposition of multivariate functions by the basis (see where ⇥ is a convex subset of , and ⌦:⇥ R+ ⇢X X ! (k) Hackbusch (2012) for detail). Let m :[0, 1] R m is a regularization term. For the regularization ⌦(X), the { ! } overlapped Schatten 1-norm is frequently used (Liu et al., be a set of orthonormal basis functions, such as Fourier se-
ries or wavelet series, and wm1,...,mK R m1,...,mK be 2009; Tomioka et al., 2010; Signoretto et al., 2011; Gandy { 2 } et al., 2011); is is defined as a set of coefficients. Because of the differentiability, fX is written as the weighted sum of the basis functions as R 1 K 1 K k X := X := (X ), 1 1 (1) (K) s (k) s rk (k) f = w . ||| ||| K k k K X m1...mK m1 mK (7) k=1 k=1 r =1 ··· ··· k m =1 m =1 X X X X1 XK Ik k =k Ik Combining (6) and (7) yields a formulation of the elements where X(k) R ⇥ 06 denotes the unfolding matrix 2 obtained by concatenatingQ the mode-k fibers of X as col- of the smooth tensor as umn vectors and rk (X(k)) denotes the rk-th largest eigen- [X]j1,...,jK (8) value of X(k). This penalty term regularizes the Tucker 1 1 rank of X (Negahban & Wainwright, 2011; Tomioka et al., = w (1) (g ) (K)(g ). ··· m1...mK m1 j1 ··· mK jK 2011). m =1 m =1 X1 XK Tensor Decomposition with Smoothness
To solve problem (5) with the Schatten regularization, ADMM is frequently employed (Boyd et al., 2011; Tomioka et al., 2010). ADMM generates a sequence of variables and Lagrangian multipliers by iteratively minimizing the augmented Lagrangian function. It is known that ADMM can easily solve an optimization problem with a non-differentiable regularization term such as $|||\cdot|||_s$.

3. STD: Smoothed Tucker Decomposition

3.1. Smoothness on Tensors

Before explaining the proposed approach, we introduce the notion of smoothness on tensors. We start with the idea that a data tensor is obtained as a result of the discretization of a multivariate function. For example, consider an observation model of the wind power on a land surface. Suppose that the land surface is described by a plane $[0,1]^2$ (i.e., longitude and latitude) and the observation model is given by a function $f : [0,1]^2 \to \mathbb{R}$. Assume that we have infinite memory space so that we can record the wind power $y = f(a, b)$ for any point $(a, b) \in [0,1]^2$. In such an unrealistic case, it is possible to handle the entire information about f. However, as only finite memory space is available, we resort to retaining finite observations $\{f(a_i, b_i)\}_{i=1}^n$. If the points $(a_i, b_i)$ are arranged as a grid, the observations can be regarded as a matrix.

This idea is generalized to tensors as follows. Consider a K-variate function $f_X : [0,1]^K \to \mathbb{R}$ and a set of points $\{(g_{j_1}, \ldots, g_{j_K})\}_{j_1 \ldots j_K = 1}^{I_1 \ldots I_K} \subset [0,1]^K$ as grid points in $[0,1]^K$. Then, each element of X is represented as

$$[X]_{j_1 \ldots j_K} = f_X(g_{j_1}, \ldots, g_{j_K}), \qquad (6)$$

for $j_k = 1, 2, \ldots$ with each $k = 1, \ldots, K$.

As for the smoothness of the function, we assume that $f_X$ is differentiable with respect to all K arguments, which allows for its expansion by a few useful basis functions (for example, Tsybakov (2008)) and the decomposition of multivariate functions by the basis (see Hackbusch (2012) for details). Let $\{\phi_m^{(k)} : [0,1] \to \mathbb{R}\}_m$ be a set of orthonormal basis functions, such as Fourier series or wavelet series, and let $\{w_{m_1, \ldots, m_K} \in \mathbb{R}\}_{m_1, \ldots, m_K}$ be a set of coefficients. Because of the differentiability, $f_X$ is written as the weighted sum of the basis functions:

$$f_X = \sum_{m_1 = 1}^{\infty} \cdots \sum_{m_K = 1}^{\infty} w_{m_1 \ldots m_K} \, \phi_{m_1}^{(1)} \cdots \phi_{m_K}^{(K)}. \qquad (7)$$

Combining (6) and (7) yields a formulation of the elements of the smooth tensor:

$$[X]_{j_1, \ldots, j_K} = \sum_{m_1 = 1}^{\infty} \cdots \sum_{m_K = 1}^{\infty} w_{m_1 \ldots m_K} \, \phi_{m_1}^{(1)}(g_{j_1}) \cdots \phi_{m_K}^{(K)}(g_{j_K}). \qquad (8)$$

Hereafter, we say that X is smooth if it follows (8).
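A smooth tensor in the sense of (8) can be generated by evaluating a truncated basis expansion on a grid; the truncation anticipates Section 3.2. Below is a sketch using an orthonormal cosine basis, one typical choice; the code and its names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cosine_basis(M, grid):
    """First M orthonormal cosine basis functions on [0,1], evaluated
    on the given grid points; phi_1 is the constant function."""
    m = np.arange(M)[:, None]                       # (M, 1)
    B = np.sqrt(2.0) * np.cos(np.pi * m * grid[None, :])
    B[0] = 1.0                                      # phi_1(t) = 1
    return B                                        # shape (M, len(grid))

def smooth_tensor(W, grids):
    """Evaluate [X]_{j1...jK} = sum_m w_m prod_k phi_{m_k}(g_{j_k}),
    i.e., (8) with each sum truncated at the corresponding size of W."""
    X = W
    for k, g in enumerate(grids):
        B = cosine_basis(W.shape[k], g)             # (M_k, I_k)
        Xk = np.tensordot(B.T, np.moveaxis(X, k, 0), axes=(1, 0))
        X = np.moveaxis(Xk, 0, k)                   # mode k now has size I_k
    return X

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 3, 3)) / 10.0               # a few low-frequency coefficients
grids = [np.linspace(0.0, 1.0, I) for I in (20, 30, 40)]
X = smooth_tensor(W, grids)
print(X.shape)  # (20, 30, 40)
```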
3.2. Objective Function

Model (8) is not directly applicable because it requires an infinite number of basis functions. To prevent this, we consider their truncation. Let $M^{(k)} < \infty$ be the number of basis functions for mode k, which represents a degree of smoothness of X in terms of mode k. For example, when $M^{(k)}$ is large, such as $M^{(k)} = I_k$ for all k, the basis function formulation can represent any X, which implies that it neglects the smooth structure. Then, we consider X such that it satisfies the following relation:

$$[X]_{j_1, \ldots, j_K} = \sum_{m_1 = 1}^{M^{(1)}} \cdots \sum_{m_K = 1}^{M^{(K)}} w_{m_1 \ldots m_K} \, \phi_{m_1}^{(1)}(g_{j_1}) \cdots \phi_{m_K}^{(K)}(g_{j_K}). \qquad (9)$$

For practical use, let $W_X \in \mathbb{R}^{M^{(1)} \times \cdots \times M^{(K)}}$ be a coefficient tensor that satisfies $[W_X]_{m_1 \ldots m_K} = w_{m_1 \ldots m_K}$, with given X.

Using this representation, we propose the objective function of STD. Based on the same convex optimization approach as (5), we define the objective function as

$$\min_{X \in \Theta} \left\{ \frac{1}{2n} \|Y - \mathfrak{X}(X)\|^2 + \lambda_n |||W_X|||_s + \mu_n |||W_X|||_F^2 \right\}, \qquad (10)$$

where $\lambda_n, \mu_n \geq 0$ are regularization coefficients that depend on n, with $\lambda_n, \mu_n \to 0$ as $n \to \infty$. Here, the regularization terms $|||W_X|||_s$ and $|||W_X|||_F^2$ are employed.

There are three primary advantages of the formulation given by (10). Firstly, (10) is written as a convex optimization problem; thus, it is ensured that the global solution of $W_X$ will be obtained. Secondly, the regularization term $|||W_X|||_s$ determines the Tucker rank of $W_X$ appropriately. Even though we must select $\lambda_n$, this is considerably easier than directly selecting the K rank values $(R_1, \ldots, R_K)$. Thirdly, the regularization $|||W_X|||_F^2$ penalizes the smoothness of $f_X$. Note that the smoothness of $f_X$ is related to $M^{(k)}$, and we introduce $|||W_X|||_F^2$ to select an appropriate degree of smoothness.

3.3. Algorithm

To optimize (10), we first reformulate it through vectorization and matricization. We define $x \in \mathbb{R}^{\prod_k I_k}$ as the vectorized tensor of X, and let $Q \in \mathbb{R}^{n \times \prod_k I_k}$ be the matricized version of the rearranging operator $\mathfrak{X}$, so that $\mathfrak{X}(X) = Qx$. We define $M^{(\setminus k)} := \prod_{k' \neq k} M^{(k')}$, and let $Z_k \in \mathbb{R}^{M^{(k)} \times M^{(\setminus k)}}$ be the mode-k unfolding matrix of $W_X$. Let $w \in \mathbb{R}^{\prod_k M^{(k)}}$ be the vectorized tensor of $W_X$. Let $\Phi : \mathbb{R}^{\prod_k M^{(k)}} \to \mathbb{R}^{I_1 \times \cdots \times I_K}$ be an operator that converts $W_X$ to X as given by (9) using $\{\phi_m^{(k)}(g_j)\}_{m,j}$, and let $\mathbf{\Phi}$ be a $\prod_k I_k \times \prod_k M^{(k)}$ matrix that satisfies $\Phi(w) = \mathbf{\Phi} w$; as $\Phi(w)$ is a linear mapping by (9), the existence of $\mathbf{\Phi}$ is ensured. Then, (10) is rewritten as follows:

$$\min_{x, w, \{Z_k\}_{k=1}^K} \; \frac{1}{2n} \|Y - Qx\|^2 + \frac{\lambda_n}{K} \sum_{k=1}^{K} \|Z_k\|_s + \mu_n \|w\|^2, \quad \text{s.t. } x = \Phi(w), \; P_k(w) = Z_k, \; \forall k, \qquad (11)$$

where $P_k : \mathbb{R}^{\prod_k M^{(k)}} \to \mathbb{R}^{M^{(k)} \times M^{(\setminus k)}}$ is a rearranging operator from the vector to the unfolding matrix. Note that $|||W_X|||_F = \|w\|$ holds by the definition of w.

We use the ADMM approach to solve (11). Minimizing the augmented Lagrangian function for (11), we obtain the following iteration steps. Here, $\eta > 0$ is a step size, and $\{\alpha_k \in \mathbb{R}^{M^{(k)} M^{(\setminus k)}}\}_{k=1}^{K}$ and $\beta \in \mathbb{R}^{\prod_k I_k}$ are Lagrangian multipliers. Let $(x_0, w_0, \{Z_{k,0}\}_{k=1}^{K}, \{\alpha_{k,0}\}_{k=1}^{K}, \beta_0)$ be an initial point. The ADMM step at the $\ell$-th iteration is written as follows:

$$x^{\ell+1} = (Q^T Q + n\eta I)^{-1} (Q^T Y - n\beta^{\ell} + n\eta \, \mathbf{\Phi} w^{\ell})$$
$$w^{\ell+1} = (2\mu_n I + \eta K I + \eta \mathbf{\Phi}^T \mathbf{\Phi})^{-1} \left( \sum_{k=1}^{K} \left( \eta P_k^T(Z_k^{\ell}) - \alpha_k^{\ell} \right) + \mathbf{\Phi}^T \beta^{\ell} + \eta \, \mathbf{\Phi}^T x^{\ell+1} \right)$$
$$Z_k^{\ell+1} = \mathrm{prox}_{\lambda_n/\eta} \bigl( P_k(w^{\ell+1} + \alpha_k^{\ell}) \bigr)$$
$$\alpha_k^{\ell+1} = \alpha_k^{\ell} + \bigl( w^{\ell+1} - P_k^{-1}(Z_k^{\ell+1}) \bigr)$$
$$\beta^{\ell+1} = \beta^{\ell} + \bigl( x^{\ell+1} - \mathbf{\Phi} w^{\ell+1} \bigr),$$

where $\mathrm{prox}_{\lambda_n/\eta}(\cdot)$ denotes the shrinkage operation of the singular values, defined as $\mathrm{prox}_{\eta}(Z) = U \max(S - \eta I, 0) V^T$, where S, U, and V are obtained through the singular value decomposition $Z = USV^T$. Note that the ADMM steps for $Z_k$ and $\alpha_k$ are required for every $k = 1, \ldots, K$. For $\eta$, Tomioka et al. (2010) suggest setting $\eta = \eta_0 / \sqrt{\mathrm{Var}(y_i)}$ with some constant $\eta_0$. As the regularization terms are convex, the sequence of variables produced by ADMM is ensured to converge to the optimal solution of (10) (Gandy et al., 2011, Theorem 5.1).
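The shrinkage operation prox is standard singular-value soft-thresholding. A minimal sketch of it, together with one illustrative $Z_k$-update (our naming; the shapes are chosen arbitrarily), is:

```python
import numpy as np

def prox_nuclear(Z, tau):
    """Shrinkage of singular values: U max(S - tau, 0) V^T."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

# One Z_k-update from the ADMM iteration above, in matrix form:
# Z_k <- prox_{lambda_n/eta}(P_k(w + alpha_k)), where P_k produces the
# mode-k unfolding of the coefficient vector.
rng = np.random.default_rng(4)
Mk, M_rest = 4, 12
Wk = rng.normal(size=(Mk, M_rest))   # plays the role of P_k(w + alpha_k)
Zk = prox_nuclear(Wk, tau=0.5)
# Soft-thresholding never raises the rank of its input.
print(np.linalg.matrix_rank(Zk) <= np.linalg.matrix_rank(Wk))  # True
```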
3.4. Practical Issues

STD has several hyperparameters, i.e., $\lambda_n$, $\mu_n$, $\{M^{(k)}\}$, and $\{\phi_m\}$. $\lambda_n$ and $\mu_n$ can be determined through cross validation. $\{M^{(k)}\}$ is initialized with a large value and reduced during the algorithm depending on $\mu_n$. As $M^{(k)}$ does not exceed $I_k$ for an identification reason, the initial value of $M^{(k)}$ is bounded. Practically, $M^{(k)}$ is considerably less than $I_k$; thus, we can start the iteration with small values.

One can criticize that a few data tensors are smooth in one mode but not in the others. We emphasize that STD can address such a situation by controlling $M^{(k)}$ for each k. As STD can represent tensors without the smooth structure when $M^{(k)} = I_k$, setting $M^{(k)} = I_k$ for some mode k and $M^{(k)} \ll I_k$ for the other modes is sufficient to address the situation.

In this study, selecting the form of the basis functions $\{\phi_m\}$ is not our primary interest because it does not specifically affect the theoretical results. However, there are a few typical choices. For instance, when the data tensor is periodic, such as audio, the Fourier basis is appropriate. Other functions as well, such as wavelet or spline functions, provide the theoretical guarantees for approximating f.

4. Theoretical Analysis

We introduce a few notations for convenience. Let $|||X|||_m := \max_r \frac{1}{K} \sum_{k=1}^{K} \sigma_r(X_{(k)})$ be a norm of a tensor, which is necessary for evaluating the penalty parameters. Let $\mathfrak{X}^*$ be the adjoint operator of $\mathfrak{X}$, namely, $\langle \mathfrak{X}(Z), z' \rangle = \langle Z, \mathfrak{X}^*(z') \rangle$ holds for all $Z \in \mathcal{X}$ and $z' \in \mathbb{R}^n$. As a theoretical requirement, we let the basis functions $\{\phi_j : [0,1] \to \mathbb{R}\}_j$ be uniformly bounded for all $j \geq 1$. All proofs of this section are provided in the supplemental material.

4.1. Error Bound with $X^*$

First, we impose the following assumption on $\mathfrak{X}$.

Assumption 1 (Restricted Strong Convexity (RSC) condition). There exists a finite constant $C_{\mathfrak{X}} > 0$ depending on $\{I_k\}_k$ such that the rearranging operator $\mathfrak{X}$ satisfies

$$\frac{1}{2n} \|\mathfrak{X}(X)\|^2 \geq C_{\mathfrak{X}} \, |||X|||_F^2,$$

for all $X \in \Theta$.

Intuitively, this assumption requires that $\mathfrak{X}$ is sufficiently sensitive to perturbations of X. A similar type of condition has been used in previous studies on sparse regression, such as LASSO (Bickel et al., 2009; Raskutti et al., 2010), i.e., the restricted isometry condition. The RSC condition is weaker than the isometry condition because the RSC condition requires only the lower bound.

We provide the following lemma regarding the error bound when the true tensor $X^*$ may be neither smooth nor low-rank. Let $(R_1^W, \ldots, R_K^W)$ be the Tucker rank of $W_X$.

Lemma 2. Consider $X^* \in \Theta$ and a rearranging operator $\mathfrak{X}$ that satisfies the RSC condition. Suppose there exist sequences $\lambda_n$ and $\mu_n$ that satisfy $\lambda_n \geq 2c \left( |||\mathfrak{X}^*(\mathcal{E})|||_m + \mu_n \right)$ and $c \mu_n / \lambda_n < 3/4$ with a constant $c > 0$. Then, with some constants $C_1, C_2, C_3 > 0$, we have

$$|||\hat{X} - X^*|||_F \leq \max\{\mathrm{I}, \mathrm{II}, \mathrm{III}\},$$

where I, II, and III are

$$\mathrm{I} = \frac{C_1 \lambda_n}{K} \sum_{k=1}^{K} \sqrt{R_k^W}, \qquad \mathrm{II} = \frac{C_2 \lambda_n}{K} \left( \sum_{k=1}^{K} \sum_{r > R_k^W} \sigma_r(W_{X^*(k)}) \right)^{1/2}, \qquad \mathrm{III} = C_3 \mu_n \left( \sum_{\{m_k > M^{(k)}\}_k} \left( w^*_{m_1 \ldots m_K} \right)^2 \right)^{1/2},$$

respectively; $w^*_{m_1 \ldots m_K}$ is the coefficient of $X^*$.

Lemma 2 states that the estimation error of $\hat{X}$ is bounded by three types of values: (I) indicates the error resulting from estimating a tensor that is smooth and low-rank; (II) indicates the error resulting from introducing the low-rank structure; and (III) indicates the error resulting from the approximation by the smooth tensor.

From Lemma 2, we see that (II) and (III) disappear and (I) remains when $X^*$ is low-rank and smooth, which we show in the next proposition. Here, we define $\tilde{\Theta} \subset \Theta$ as the set of tensors represented by (9) with $\{M^{(k)}\}_k$ and a coefficient tensor $W_X$ of rank $(R_1^W, \ldots, R_K^W)$.

Proposition 3. Suppose the same conditions as in Lemma 2 hold and $X^* \in \tilde{\Theta}$ is smooth and low-rank. Then, with some constant $C_f > 0$, we have

$$|||\hat{X} - X^*|||_F \leq \frac{C_f \lambda_n}{K} \sum_{k=1}^{K} \sqrt{R_k^W}.$$

4.2. Error Bound with $f_{X^*}$

One of the advantages of STD is that it can estimate both $X^*$ and the smooth function $f_{X^*}$ defined in (6), which allows for the interpolation of $X^*$. Here, we evaluate the estimation error of STD with respect to the $L_2$ norm for the functional space. Let us define $f_{X^*} := \sum_{m_1, \ldots, m_K} w^*_{m_1, \ldots, m_K} \phi_{m_1}^{(1)} \cdots \phi_{m_K}^{(K)}$, which is the smooth function given as the limit of $X^*$ as $I_k \to \infty$ for all $k \in \{1, \ldots, K\}$. We define the estimator of $f_{X^*}$ as follows:

$$f_{\hat{X}} := \sum_{m_1 \ldots m_K = 1}^{M^{(1)} \ldots M^{(K)}} \hat{w}_{m_1 \ldots m_K} \, \phi_{m_1}^{(1)} \cdots \phi_{m_K}^{(K)},$$

where $\hat{w}_{m_1 \ldots m_K}$ is an element of $W_{\hat{X}}$. The estimation error is provided as follows.
Lemma 4. Suppose that the rearranging operator $\mathfrak{X}$ satisfies the RSC condition and the true tensor $X^* \in \tilde{\Theta}$. Then, with some constant $C_F > 0$, we have

$$\sup_{g \in [0,1]^K} \left| f_{\hat{X}}(g) - f_{X^*}(g) \right| \leq \frac{C_F \lambda_n}{\sqrt{nK}} \sum_{k=1}^{K} \sqrt{R_k^W}.$$

When $\lambda_n \to 0$ by the setting, Lemma 4 shows that $f_{\hat{X}}$ estimated by STD uniformly converges to $f_{X^*}$.

4.3. Applications and Comparison

To discuss the result of Lemma 2 more precisely, we consider the following two practical settings: tensor recovery and tensor interpolation. For each setting, we derive rigorous error bounds.

4.3.1. Tensor Recovery

We consider that all the elements of X are observed and affected by noise, i.e., we set $n = \prod_k I_k$ and $\mathfrak{X}$ is a vectorization operator. Then, by applying Lemma 2, we obtain the following result.

Theorem 5. Suppose that $X^* \in \tilde{\Theta}$, the rearranging operator $\mathfrak{X}$ satisfies the RSC condition, and the noise is i.i.d. Gaussian. Let $C_{10}, C_{11} > 0$ be some finite constants. By setting $\lambda_n = C_{10} \sqrt{\frac{1}{nK}} \sum_{k=1}^{K} \left( \sqrt{I_k} + \sqrt{I_{\setminus k}} \right)$, with high probability, we have

$$|||\hat{X} - X^*|||_F^2 \leq \frac{C_{11}}{nK} \left( \sum_{k=1}^{K} \left( \sqrt{I_k} + \sqrt{I_{\setminus k}} \right) \right)^2 \left( \frac{1}{K} \sum_{k=1}^{K} \sqrt{R_k^W} \right)^2.$$

4.3.2. Tensor Interpolation

Theorem 6. Suppose that $X^* \in \tilde{\Theta}$, the rearranging operator $\mathfrak{X}$ satisfies the RSC condition, and the noise is i.i.d. Gaussian. Let $C_{20}, C_{21} > 0$ be some finite constants. By setting $\lambda_n = C_{20} \sum_{k=1}^{K} \left( \sqrt{I_k} + \sqrt{I_{\setminus k}} \right)$, with high probability, we have

$$\sup_{g \in [0,1]^K} \left| f_{\hat{X}}(g) - f_{X^*}(g) \right| \leq \frac{C_{21}}{\sqrt{n}} \left( \frac{1}{K} \sum_{k=1}^{K} \left( \sqrt{I_k} + \sqrt{I_{\setminus k}} \right) \right) \left( \frac{1}{K} \sum_{k=1}^{K} \sqrt{R_k^W} \right).$$

4.4. Comparison to Related Studies

Several studies have derived an error bound for $|||\hat{X} - X^*|||_F^2 / n$ in each situation. Tomioka et al. (2011) investigated the tensor decomposition problem with an overlapped Schatten-norm and derived the error bound

$$|||\hat{X} - X^*|||_F = O\left( \left( \frac{1}{K} \sum_{k=1}^{K} \left( \sqrt{I_k} + \sqrt{I_{\setminus k}} \right) \right) \left( \frac{1}{K} \sum_{k=1}^{K} \sqrt{R_k^X} \right) \right), \qquad (12)$$

where $(R_1^X, \ldots, R_K^X)$ is the Tucker rank of $X^*$. The error bound in Proposition 3 is obtained by replacing the part $R_k^X$ in (12) by $R_k^W$. Tomioka & Suzuki (2013) and Wimalawarne et al. (2014) introduced modified norms with the Schatten-norm and derived other error bounds.

Table 1 compares the main coefficients for the convergence.

METHOD                       RECOVERY          INTERPOLATION                            ASSUMPTION
Tomioka et al. (2011)        $O(R^X I^2)$      N/A                                      low-rank
Tomioka & Suzuki (2013)      $O(R^X I^2)$      N/A                                      low-rank
Wimalawarne et al. (2014)    $O(R^X I^2)$      N/A                                      low-rank
STD                          $O(R^W I^2)$      $O(n^{-1/2} I^{(K-1)/2} \sqrt{R^W})$     low-rank & smooth

Table 1. Comparison of error bounds under the low-rank and smoothness assumptions. For clarity, special cases of the error bounds are shown, where the shape of X is symmetric, i.e., $I := I_1 = \cdots = I_K$, $R^X := R_1^X = \cdots = R_K^X$, and $R^W := R_1^W = \cdots = R_K^W$.

When the tensor is sufficiently smooth, i.e., $R^W$