Robust Principal Component Analysis for Generalized Multi-View Models

Frank Nussbaum¹,²   Joachim Giesen¹
¹Friedrich-Schiller-Universität, Jena, Germany
²DLR Institute of Data Science, Jena, Germany

Accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021).

Abstract

It has long been known that principal component analysis (PCA) is not robust with respect to gross data corruption. This has been addressed by robust principal component analysis (RPCA). The first computationally tractable definition of RPCA decomposes a data matrix into a low-rank and a sparse component. The low-rank component represents the principal components, while the sparse component accounts for the data corruption. Previous works consider the corruption of individual entries or whole columns of the data matrix. In contrast, we consider a more general form of data corruption that affects groups of measurements. We show that the decomposition approach remains computationally tractable and allows the exact recovery of the decomposition when only the corrupted data matrix is given. Experiments on synthetic data corroborate our theoretical findings, and experiments on several real-world datasets from different domains demonstrate the wide applicability of our generalized approach.

1 INTRODUCTION

Principal component analysis (PCA) is a classical data dimension reduction technique based on the assumption that given high-dimensional data lies near some low-dimensional subspace, see Pearson [1901]. Formally, assume observed data points x⁽¹⁾, …, x⁽ⁿ⁾ ∈ ℝᵐ that are combined into a data matrix X ∈ ℝ^(m×n). Approximating the data matrix X by a low-rank matrix L can be formulated as an optimization problem

    min_{L ∈ ℝ^(m×n)}  ‖X − L‖   subject to   rank(L) ≤ k,

where ‖·‖ is some suitable norm. The classical and still popular choice, see Hotelling [1933], Eckart and Young [1936], uses the Frobenius norm ‖·‖_F, which renders the optimization problem tractable. The Frobenius norm does not perform well, though, for grossly corrupted data. A single grossly corrupted entry in X can change the estimated low-rank matrix L significantly, that is, the Frobenius-norm approach is not robust against data corruption. An obvious remedy is replacing the Frobenius norm by the ℓ₁-norm ‖·‖₁, but this renders the optimization problem intractable because of the non-convex rank constraint. Alternatively, we can explicitly model a component that captures data corruption. This leads to a decomposition

    X = L + S

of the data matrix into a low-rank component L as before and a matrix S of outliers. The structure of the outlier matrix S depends on the data corruption mechanism and is commonly assumed to be sparse. In practice, low-rank + sparse decompositions can be computed efficiently through the convex problem

    min_{L, S ∈ ℝ^(m×n)}  ‖L‖_* + γ‖S‖_{1,2}   subject to   X = L + S,        (1)

where γ > 0 is a trade-off parameter between the nuclear norm ‖·‖_*, which promotes low rank on L, and the ℓ₁,₂-norm ‖S‖_{1,2} = Σ_g ‖s_g‖₂, which promotes structured sparsity on S given a partitioning of the entries in S into groups s_g. The groups are determined by the assumed data corruption mechanism.

In the simplest data corruption mechanism, individual entries of the data points can be corrupted. Under this model, the ℓ₁,₂-norm reduces to the ℓ₁-norm, that is, the groups s_g consist of single elements. Wright et al. [2009], Candès et al. [2011], and Chandrasekaran et al. [2011] were the first to investigate this model. They show that exact recovery using the specialized version of Problem (1) is possible, that is, in many cases the corruptions can be separated from the data.

An alternative corruption mechanism, introduced independently by McCoy and Tropp [2011] and Xu et al. [2010], corrupts whole data points, which are referred to as outliers. Here, the groups s_g for the ℓ₁,₂-norm are the columns of the data matrix X. Xu et al. [2010] show that exact recovery is also possible in this column-sparse scenario.

In this work, we study a more general data corruption mechanism, where the data points are partitioned as

    x⁽ⁱ⁾ = (x₁⁽ⁱ⁾, …, x_d⁽ⁱ⁾) ∈ ℝ^(m₁) × ··· × ℝ^(m_d) = ℝᵐ,

that is, they form d groups. We assume that for each data point, the groups can be individually corrupted. Hence, the groups in Problem (1) are given by

        ( s_11  s_12  ···  s_1n )
    S = (  ⋮          ⋱    ⋮   )  ∈ ℝ^(m×n),
        ( s_d1  s_d2  ···  s_dn )

where s_ij ∈ ℝ^(m_i). In Section 2, we show that exact recovery is still possible for our more general data corruption mechanism. This mechanism has a natural interpretation in terms of generalized multi-view models, see Sun [2013], Zhao et al. [2017], Zhang et al. [2019], where each data point is obtained by measurements from different sensors, and every sensor can measure several variables. Sensor failures in this model result in corrupted measurements for the group of variables measured by the failing sensor, but only for the data points that were measured while the sensor was not working correctly. Of course, data corruption and sensor failures are an abstraction for what can also be anomalies or outliers in applications, see Figure 1 for a real-world example.

[Figure 1: The electrical load profiles of four households from one week. Power consumption for each household is measured in terms of six quantities per time step, which form groups of six elements. In the plot of the data, these groups respectively span six rows, whereas the columns represent time steps. Note that for this data, we expect data corruption (outliers) in the form of short-term usage of electrical devices. More details can be found in Section 3.]

Observe that the approach in Wright et al. [2009], Candès et al. [2011], Chandrasekaran et al. [2011] corresponds to the special case where each sensor just measures a single variable, and the approach in McCoy and Tropp [2011], Xu et al. [2010] corresponds to the special case where a single sensor measures all variables.

Of course, it is possible to think of data corruption mechanisms that correspond to even more general group structures, for example, rectangular groups formed by sub-matrices of the data matrix. The latter case does not pose any extra technical challenges. Hence, here we keep the exposition simple and stick with generalized multi-view models. These models are flexible and suitable for many real-world applications. We provide some examples in Section 3, namely, the identification of periods of large power consumption from electrical grid data, the reconstruction of RGB images, and the detection of weather anomalies using wave hindcast data.

Some of the applications have data in the form of tensors. Therefore, we briefly discuss some additional important related work that concerns robust tensor principal component analysis (RTPCA). The most closely related works follow the convex optimization approach. Their main modeling effort lies in the generalization of low rank and the nuclear norm to tensors. For example, Huang et al. [2014] propose RTPCA using the sum of nuclear norms (SNN), which is based on the Tucker rank. For 3D tensors, Zhang et al. [2014], Lu et al. [2016, 2019], and Zhou and Feng [2017] use a nuclear norm based on the t-SVD and the tensor tubal rank. Most works, including Huang et al. [2014] and Lu et al. [2016, 2019], assume the simple data corruption mechanism that corrupts individual entries of the data tensor. Zhang et al. [2014] and Zhou and Feng [2017] consider outliers distributed along slices of 3D tensors. The latter data corruption mechanism is a special case of our multi-view models when all groups have the same size (and the data matrix is viewed as a flattened version of a tensor). However, our general multi-view models allow for different group sizes, which gives them additional flexibility and distinguishes them from all existing RTPCA models.

2 EXACT RECOVERY

In this section, we prove exact recovery for generalized multi-view models. For that, we assume a data matrix X with underlying true decomposition X = L⋆ + S⋆ into a low-rank matrix L⋆ and a group-sparse matrix S⋆. We investigate under which conditions the pair (L⋆, S⋆) can be obtained as the guaranteed solution to Problem (1) with suitably chosen γ.

Algebraic varieties and tangent spaces. Low rank and group sparsity are algebraic properties that can be formalized by algebraic matrix varieties. We need them for our analysis, hence we briefly introduce them here. First, the low-rank matrix variety of matrices with rank at most r is given by

    L(r) = {L ∈ ℝ^(m×n) : rank(L) ≤ r}.

As shown, for example, in Shalit et al. [2010], a rank-r matrix L is a smooth point in L(r) and has the tangent space

    T(L) = {UX^⊤ + YV^⊤ : X ∈ ℝ^(n×r), Y ∈ ℝ^(m×r)},

where L = UDV^⊤ is the (restricted) singular value decomposition of L.

[…]

for some small matrix ∆. Here, if S⋆ + ∆ ∈ S(|gsupp(S⋆)|) for small ∆, then it must hold that ∆ ∈ Q(S⋆). In contrast, L⋆ − ∆ ∈ L(rank(L⋆)) for small ∆ only implies that ∆ is contained in a tangent space to the low-rank matrix variety that is close to T(L⋆). This is due to the local curvature of the low-rank matrix variety. Nevertheless, for local uniqueness of the solution to Problem (2), it suffices to consider only the tangent spaces T(L⋆) and Q(S⋆):

Lemma 1. If the tangent spaces T(L⋆) and Q(S⋆) are transverse, that is,

    T(L⋆) ∩ Q(S⋆) = {0},

then the solution (L⋆, S⋆) to Problem (2) is locally unique.
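For the classical Frobenius-norm formulation from the introduction, the minimizer under the rank constraint is the truncated singular value decomposition (Eckart and Young [1936]). A minimal numpy sketch of this baseline (the helper name `best_rank_k` is ours, not the paper's):

```python
import numpy as np

def best_rank_k(X, k):
    """Best rank-k approximation of X in the Frobenius norm
    (Eckart-Young), obtained by truncating the SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 8))  # exactly rank 2
print(np.allclose(best_rank_k(X, 2), X))  # True: truncation at the true rank is exact
```

As the paper notes, this estimator is fragile: a single gross outlier in X perturbs all singular vectors, which is what motivates the decomposition approach.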
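Problem (1) is a sum of two prox-friendly terms under a linear constraint, so it can be attacked with a standard ADMM scheme: singular value thresholding for the nuclear norm and blockwise soft-thresholding for the ℓ₁,₂-norm. The sketch below is our illustration of this generic approach, not the authors' implementation; the function names and the defaults for ρ and the iteration count are our untuned choices.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the prox operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def group_shrink(M, tau, groups):
    """Blockwise soft-thresholding: the prox operator of tau * l_{1,2}-norm.
    Each group is one view (a block of rows) of one data point (column)."""
    S = np.zeros_like(M)
    for rows in groups:
        for j in range(M.shape[1]):
            g = M[rows, j]
            nrm = np.linalg.norm(g)
            if nrm > tau:
                S[rows, j] = (1.0 - tau / nrm) * g
    return S

def rpca_multiview(X, groups, gamma, rho=1.0, iters=300):
    """ADMM sketch for  min ||L||_* + gamma*||S||_{1,2}  s.t.  X = L + S."""
    L, S, Y = np.zeros_like(X), np.zeros_like(X), np.zeros_like(X)
    for _ in range(iters):
        L = svt(X - S + Y / rho, 1.0 / rho)                     # nuclear-norm prox
        S = group_shrink(X - L + Y / rho, gamma / rho, groups)  # l_{1,2} prox
        Y += rho * (X - L - S)                                  # dual ascent on X = L + S
    return L, S
```

Here `groups` lists the row-index blocks of the d views, so each (view, data point) pair forms one group s_ij, matching the partitioning of S above. With a suitable γ, exact recovery in the sense of Section 2 means the returned pair matches (L⋆, S⋆); the γ used in practice would be chosen as in the paper's experiments, which this excerpt does not specify.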
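The tangent space T(L) of Section 2 can also be handled numerically. A standard identity (implicit in this line of work but not stated in the excerpt) gives its orthogonal projector as P_T(M) = UU^⊤M + MVV^⊤ − UU^⊤MVV^⊤. The following check, with data of our own choosing, confirms that matrices of the form UX^⊤ + YV^⊤ are fixed points of P_T and that P_T is idempotent:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 7, 5, 2
L = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r matrix
U, _, Vt = np.linalg.svd(L, full_matrices=False)
U, V = U[:, :r], Vt[:r].T  # factors of the restricted SVD of L

def proj_T(M):
    """Orthogonal projection onto T(L) = {U X^T + Y V^T}."""
    return U @ U.T @ M + M @ V @ V.T - U @ U.T @ M @ V @ V.T

# Elements of T(L) are fixed by the projector ...
Xc, Yc = rng.standard_normal((n, r)), rng.standard_normal((m, r))
T_elem = U @ Xc.T + Yc @ V.T
print(np.allclose(proj_T(T_elem), T_elem))  # True
# ... and the projector is idempotent.
M = rng.standard_normal((m, n))
print(np.allclose(proj_T(proj_T(M)), proj_T(M)))  # True
```

A projector of this kind is what makes transversality conditions such as T(L⋆) ∩ Q(S⋆) = {0} in Lemma 1 checkable in practice: the intersection is trivial exactly when no nonzero matrix is fixed by the projectors onto both tangent spaces.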