Partial Trace Regression and Low-Rank Kraus Decomposition
Hachem Kadri¹, Stéphane Ayache¹, Riikka Huusari², Alain Rakotomamonjy³ ⁴, Liva Ralaivola⁴

¹Aix-Marseille University, CNRS, LIS, Marseille, France. ²Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland. ³Université Rouen Normandie, LITIS, Rouen, France. ⁴Criteo AI Lab, Paris, France. Correspondence to: Hachem Kadri <hachem.kadri@univ-amu.fr>.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

The trace regression model, a direct extension of the well-studied linear regression model, allows one to map matrices to real-valued outputs. We here introduce an even more general model, namely the partial-trace regression model, a family of linear mappings from matrix-valued inputs to matrix-valued outputs; this model subsumes the trace regression model and thus the linear regression model. Borrowing tools from quantum information theory, where partial trace operators have been extensively studied, we propose a framework for learning partial trace regression models from data by taking advantage of the so-called low-rank Kraus representation of completely positive maps. We show the relevance of our framework with synthetic and real-world experiments conducted for both i) matrix-to-matrix regression and ii) positive semidefinite matrix completion, two tasks which can be formulated as partial trace regression problems.

[Figure 1. Illustration of the partial trace operation. The partial trace operation applied to the m × m blocks of a qm × qm matrix gives a q × q matrix as output.]

1. Introduction

Trace regression model. The trace regression model or, in short, trace regression, is a generalization of the well-known linear regression model to the case where input data are matrices instead of vectors (Rohde & Tsybakov, 2011; Koltchinskii et al., 2011; Slawski et al., 2015), with the output still being real-valued. This model assumes, for the pair of covariates (X, y), the following relation between the matrix-valued random variable X and the real-valued random variable y:

$$ y = \operatorname{tr}(B_*^\top X) + \epsilon, \qquad (1) $$

where tr(·) denotes the trace, B_* is some unknown matrix of regression coefficients, and ε is random noise. This model has been used beyond mere matrix regression for problems such as phase retrieval (Candes et al., 2013), quantum state tomography (Gross et al., 2010), and matrix completion (Klopp, 2011).

Given a sample S = {(X_i, y_i)}_{i=1}^ℓ, where X_i is a p_1 × p_2 matrix and y_i ∈ ℝ, and each (X_i, y_i) is assumed to be distributed as (X, y), the training task associated with statistical model (1) is to find a matrix B̂ that is a proxy to B_*. To this end, Koltchinskii et al. (2011) and Fan et al. (2019) proposed to compute an estimation B̂ of B_* as the solution of the regularized least squares problem

$$ \hat{B} = \operatorname*{arg\,min}_{B} \sum_{i=1}^{\ell} \left( y_i - \operatorname{tr}(B^\top X_i) \right)^2 + \lambda \|B\|_1, \qquad (2) $$

where ‖·‖_1 is the trace norm (or nuclear norm), which promotes a low-rank B̂, a key feature for the authors to establish bounds on the deviation of B̂ from B_*. Slawski et al. (2015) and Koltchinskii & Xia (2015) have considered the particular case where p := p_1 = p_2 and B_* is assumed to be from S_p^+, the cone of positive semidefinite matrices of order p, and they showed that guarantees on the deviation of B̂ from B_* continue to hold when B̂ is computed as

$$ \hat{B} = \operatorname*{arg\,min}_{B \in S_p^+} \sum_{i=1}^{\ell} \left( y_i - \operatorname{tr}(B X_i) \right)^2. \qquad (3) $$

Here, the norm regularization of (2) is no longer present; it is replaced by the explicit restriction that B̂ belongs to S_p^+ (as B_* does). This setting is tied to the learning of completely positive maps developed hereafter.
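To make the trace regression estimators concrete, the following is a minimal numpy sketch, written purely for illustration (it is not the authors' implementation; the function names, the step-size heuristic, and the toy data are our own assumptions), of the nuclear-norm-penalized estimator (2) solved by proximal gradient descent: gradient steps on the squared loss followed by singular-value soft-thresholding.

```python
import numpy as np

def svt(B, tau):
    """Singular-value soft-thresholding: proximal operator of tau * ||.||_1 (nuclear norm)."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def fit_trace_regression(Xs, ys, lam=0.1, n_iter=1000):
    """Illustrative proximal-gradient solver for the penalized estimator (2)."""
    p1, p2 = Xs[0].shape
    B = np.zeros((p1, p2))
    # Safe step size: 2 * sum_i ||X_i||_F^2 upper-bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(np.sum(X ** 2) for X in Xs))
    for _ in range(n_iter):
        # Gradient of sum_i (tr(B^T X_i) - y_i)^2 with respect to B.
        grad = sum(2.0 * (np.trace(B.T @ X) - y) * X for X, y in zip(Xs, ys))
        B = svt(B - step * grad, step * lam)
    return B

# Toy usage: recover a rank-1 coefficient matrix from noisy trace measurements.
rng = np.random.default_rng(0)
B_star = np.outer(rng.normal(size=5), rng.normal(size=5))        # rank-1 ground truth
Xs = [rng.normal(size=(5, 5)) for _ in range(200)]
ys = [np.trace(B_star.T @ X) + 0.01 * rng.normal() for X in Xs]
B_hat = fit_trace_regression(Xs, ys, lam=0.5)
print(np.linalg.norm(B_hat - B_star))                             # small if recovery succeeded
```

The PSD-constrained estimator (3) can be sketched in the same way by replacing the soft-thresholding step with a projection onto S_p^+, i.e., symmetrizing the iterate and clipping its negative eigenvalues to zero.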
Partial trace regression model. Here, we propose the partial trace regression model, which generalizes the trace regression model to the case where both inputs and outputs are matrices, and we go a step farther than works that assume either matrix-variate inputs (Zhou & Li, 2014; Ding & Cook, 2014; Slawski et al., 2015; Luo et al., 2015) or matrix/tensor-variate outputs (Kong et al., 2019; Li & Zhang, 2017; Rabusseau & Kadri, 2016). Key to our work is the so-called partial trace, explained in the following section and depicted in Figure 1.

This novel regression model, which maps matrices to matrices, is of interest for several application tasks. For instance, in Brain-Computer Interfaces, covariance matrices are frequently used as features representing the mental state of a subject (Barachant et al., 2011; Congedo et al., 2013), and those covariance matrices need to be transformed (Zanini et al., 2018) into other covariance matrices to be discriminative for some BCI tasks. Similarly, in Computer Vision and especially in the subfield of 3D shape retrieval, covariance matrices are of interest as descriptors (Guo et al., 2018; Hariri et al., 2017), while there is a surging interest in deep learning methods for defining trainable layers with covariance matrices as input and output (Huang & Van Gool, 2017; Brooks et al., 2019).

Contributions. We make the following contributions. i) We introduce the partial trace regression model, a family of linear predictors from matrix-valued inputs to matrix-valued outputs; this model encompasses previously known regression models, including the trace regression model and thus the linear regression model. ii) Borrowing concepts from quantum information theory, where partial trace operators have been extensively studied, we propose a framework for learning a partial trace regression model from data; we take advantage of the low-rank Kraus representation of completely positive maps to cast learning as an optimization problem which we are able to handle efficiently. iii) We provide statistical guarantees for the model learned under our framework, thanks to a provable estimation of the pseudodimension of the class of functions we envision. iv) Finally, we show the relevance of the proposed framework for the tasks of matrix-to-matrix regression and positive semidefinite matrix completion, both of which are amenable to a partial trace regression formulation; our empirical results show that the partial trace regression model yields good performance, demonstrating wide applicability and effectiveness.

2. Partial Trace Regression

Here, we introduce the partial trace regression model to encode linear mappings from matrix-valued spaces to matrix-valued spaces. We specialize this setting to completely positive maps, and show the optimization problem to which learning with the partial trace regression model translates, together with a generalization error bound for the associated learned model. In addition, we present how the problem of (block) positive semidefinite matrix completion can be cast as a partial trace regression problem.

2.1. Notations, Block Matrices and Partial Trace

Notations. Our notational conventions are summarized in Table 1. For n, m ∈ ℕ, M_{n×m} = M_{n×m}(ℝ) denotes the space of all n × m real matrices (we also use standard notations such as ℝ^n and M_n). If n = m, then we write M_n instead of M_{n×n}. If M is a matrix, M_{ij} denotes its (i, j)-th entry. For M ∈ M_n, M ⪰ 0 will be used to indicate that M is positive semidefinite (PSD); we may equivalently write M ∈ S_n^+. Throughout, {(X_i, Y_i)}_{i=1}^l denotes a training sample of l examples, with each (X_i, Y_i) assumed to be drawn i.i.d. from a fixed but unknown distribution on X × Y where, from now on, X := M_p and Y := M_q.

    Symbol                    Meaning
    i, j, m, n, p, q          integers
    α, β, γ, ...              real numbers
    X, Y, H, ...              vector spaces
    x, y, k, ...              vectors (or functions)
    X, Y, K, ...              matrices (or operators)
    X, Y, K, ... (boldface)   block matrices
    Φ, Λ, Γ, ...              linear maps on matrices
    ⊤                         transpose

Table 1. Notational conventions used in the paper.

Block matrices. We will make extensive use of the notion of block matrices, i.e., matrices that can be partitioned into submatrices of the same dimension. If M ∈ M_{nm}, the number of block partitions of M directly depends on the number of factors of n and m; to uniquely identify the partition we are working with, we will always consider an n × n block partition, where n will be clear from context — the number of rows and columns of the matrix at hand will thus be multiples of n. The set M_n(M_m) will then denote the space of n × n block matrices M = [[M_{ij}]] whose (i, j) entry is an element of M_m (the space M_n(M_m) is isomorphic to M_n ⊗ M_m).
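To make the M_n(M_m) convention concrete, here is a small numpy sketch (our own illustrative helper, not code from the paper; the name as_blocks is an assumption) that exposes an nm × nm matrix as its n × n grid of m × m blocks.

```python
import numpy as np

def as_blocks(M, m):
    """View an nm x nm matrix as an n x n grid of m x m blocks, i.e. an element of M_n(M_m)."""
    size = M.shape[0]
    assert M.shape == (size, size) and size % m == 0, "expected a square matrix with size divisible by m"
    n = size // m
    # Row a = i*m + r and column b = j*m + c land in block (i, j) at in-block position (r, c).
    return M.reshape(n, m, n, m).transpose(0, 2, 1, 3)

M = np.arange(36.0).reshape(6, 6)   # a 6 x 6 matrix, read as a 3 x 3 grid of 2 x 2 blocks
blocks = as_blocks(M, m=2)
print(blocks.shape)                  # (3, 3, 2, 2)
print(blocks[0, 2])                  # the block M_{0,2}: rows 0-1, columns 4-5 of M
```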
Partial trace operators. The partial trace, extensively studied and used in quantum computing (see, e.g., Rieffel & Polak, 2011, Chapter 10), generalizes the trace operation to block matrices. The definition we work with is the following.

Definition 1 (Partial trace; see, e.g., Bhatia, 2003). The partial trace operator, denoted by tr_m(·), is the linear map from M_q(M_m) to M_q defined by

$$ \operatorname{tr}_m(M) = \big[ \operatorname{tr}(M_{ij}) \big], \quad i, j = 1, \ldots, q. $$

In other words, given a block matrix M of size qm × qm, the partial trace is obtained by computing the trace of each block of size m × m in the input matrix M, as depicted in Figure 1.

Linear maps from M_p to M_q are studied in various fields of mathematics, physics, and more specifically quantum computation and information (Bhatia, 2009; Nielsen & Chuang, 2000; Størmer, 2012; Watrous, 2018). Operators from L(M_p, M_q) that have special properties are the so-called completely positive maps, a family that builds upon the notion of positive operators.
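Before turning to completely positive maps, here is a minimal numpy sketch of the partial trace operator of Definition 1 (our own illustrative code, not from the paper; the helper name partial_trace is an assumption), which simply takes the trace of every m × m block, as in Figure 1.

```python
import numpy as np

def partial_trace(M, m):
    """tr_m : M_q(M_m) -> M_q, the trace of every m x m block of a qm x qm matrix (Definition 1)."""
    size = M.shape[0]
    assert M.shape == (size, size) and size % m == 0, "expected a square matrix with size divisible by m"
    q = size // m
    blocks = M.reshape(q, m, q, m).transpose(0, 2, 1, 3)   # blocks[i, j] is the m x m block M_ij
    return np.trace(blocks, axis1=2, axis2=3)              # q x q matrix with entries tr(M_ij)

# Sanity check: a block-diagonal matrix whose three 2 x 2 diagonal blocks are all-ones.
M = np.kron(np.eye(3), np.ones((2, 2)))
print(partial_trace(M, m=2))   # 2 * I_3: each diagonal block has trace 2, off-diagonal blocks trace 0
```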