Positive Semidefinite Matrix Factorization Based on Truncated

Positive Semidefinite Matrix Factorization Based on Truncated Wirtinger Flow Dana Lahat, Cedric´ Fevotte´ ∗IRIT, Universite´ de Toulouse, CNRS, Toulouse, France fDana.Lahat,[email protected] Abstract—This paper deals with algorithms for positive geometry. Since it was first introduced, PSDMF attracted a semidefinite matrix factorization (PSDMF). PSDMF is a recently- significant amount of attention and has found applications in proposed extension of nonnegative matrix factorization with numerous fields, including combinatorial optimization [2], [6], applications in combinatorial optimization, among others. In this paper, we focus on improving the local convergence of an quantum communication complexity [2], quantum informa- alternating block gradient (ABG) method for PSDMF in a noise- tion theory [7], and recommender systems [8]. Despite the free setting by replacing the quadratic objective function with the increasing interest in PSDMF, only a few algorithms have Poisson log-likelihood. This idea is based on truncated Wirtinger been proposed so far, e.g., [3], [8], [9]. In this paper, we do flow (TWF), a phase retrieval (PR) method that trims outliers in not focus on a specific application but on understanding some the gradient and thus regularizes it. Our motivation is a recent result linking PR with PSDMF. Our numerical experiments of the optimization properties of PSDMF, in a more general validate that the numerical benefits of TWF may carry over perspective. to PSDMF despite the more challenging setting, when initialized Our work is motivated by a recent result [10] showing a within its region of convergence. We then extend TWF from PR link between PSDMF and affine rank minimization (ARM) to affine rank minimization (ARM), and show that although the (e.g., [11], [12]), and in particular with phase retrieval (PR) outliers are no longer an issue in the ARM setting, PSDMF with the new objective function may still achieves a smaller error for (e.g., [13]), which can be considered as a special case of ARM. the same number of iterations. In a broader view, our results We explain their link to PSDMF briefly in Section II, which indicate that a proper choice of objective function may enhance is dedicated to background material. convergence of matrix (or tensor) factorization methods. In [10], we presented a new type of local optimization Index Terms—Positive semidefinite factorization, nonnegative scheme for PSDMF that is based on alternating block gradient factorizations, phase retrieval, low-rank approximations, optimization (ABG). In this ABG scheme, each subproblem is based on Wirtinger flow (WF) [13] or ARM [11], [12], depending on I. INTRODUCTION the values of the inner ranks, and the objective function is based on quadratic loss. This paper deals with improving the local convergence This paper is motivated by truncated Wirtinger flow of algorithms for positive semidefinite matrix factorization (TWF) [14], [15], a variant of WF obtained by replacing the (PSDMF) [1], [2]. In PSDMF, the (i; j)th entry x of a ij quadratic objective function with a non-quadratic one, together nonnegative matrix X 2 I×J is written as an inner product R with special regularization based on truncating excessive of two K × K symmetric positive semidefinite (psd) matrices values from the gradient and the objective function. These A and B , indexed by i = [I], j = [J]: i j excessive values do not appear in the quadratic objective. TWF ∼ T improves the convergence rate also in the noise-free case, and xij = hAi; Bji , trfAi Bjg = trfAiBjg (1) requires appropriate initial conditions. In this paper, starting where [I] , 1;:::;I, tr{·} denotes the trace of a matrix, from Section III, we build an ABG algorithm in which each and =∼ means equal or approximately equal, depending on subproblem is based on the principles of TWF, and study to the context. The minimal number K such that a nonnegative which extent the ideas of TWF can be applied to PSDMF. matrix X admits an exact PSDMF is called the psd rank of Our main result is that in the case where all inner ranks X [1]. Each psd matrix Ai and Bj may have a different are equal to 1, a setting equivalent to locally minimizing a rank, denoted RAi and RBj , respectively. We shall sometimes TWF criterion in each subproblem, we observe phenomena refer to the psd rank K as the “outer rank” and to RAi and analogous to those in TWF as predicted by our geometrical RBj as “inner ranks” [3]. When Ai and Bj are constrained analysis in Section IV, provided we are in the basin of attrac- to be diagonal matrices 8i; j, PSDMF becomes equivalent to tion of a global minimum. This special case of PSDMF, known nonnegative matrix factorization (NMF) (e.g., [4]). as Hadamard square root or phaseless rank, is of particular PSDMF was proposed as an extension to a fundamental interest (e.g., [3], [16], [17]). When some of the inner ranks result [5] in combinatorial optimization that links NMF with are larger than one, there is no need for truncation, as we show in Section IV. Yet, we still observe faster convergence to the This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme global minimum with our non-quadratic objective, a matter under grant agreement No. 681839 (project FACTORY). than needs to be further looked into. 978-9-0827-9705-3 1035 EUSIPCO 2020 Although we focus on the Poisson log-likelihood, as in [15], Similarly, based on (3), an MLE to PSDMF can be found by we keep in mind that the same principles are likely to minimizing, w.r.t. Ui and Vj, 8i; j, the objective function hold also for other objective functions; see further discussion I J in Section VI. To the best of our knowledge, this paper is the X X T 2 d(xij; kUi VjkF ) : (7) first to suggest that changing the objective function within the i=1 j=1 same algorithmic framework in a matrix factorization problem can result in improved numerical properties, in the noise-free Looking at (6) and (7), we see how any ARM method for (6) fU gI case, provided proper initialization and regularization (e.g., can be used to optimize (7) : given i i=1, apply ARM V truncation) when necessary. subsequently to update each j in (7). Then change roles: fix fV gJ and apply the method to update U subsequently In this paper, we use real-valued notations. However, all our j j=1 i for each i. Repeat until a stopping criterion is achieved. The results extend straightforwardly to the complex-valued case. alternating optimization strategy for PSDMF has already been proposed, e.g., in [3], [8], [9], [19] . The link between these II. A LINK BETWEEN PSDMF AND PHASE RETRIEVAL methods and PR is pointed out in [10]. The ABG method In PSDMF applications, the psd matrices are often of low in [10] is based on decreasing the objective function in (6) rank, i.e., RAi ;RBj < K (e.g., [17]). Hence, the psd matrices by gradient descent with a quadratic objective function, as can be written as in [11], [12]. Addressing the PR problem associated with (5) T T via gradient descent with a quadratic objective function is Ai , UiUi and Bj , VjVj ; (2) termed WF [13]. K×RA K×RB where Ui 2 R i and Vj 2 R j are full-rank factor III. EXTENDING TWF TO ARM matrices (factors, for short). With this change of variables, the Starting from this section, we gradually build an ABG model in (1) can be written as method for PSDMF based on TWF [15]. For the sake of ∼ T T T T T 2 simplicity, we focus on the Poisson log-likelihood, as in [15]. xij = hUiU ; VjV i = trfUiU VjV g = kU Vjk :(3) i j i j i F In this case, the objective function in (7) is based on the ζ In the special case where J = 1, (3) can be written as Kullback-Leibler divergence (KLD): dKL(ζ; η) , ζ log( η ) − ζ + η. PSDMF with a Poisson log-likelihood thus consists of ∼ T 2 yi = kUi VkF ; (4) decreasing the objective function where we replaced Vj with V and xij with yi. When Ui I J X X T 2 T 2 are given, finding V from yi, i = [I], in (4) is equivalent to f = kUi VjkF − xij log(kUi VjkF ) ARM (e.g., [11], [12]). In this case, Ui can be regarded as i=1 j=1 sensing matrices and V as the signal of interest. If all factors + xij log(xij) − xij (8) are vectors: v 2 K and u 2 K 8i, (4) can be written as R i R K×RB whose gradient w.r.t. Vj 2 C j is ∼ T T T 2 2 yi = huiui ; vv i = jui vj = jhui; vij ; i = [I] : (5) I T 2 X kUi VjkF − xij T rVj f = 2 T 2 UiUi Vj : (9) When all vectors ui are given, the problem of estimating kU V k i=1 i j F v from the phaseless observations yi in (5) is known as ·T ·H r f (generalized) PR [18]. In a PR context, ui are sometimes In the complex-valued case, change to and divide Vj referred to as sensing vectors and v as a signal of interest. by 2. For a specific j, (8) and (9) are the low-rank general- We conclude that PR (5) and ARM (4) are special cases of izations of the Poisson objective and gradient in TWF [15].

Positive Semidefinite Matrix Factorization Based on Truncated

Final Exam Guide Guide

A Line-Search Descent Algorithm for Strict Saddle Functions with Complexity Guarantees∗

Newton Method for Stochastic Control Problems Emmanuel Gobet, Maxime Grangereau

A Field Guide to Forward-Backward Splitting with a Fasta Implementation

Finding Critical and Gradient-Flat Points of Deep Neural Network Loss Functions

Arxiv:2108.10249V1 [Math.OC] 23 Aug 2021 Adepoints

CS260: Machine Learning Algorithms Lecture 3: Optimization

Composing Scalable Nonlinear Algebraic Solvers

Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

1 Gradient-Based Optimization

A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning

Line Search Algorithms