Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing∗

Zeyuan Allen-Zhu (Microsoft Research AI, [email protected])
Ankit Garg (Microsoft Research New England, [email protected])
Yuanzhi Li (Princeton University, [email protected])
Rafael Oliveira (University of Toronto, [email protected])
Avi Wigderson (Institute for Advanced Study, [email protected])

ABSTRACT

We propose a new second-order method for geodesically convex optimization on the natural hyperbolic metric over positive definite matrices. We apply it to solve the operator scaling problem in time polynomial in the input size and logarithmic in the error. This is an exponential improvement over previous algorithms, which were analyzed in the usual Euclidean, “commutative” metric (for which the above problem is not convex). Our method is general and applicable to other settings.

As a consequence, we solve the equivalence problem for the left-right group action underlying the operator scaling problem. This yields a deterministic polynomial-time algorithm for a new class of Polynomial Identity Testing (PIT) problems, which was the original motivation for studying operator scaling.

CCS CONCEPTS

• Theory of computation → Convex optimization; Nonconvex optimization; Pseudorandomness and derandomization;

KEYWORDS

operator scaling, geodesic convexity, positive definite, orbit-closure intersection, capacity

ACM Reference Format:
Zeyuan Allen-Zhu, Ankit Garg, Yuanzhi Li, Rafael Oliveira, and Avi Wigderson. 2018. Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing. In Proceedings of 50th Annual ACM SIGACT Symposium on the Theory of Computing (STOC’18). ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3188745.3188942

∗A future version of this paper can be found on arXiv: https://arxiv.org/abs/1804.01076.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. STOC’18, June 25–29, 2018, Los Angeles, CA, USA. © 2018 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery. ACM ISBN 978-1-4503-5559-9/18/06…$15.00. https://doi.org/10.1145/3188745.3188942

1 INTRODUCTION

Group orbits and their closures capture natural notions of equivalence and are studied in several fields of mathematics, such as group theory, invariant theory and algebraic geometry. They also come up naturally in theoretical computer science. For example, graph isomorphism, the VP vs. VNP question and lower bounds on tensor rank are all questions about such notions of equivalence. In this paper, we focus on the orbit-closure intersection problem, which is the most natural way to define equivalence for continuous group actions. We explore a general approach to the problem via geodesically convex optimization. As a testbed for our techniques, we design a deterministic polynomial-time algorithm for the orbit-closure intersection problem for the left-right group action. Recent results by [22, 46] have reduced this problem to polynomial identity testing (PIT), which yields a randomized polynomial-time algorithm. We derandomize this special case of PIT, perhaps surprisingly, by continuous optimization.

On the optimization side, we propose a new second-order method for geodesically convex optimization and use it to obtain an algorithm for operator scaling whose running time is polynomial in the input bit size and poly-logarithmic in 1/ε (where ε is the error). In contrast, prior work [32] gives an operator-scaling algorithm that runs in time only polynomial in 1/ε, which is not sufficient for an application to the general orbit-closure intersection problem.

On the PIT side, we continue the line of research initiated by Mulmuley [59] of studying problems in algebraic geometry and invariant theory from an algorithmic perspective, in order to develop and sharpen tools to attack the PIT problem. Our result adds to the growing list of this agenda [29, 32, 45, 46], and continues the paper [32] in building optimization tools for PIT problems (at least for those arising from invariant theory). Could it be possible that the eventual solution to PIT will lie in optimization (perhaps very wishful thinking)?

Below is an outline of the rest of the introduction. We review geodesically convex optimization and explain its application to the operator-scaling problem in Section 1.1. We discuss the basics of invariant theory in Section 1.2, and an optimization approach to invariant-theoretic problems in Section 1.2.1. We discuss the left-right group action in detail and explain how the orbit-closure intersection problem for this action (for which we give the first deterministic poly-time algorithm) is a special case of PIT in Section 1.3. In Section 1.4, we discuss the unitary equivalence problem for the left-right action. Section 2 contains an overview of the techniques we develop.
1.1 Geodesically Convex Optimization and Operator Scaling

Convex optimization provides the basis for efficient algorithms in a large number of scientific disciplines. For instance, to find an ε-approximate minimizer, the interior point method runs in time polynomial in the input size and logarithmic in 1/ε. Unfortunately, many problems (especially machine-learning ones) cannot be phrased in terms of convex formulations. A body of general-purpose nonconvex algorithms has recently been designed with theoretical guarantees (see [3–6, 15, 62, 65]). However, their guarantees are not as good as in the convex case: they only converge to approximate local minima (some only to stationary points), and run in time polynomial in 1/ε.

So one might wonder: for which generalizations of convex optimization problems can one design optimization algorithms with guarantees comparable to convex optimization? One avenue for such a generalization is given by geodesically convex problems. Geodesic convexity generalizes Euclidean convexity to Riemannian manifolds [14, 36]. While there have been works on developing algorithms for optimizing geodesically convex functions [1, 63, 72, 76, 78, 79], the theory is still incomplete in terms of the best achievable computational complexity.

We focus on geodesically convex optimization over the space of positive definite (PD) matrices, endowed with a geometry different from the Euclidean one. This specific geometry on PD matrices is well studied; see [39, 58, 70, 76].

An n × n (complex) matrix M is positive definite (PD) if it is Hermitian (i.e., M = M†) and all of its eigenvalues are strictly positive. We write M ≻ 0 to denote that M is PD. The geodesic path from any matrix A ≻ 0 to matrix B ≻ 0 is a function γ that maps [0, 1] to PD matrices, satisfies γ(0) = A and γ(1) = B, and is locally distance minimizing (w.r.t. an appropriate metric). A function F(M) is geodesically convex iff the univariate function F(γ(t)) is convex for every such geodesic path γ.

In the operator scaling problem, we are given n × n complex matrices A_1, …, A_m defining the positive operator T(X) := Σ_{i=1}^m A_i X A_i†, which maps PSD matrices to PSD matrices. The so-called capacity of the operator T is defined by:

    cap(T) := inf { det(T(X)) : X ≻ 0, det(X) = 1 } .

The name “operator scaling” comes from the fact that if the infimum X* is attainable, then by defining Y* = T(X*)^{−1} and re-scaling Ã_i = (Y*)^{1/2} A_i (X*)^{1/2}, we have

    Σ_i Ã_i Ã_i† = I   and   Σ_i Ã_i† Ã_i = I .

This is also known as saying that the new operator T̃(X) := Σ_{i=1}^m Ã_i X Ã_i† is doubly stochastic.

Before our work, the only known algorithmic approach to solve the above capacity optimization problem was by Gurvits [37] in 2004. His algorithm is a natural extension of Sinkhorn’s algorithm, which was proposed in 1964 [69] for the simpler task of matrix scaling. A complete analysis of Gurvits’ algorithm was given in Garg et al. [32]. Unfortunately, Gurvits’ algorithm (and Sinkhorn’s algorithm too) runs in time poly(n, log M, 1/ε), where M denotes the largest magnitude of an entry of the A_i and ε is the desired accuracy. The polynomial dependence on 1/ε is poor and slows down the downstream applications (such as orbit-closure intersection).

Remark 1.1. A special case of operator scaling is the matrix scaling problem (cf. [7, 18] and references therein). In matrix scaling, we are given a real matrix with non-negative entries and asked to re-scale its rows and columns to make it doubly stochastic. In this very special case, one can make a change of variables in the appropriate capacity and make it convex in the Euclidean metric. This affords standard convex optimization techniques, and for this special case, algorithms running in time poly(n, log M, log 1/ε) are known [7, 18, 49, 56].

It is known that for every positive operator T, log(det(T(X))) is geodesically convex in X [70]. Also, it is simple to verify that log(det(X)) is geodesically linear (i.e., both convex and concave). Hence, if we define the following alternative objective (removing the hard constraint on det(X))

    logcap(X) = log det(T(X)) − log det(X) ,        (1.1)

then it is geodesically convex over PD matrices X. Note that if cap(T) > 0, then inf_{X ≻ 0} logcap(X) = log(cap(T)).
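The two convexity facts above are easy to check numerically. The sketch below is purely illustrative (it is not the paper's second-order method); it works over the reals for simplicity and assumes the standard explicit formula γ(t) = P^{1/2} (P^{−1/2} Q P^{−1/2})^t P^{1/2} for the geodesic in this metric, which the text leaves implicit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
# A random positive operator T(X) = sum_i A_i X A_i^T (real case for simplicity)
A = [rng.standard_normal((n, n)) for _ in range(m)]

def T(X):
    return sum(Ai @ X @ Ai.T for Ai in A)

def logdet(X):
    return np.linalg.slogdet(X)[1]

def logcap(X):
    # Eq. (1.1): logcap(X) = log det(T(X)) - log det(X)
    return logdet(T(X)) - logdet(X)

def mat_pow(M, t):
    # M^t for a symmetric PD matrix M, via eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * w**t) @ V.T

def geodesic(P, Q, t):
    # Assumed explicit geodesic: gamma(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}
    Ph, Pmh = mat_pow(P, 0.5), mat_pow(P, -0.5)
    return Ph @ mat_pow(Pmh @ Q @ Pmh, t) @ Ph

# Two random PD endpoints
B1, B2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
P, Q = B1 @ B1.T + np.eye(n), B2 @ B2.T + np.eye(n)

mid = geodesic(P, Q, 0.5)
# log det is geodesically linear: det(gamma(t)) = det(P)^{1-t} det(Q)^t
assert abs(logdet(mid) - 0.5 * (logdet(P) + logdet(Q))) < 1e-6
# logcap is geodesically convex, hence midpoint-convex along the geodesic
assert logcap(mid) <= 0.5 * (logcap(P) + logcap(Q)) + 1e-9
```

Here `mat_pow` computes fractional matrix powers through an eigendecomposition, which is numerically stable for well-conditioned PD matrices; the two assertions mirror the geodesic linearity of log det and the geodesic convexity of logcap.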

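For the matrix-scaling special case in Remark 1.1, Sinkhorn's 1964 alternating-normalization iteration fits in a few lines. The sketch below shows that classical iteration (whose convergence guarantee is only polynomial in 1/ε, as noted above), not the poly(n, log M, log 1/ε) convex-optimization algorithms the remark cites:

```python
import numpy as np

def sinkhorn(M, iters=500):
    """Alternately normalize rows and columns of an entrywise-positive matrix.

    Sinkhorn (1964): for strictly positive M this converges to a doubly
    stochastic matrix D1 @ M @ D2 for positive diagonal matrices D1, D2.
    """
    M = np.array(M, dtype=float)
    for _ in range(iters):
        M /= M.sum(axis=1, keepdims=True)  # make each row sum to 1
        M /= M.sum(axis=0, keepdims=True)  # make each column sum to 1
    return M

S = sinkhorn([[1.0, 2.0], [3.0, 4.0]])
assert np.allclose(S.sum(axis=0), 1.0)  # columns sum to 1
assert np.allclose(S.sum(axis=1), 1.0)  # rows sum to 1 (approximately)
```

After the final column normalization the column sums are exactly 1, while the row sums are 1 only up to the (geometrically shrinking) convergence error.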