A Unified Successive Pseudo-Convex Approximation Framework

Yang Yang and Marius Pesavento

Abstract—In this paper, we propose a successive pseudo-convex approximation algorithm to efficiently compute stationary points for a large class of possibly nonconvex optimization problems. The stationary points are obtained by solving a sequence of successively refined approximate problems, each of which is much easier to solve than the original problem. To achieve convergence, the approximate problem only needs to exhibit a weak form of convexity, namely, pseudo-convexity. We show that the proposed framework not only includes as special cases a number of existing methods, for example, the gradient method and the Jacobi algorithm, but also leads to new algorithms which enjoy easier implementation and faster convergence speed. We also propose a novel line search method for nondifferentiable optimization problems, which is carried out over a properly constructed differentiable function with the benefit of a simplified implementation as compared to state-of-the-art line search techniques that directly operate on the original nondifferentiable objective function. The advantages of the proposed algorithm are shown, both theoretically and numerically, by several example applications, namely, MIMO broadcast channel capacity computation, energy efficiency maximization in massive MIMO systems and LASSO in sparse signal recovery.

Index Terms—Energy efficiency, exact line search, LASSO, massive MIMO, MIMO broadcast channel, nonconvex optimization, nondifferentiable optimization, successive convex approximation.

arXiv:1506.04972v2 [math.OC] 7 Apr 2016

Y. Yang is with Intel Deutschland GmbH, Germany (email: [email protected]). M. Pesavento is with Communication Systems Group, Darmstadt University of Technology, Germany (email: [email protected]). The authors acknowledge the financial support of the Seventh Framework Programme for Research of the European Commission under grant number ADEL-619647 and the EXPRESS project within the DFG priority program CoSIP (DFG-SPP 1798).

I. INTRODUCTION

In this paper, we propose an iterative algorithm to solve the following general optimization problem:

  minimize_x f(x)
  subject to x ∈ X,        (1)

where X ⊆ R^n is a closed and convex set, and f(x): R^n → R is a proper and differentiable function with a continuous gradient. We assume that problem (1) has a solution.

Problem (1) also includes some classes of nondifferentiable optimization problems, if the nondifferentiable function g(x) is convex:

  minimize_x f(x) + g(x)
  subject to x ∈ X,        (2)

because problem (2) can be rewritten into a problem of the form of (1) with the help of auxiliary variables:

  minimize_{x,y} f(x) + y
  subject to x ∈ X, g(x) ≤ y.        (3)

We do not assume that f(x) is convex, so (1) is in general a nonconvex optimization problem. The focus of this paper is on the development of efficient iterative algorithms for computing the stationary points of problem (1). The optimization problem (1) represents a general class of optimization problems with a vast number of diverse applications. Consider for example the sum-rate maximization in the MIMO multiple access channel (MAC) [1], the broadcast channel (BC) [2, 3] and the interference channel (IC) [4, 5, 6, 7, 8, 9], where f(x) is the sum-rate function of multiple users (to be maximized) while the set X characterizes the users' power constraints. In the context of the MIMO IC, (1) is a nonconvex problem and NP-hard [5]. As another example, consider portfolio optimization, in which f(x) represents the expected return of the portfolio (to be maximized) and the set X characterizes the trading constraints [10]. Furthermore, in sparse (l1-regularized) linear regression, f(x) denotes the least square function and g(x) is the sparsity regularization function [11, 12].

Commonly used iterative algorithms belong to the class of descent direction methods, such as the conditional gradient method and the gradient projection method for the differentiable problem (1) [13] and the proximal gradient method for the nondifferentiable problem (2) [14, 15], which often suffer from slow convergence. To speed up the convergence, the block coordinate descent (BCD) method that uses the notion of the nonlinear best-response has been widely studied [13, Sec. 2.7]. In particular, this method is applicable if the constraint set of (1) has a Cartesian product structure X = X_1 × ... × X_K such that

  minimize_{x=(x_k)_{k=1}^K} f(x_1, ..., x_K)
  subject to x_k ∈ X_k, k = 1, ..., K.        (4)

The BCD method is an iterative algorithm: in each iteration, only one variable is updated by its best-response x_k^{t+1} = arg min_{x_k ∈ X_k} f(x_1^{t+1}, ..., x_{k−1}^{t+1}, x_k, x_{k+1}^t, ..., x_K^t) (i.e., the point that minimizes f(x) with respect to (w.r.t.) the variable x_k only, while the remaining variables are fixed to their values of the preceding iteration), and the variables are updated sequentially. This method and its variants have been successfully applied to many practical problems [1, 6, 7, 10, 16].

When the number of variables is large, the convergence speed of the BCD method may be slow due to the sequential nature of the update. A parallel variable update based on the best-response seems attractive as a means to speed up the updating procedure; however, the convergence of a parallel best-response algorithm is only guaranteed under rather restrictive conditions, cf. the diagonal dominance condition on the objective function f(x_1, ..., x_K) [17], which is not only difficult to satisfy but also hard to verify. If f(x_1, ..., x_K) is convex, the parallel algorithms converge if the stepsize is inversely proportional to the number of block variables K. This choice of stepsize, however, tends to be overly conservative in systems with a large number of block variables and inevitably slows down the convergence [2, 10, 18].

Recent progress in parallel algorithms has been made in [8, 9, 19, 20], in which it was shown that a stationary point of (1) can be found by solving a sequence of successively refined approximate problems of the original problem (1), and convergence to a stationary point is established if, among other conditions, the approximate function (the objective function of the approximate problem) and the stepsizes are properly selected. The parallel algorithms proposed in [8, 9, 19, 20] are essentially descent direction methods. A description of how to construct the approximate problem such that the convexity of the original problem is preserved as much as possible is also contained in [8, 9, 19, 20], to achieve faster convergence than standard descent direction methods such as the classical conditional gradient method and the gradient projection method.

Despite their novelty, the parallel algorithms proposed in [8, 9, 19, 20] suffer from two limitations. Firstly, the approximate function must be strongly convex; this is usually guaranteed by artificially adding a quadratic regularization term to the original objective function f(x), which however may destroy the desirable characteristic structure of the original problem that could otherwise be exploited, e.g., to obtain computationally efficient closed-form solutions of the approximate problems [6]. Secondly, the algorithms require the use of a decreasing stepsize. On the one hand, a slow decay of the stepsize is preferable to make notable progress and to achieve a satisfactory convergence speed; on the other hand, theoretical convergence is guaranteed only when the stepsize decays fast enough. In practice, it is a difficult task on its own to find a decay rate for the stepsize that provides a good trade-off between convergence speed and convergence guarantee, and current practices mainly rely on heuristics [19].

The contribution of this paper consists in the development of a novel iterative convex approximation method to solve problem (1). In particular, the advantages of the proposed iterative algorithm are the following:

1) The approximate function of the original problem (1) in each iteration only needs to exhibit a weak form of convexity, namely, pseudo-convexity. The proposed iterative method not only includes as special cases many existing methods, for example, [4, 8, 9, 19], but also opens new possibilities for constructing approximate problems that are easier to solve. For example, in the MIMO BC sum-rate maximization problem (Sec. IV-A), the new approximate problems can be solved in closed-form. We also show by a counterexample that the assumption on pseudo-convexity is tight, in the sense that if it is not satisfied, the algorithm may not converge.

2) The stepsizes can be determined based on the problem structure, typically resulting in faster convergence than in cases where constant stepsizes [2, 10, 18] and decreasing stepsizes [8, 19] are used. For example, a constant stepsize can be used when f(x) is given as the difference of two convex functions as in DC programming [21]. When the objective function is nondifferentiable, we propose a new exact/successive line search method that is carried out over a properly constructed differentiable function. Thus it is much easier to implement than state-of-the-art techniques that operate on the original nondifferentiable objective function directly.

In the proposed algorithm, the exact/successive line search is used to determine the stepsize, and it can be implemented in a centralized controller, whose presence is justified for particular applications, e.g., the base station in the MIMO BC and the portfolio manager in multi-portfolio optimization [10]. We remark that even in applications in which a centralized controller is not admitted, the line search procedure does not necessarily imply an increased signaling burden when it is implemented in a distributed manner among different distributed processors. For example, in the LASSO problem studied in Sec. IV-C, the stepsize based on the exact line search can be computed in closed-form and does not incur any additional signaling compared with predetermined stepsizes, e.g., decreasing stepsizes and constant stepsizes. Besides, even in cases where the line search procedure induces additional signaling, the burden is often fully amortized by the significant increase in the convergence rate.

The rest of the paper is organized as follows. In Sec. II we introduce the mathematical background. The novel iterative method is proposed and its convergence is analyzed in Sec. III; its connection to several existing descent direction algorithms is presented there. In Sec. IV, several applications are considered: the sum-rate maximization problem of the MIMO BC and the energy efficiency maximization of a massive MIMO system to illustrate the advantage of the proposed approximate function, and the LASSO problem to illustrate the advantage of the proposed stepsize. The paper is finally concluded in Sec. V.

Notation: We use x, x and X to denote a scalar, vector and matrix, respectively. We use X_jk to denote the (j,k)-th element of X; x_k is the k-th element of x where x = (x_k)_{k=1}^K, and x_{−k} denotes all elements of x except x_k: x_{−k} = (x_j)_{j=1, j≠k}^K. We denote x^{−1} as the element-wise inverse of x, i.e., (x^{−1})_k = 1/x_k. Notation x ∘ y and X ⊗ Y denotes the Hadamard product between x and y, and the Kronecker product between X and Y, respectively. The operator [x]_a^b returns the element-wise projection of x onto [a, b]: [x]_a^b ≜ max(min(x, b), a), and [x]^+ ≜ max(x, 0). We denote ⌈x⌉ as the smallest integer that is larger than or equal to x. We denote d(X) as the vector that consists of the diagonal elements of X, and diag(x) is a diagonal matrix whose diagonal elements are the same as x. We use 1 to denote the vector whose elements are equal to 1.

II. PRELIMINARIES ON DESCENT DIRECTION METHOD

AND CONVEX FUNCTIONS

In this section, we introduce the basic definitions and concepts that are fundamental in the development of the mathematical formalism used in the rest of the paper.

Stationary point. A point y ∈ X is a stationary point of (1) if

  (x − y)^T ∇f(y) ≥ 0, ∀ x ∈ X.        (5)

Condition (5) is the necessary condition for local optimality of the variable y. For nonconvex problems, where global optimality conditions are difficult to establish, the computation of stationary points of the optimization problem (1) is generally desired. If (1) is convex, stationary points coincide with (globally) optimal points and condition (5) is also sufficient for y to be (globally) optimal.

Descent direction. The vector d^t is a descent direction of the function f(x) at x = x^t if

  ∇f(x^t)^T d^t < 0.        (6)

If (6) is satisfied, the function f(x) can be decreased when x is updated from x^t along the direction d^t. This is because the Taylor expansion of f(x) around x = x^t is given by

  f(x^t + γ d^t) = f(x^t) + γ ∇f(x^t)^T d^t + o(γ),

where the first-order term is negative in view of (6). For sufficiently small γ, the first-order term dominates all higher-order terms. More rigorously, if d^t is a descent direction, there exists a γ̄^t > 0 such that [22, 8.2.1]

  f(x^t + γ d^t) < f(x^t), ∀ γ ∈ (0, γ̄^t).

Note that the converse is not necessarily true, i.e., f(x^{t+1}) < f(x^t) for arbitrary functions f(x) does not necessarily imply that x^{t+1} − x^t is a descent direction of f(x) at x = x^t.

Quasi-convex function. A function h(x) is quasi-convex if for any α ∈ [0, 1]:

  h((1 − α)x + αy) ≤ max(h(x), h(y)), ∀ x, y ∈ X.

A locally optimal point y of a quasi-convex function h(x) over a convex set X is also globally optimal, i.e.,

  h(x) ≥ h(y), ∀ x ∈ X.

Pseudo-convex function. A function h(x) is pseudo-convex if [23]

  ∇h(x)^T (y − x) ≥ 0 ⟹ h(y) ≥ h(x), ∀ x, y ∈ X.

Another equivalent definition of pseudo-convex functions is also useful in our context [23]:

  h(y) < h(x) ⟹ ∇h(x)^T (y − x) < 0.        (7)

In other words, h(y) < h(x) implies that y − x is a descent direction of h(x). A pseudo-convex function is also quasi-convex [23, Th. 9.3.5], and thus any locally optimal point of a pseudo-convex function is also globally optimal.

Convex function. A function h(x) is convex if

  h(y) ≥ h(x) + ∇h(x)^T (y − x), ∀ x, y ∈ X.

It is strictly convex if the above inequality is satisfied with strict inequality whenever x ≠ y. It is easy to see that a convex function is pseudo-convex.

Strongly convex function. A function h(x) is strongly convex with constant a if

  h(y) ≥ h(x) + ∇h(x)^T (y − x) + (a/2) ‖y − x‖₂², ∀ x, y ∈ X,

for some positive constant a. The relationship of functions with different degrees of convexity is summarized in Figure 1, where the arrow denotes implication in the direction of the arrow.

Figure 1. Relationship of functions with different degrees of convexity.

III. THE PROPOSED SUCCESSIVE PSEUDO-CONVEX APPROXIMATION ALGORITHM

In this section, we propose an iterative algorithm that solves (1) as a sequence of successively refined approximate problems, each of which is much easier to solve than the original problem (1), e.g., the approximate problem can be decomposed into independent subproblems that might even exhibit closed-form solutions.

In iteration t, let f̃(x; x^t) be the approximate function of f(x) around the point x^t. Then the approximate problem is

  minimize_x f̃(x; x^t)
  subject to x ∈ X,        (8)

and its optimal point and solution set are denoted as Bx^t and S(x^t), respectively:

  Bx^t ∈ S(x^t) ≜ {x* ∈ X : f̃(x*; x^t) = min_{x∈X} f̃(x; x^t)}.        (9)

We assume that the approximate function f̃(x; y) satisfies the following technical conditions:

(A1) The approximate function f̃(x; y) is pseudo-convex in x for any given y ∈ X;
(A2) The approximate function f̃(x; y) is continuously differentiable in x for any given y ∈ X and continuous in y for any x ∈ X;
(A3) The gradient of f̃(x; y) and the gradient of f(x) are identical at x = y for any y ∈ X, i.e., ∇_x f̃(y; y) = ∇_x f(y).

Based on (9), we define the mapping Bx that is used to generate the sequence of points in the proposed algorithm:

  X ∋ x ⟼ Bx ∈ X.        (10)

Given the mapping Bx, the following properties hold.
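The strict inclusions in the convexity hierarchy above can be spot-checked numerically. The following is a minimal sketch of ours (not from the paper), using the scalar function h(x) = x + x³: its derivative 1 + 3x² is positive, so h is increasing and hence pseudo-convex on R, yet h is not convex since h''(x) = 6x < 0 for x < 0.

```python
import numpy as np

def h(x):
    # Pseudo-convex but nonconvex on R.
    return x + x**3

def grad_h(x):
    return 1.0 + 3.0 * x**2

# Check the pseudo-convexity implication (7):
# h(y) < h(x)  =>  grad_h(x) * (y - x) < 0.
rng = np.random.default_rng(0)
xs = rng.uniform(-2.0, 2.0, size=1000)
ys = rng.uniform(-2.0, 2.0, size=1000)
pseudo_ok = all(grad_h(x) * (y - x) < 0
                for x, y in zip(xs, ys) if h(y) < h(x))

# Convexity fails: the first-order lower bound
# h(y) >= h(x) + h'(x)(y - x) is violated, e.g., at x = -1, y = -2:
# h(-2) = -10 while the bound gives -2 + 4*(-1) = -6.
convex_violated = h(-2.0) < h(-1.0) + grad_h(-1.0) * (-2.0 - (-1.0))
```

Since h is monotone, h(y) < h(x) forces y < x, and the positive derivative then makes the implication in (7) hold at every sampled pair, while the convexity inequality already fails at a single point.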

Proposition 1 (Stationary point and descent direction). Provided that Assumptions (A1)-(A3) are satisfied: (i) A point y is a stationary point of (1) if and only if y ∈ S(y) defined in (9); (ii) If y is not a stationary point of (1), then By − y is a descent direction of f(x):

  ∇f(y)^T (By − y) < 0.        (11)

Proof: See Appendix A.

If x^t is not a stationary point, according to Proposition 1, we define the vector update x^{t+1} in the (t+1)-th iteration as

  x^{t+1} = x^t + γ^t (Bx^t − x^t),        (12)

where γ^t ∈ (0, 1] is an appropriate stepsize that can be determined by either the exact line search (also known as the minimization rule) or the successive line search (also known as the Armijo rule). Since x^t ∈ X, Bx^t ∈ X and γ^t ∈ (0, 1], it follows from the convexity of X that x^{t+1} ∈ X for all t.

Exact line search. The stepsize is selected such that the function f(x) is decreased to the largest extent along the descent direction Bx^t − x^t:

  γ^t ∈ arg min_{0≤γ≤1} f(x^t + γ(Bx^t − x^t)).        (13)

With this stepsize rule, it is easy to see that if x^t is not a stationary point, then f(x^{t+1}) < f(x^t).

In the special case that f(x) in (1) is convex and γ* nulls the gradient of f(x^t + γ(Bx^t − x^t)), i.e., ∇_γ f(x^t + γ*(Bx^t − x^t)) = 0, then γ^t in (13) is simply the projection of γ* onto the interval [0, 1]:

  γ^t = [γ*]_0^1 = { 1,   if ∇_γ f(x^t + γ(Bx^t − x^t))|_{γ=1} ≤ 0,
                     0,   if ∇_γ f(x^t + γ(Bx^t − x^t))|_{γ=0} ≥ 0,
                     γ*,  otherwise.

If 0 ≤ γ^t = γ* ≤ 1, the constrained optimization problem in (13) is essentially unconstrained. In some applications it is possible to compute γ* analytically, e.g., if f(x) is quadratic as in the LASSO problem (Sec. IV-C). Otherwise, for general convex functions, γ* can be found efficiently by the bisection method as follows. Restricting the function f(x) to the line x^t + γ(Bx^t − x^t), the new function f(x^t + γ(Bx^t − x^t)) is convex in γ [24]. It thus follows that ∇_γ f(x^t + γ(Bx^t − x^t)) < 0 if γ < γ* and ∇_γ f(x^t + γ(Bx^t − x^t)) > 0 if γ > γ*. Given an interval [γ_low, γ_up] containing γ* (the initial values of γ_low and γ_up are 0 and 1, respectively), set γ_mid = (γ_low + γ_up)/2 and refine γ_low and γ_up according to the following rule:

  γ_low = γ_mid, if ∇_γ f(x^t + γ_mid(Bx^t − x^t)) < 0,
  γ_up = γ_mid, if ∇_γ f(x^t + γ_mid(Bx^t − x^t)) > 0.

The procedure is repeated finitely many times until the gap γ_up − γ_low is smaller than a prescribed precision.

Successive line search. If no structure in f(x) (e.g., convexity) can be exploited to efficiently compute γ^t according to the exact line search (13), the successive line search can instead be employed: given scalars 0 < α < 1 and 0 < β < 1, the stepsize γ^t is set to γ^t = β^{m_t}, where m_t is the smallest nonnegative integer m satisfying the following inequality:

  f(x^t + β^m(Bx^t − x^t)) ≤ f(x^t) + α β^m ∇f(x^t)^T (Bx^t − x^t).        (14)

Note that the existence of a finite m_t satisfying (14) is always guaranteed if Bx^t − x^t is a descent direction at x^t, i.e., ∇f(x^t)^T (Bx^t − x^t) < 0 [13]; from Proposition 1, inequality (14) thus always admits a solution.

The algorithm is formally summarized in Algorithm 1 and its convergence properties are given in the following theorem.

Algorithm 1 The iterative convex approximation algorithm for the differentiable problem (1)
Data: t = 0 and x^0 ∈ X.
Repeat the following steps until convergence:
  S1: Compute Bx^t using (9).
  S2: Compute γ^t by the exact line search (13) or the successive line search (14).
  S3: Update x^{t+1} according to (12) and set t ← t + 1.

Theorem 2 (Convergence to a stationary point). Consider the sequence {x^t} generated by Algorithm 1. Provided that Assumptions (A1)-(A3) as well as the following assumptions are satisfied:

(A4) The solution set S(x^t) is nonempty for t = 1, 2, ...;
(A5) Given any convergent subsequence {x^t}_{t∈T}, where T ⊆ {1, 2, ...}, the sequence {Bx^t}_{t∈T} is bounded.

Then any limit point of {x^t} is a stationary point of (1).

Proof: See Appendix B.

In the following we discuss some properties of the proposed Algorithm 1.

On the conditions (A1)-(A5). The only requirement on the convexity of the approximate function f̃(x; x^t) is that it is pseudo-convex, cf. (A1). To the best of our knowledge, these are the weakest conditions for descent direction methods available in the literature. As a result, they enable the construction of new approximate functions that can often be optimized more easily, or even in closed-form, resulting in a significant reduction of the computational cost. Assumptions (A2)-(A3) represent standard conditions for successive convex approximation techniques and are satisfied for many existing approximation functions, cf. Sec. III-B. Sufficient conditions for Assumptions (A4)-(A5) are that either the feasible set X in (8) is bounded or the approximate function in (8) is strongly convex [25]. We show how these assumptions are satisfied in the popular applications considered in Sec. IV.

On the pseudo-convexity of the approximate function. Assumption (A1) is tight, in the sense that if it is not satisfied, Proposition 1 may not hold. Consider the following simple example: f(x) = x³, where −1 ≤ x ≤ 1, and the point x^t = 0 at iteration t. Choosing the approximate function f̃(x; x^t) = x³, which is quasi-convex but not pseudo-convex, all assumptions except (A1) are satisfied. It is easy to see that Bx^t = −1; however, (Bx^t − x^t)∇f(x^t) = (−1 − 0) · 0 = 0, and thus Bx^t − x^t is not a descent direction, i.e., inequality (11) in Proposition 1 is violated.

On the stepsize. The stepsize can be determined in a more straightforward way if f̃(x; x^t) is a global upper bound of f(x) that is exact at x = x^t, i.e., assume that

(A6) f̃(x; x^t) ≥ f(x) and f̃(x^t; x^t) = f(x^t);

then Algorithm 1 converges under the choice γ^t = 1, which results in the update x^{t+1} = Bx^t. To see this, we first remark that γ = 1 must be an optimal point of the following problem:

  1 ∈ arg min_{0≤γ≤1} f̃(x^t + γ(Bx^t − x^t); x^t),        (15)

otherwise the optimality of Bx^t is contradicted, cf. (9). At the same time, it follows from Proposition 1 that ∇f̃(x^t; x^t)^T (Bx^t − x^t) < 0.

Algorithm 2 The iterative convex approximation algorithm for the nondifferentiable problem (2)
Data: t = 0 and x^0 ∈ X.
Repeat the following steps until convergence:
  S1: Compute Bx^t using (22).
  S2: Compute γ^t by the exact line search (23) or the successive line search (24).
  S3: Update x^{t+1} according to
    x^{t+1} = x^t + γ^t (Bx^t − x^t).
  Set t ← t + 1.
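Steps S1-S3 and the bisection-based exact line search can be made concrete with a small toy example of ours (not the paper's code): minimizing the convex quartic f(x) = (1/4)‖x‖⁴ + cᵀx over a box, with a gradient-projection-type surrogate (cf. Sec. III-B) so that Bx^t is a simple projection.

```python
import numpy as np

# Toy instance of Algorithm 1: minimize f(x) = 1/4 ||x||^4 + c^T x
# over the box X = [-1, 1]^n. The vector c is a hypothetical example.
c = np.array([1.0, -2.0, 0.5])

def f(x):
    return 0.25 * (x @ x) ** 2 + c @ x

def grad_f(x):
    return (x @ x) * x + c

def proj_X(x):
    # Projection onto the box [-1, 1]^n.
    return np.clip(x, -1.0, 1.0)

def best_response(x, s=0.5):
    # S1: minimizer of a quadratic surrogate over X (a projected gradient step).
    return proj_X(x - s * grad_f(x))

def exact_line_search(x, d, tol=1e-12):
    # S2: f(x + g*d) is convex in g here, so bisect on the sign of its
    # derivative, mirroring the refinement rule in the text.
    dphi = lambda g: grad_f(x + g * d) @ d
    if dphi(1.0) <= 0.0:
        return 1.0
    if dphi(0.0) >= 0.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dphi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = np.zeros(3)
for t in range(300):
    Bx = best_response(x)                  # S1
    gamma = exact_line_search(x, Bx - x)   # S2
    x = x + gamma * (Bx - x)               # S3
```

Because f is quartic here, the stepsize could also be obtained in closed form by solving a cubic in γ; the bisection is shown since it applies to any convex f.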
The successive line search over f̃(x^t + γ(Bx^t − x^t); x^t) thus yields a nonnegative and finite integer m_t such that for some 0 < α < 1 and 0 < β < 1:

  f̃(Bx^t; x^t) ≤ f̃(x^t + β^{m_t}(Bx^t − x^t); x^t)
              ≤ f̃(x^t; x^t) + α β^{m_t} ∇f̃(x^t; x^t)^T (Bx^t − x^t)
              = f(x^t) + α β^{m_t} ∇f(x^t)^T (Bx^t − x^t),        (16)

where the second inequality comes from the definition of the successive line search [cf. (14)] and the last equality follows from Assumptions (A3) and (A6). Invoking Assumption (A6) again, we obtain

  f(x^{t+1}) ≤ f(x^t) + α β^{m_t} ∇f(x^t)^T (Bx^t − x^t), with x^{t+1} = Bx^t.        (17)

The proof of Theorem 2 can be used verbatim to prove the convergence of Algorithm 1 with the constant stepsize γ^t = 1.

A. Nondifferentiable Optimization Problems

In the following we show that the proposed Algorithm 1 can also be applied to solve problem (3), and its equivalent formulation (2) which contains a nondifferentiable objective function. Suppose that f̃(x; x^t) is an approximate function of f(x) in (3) around x^t and that it satisfies Assumptions (A1)-(A3). Then the approximation of problem (3) around (x^t, y^t) is

  (Bx^t, y*(x^t)) ≜ arg min_{(x,y): x∈X, g(x)≤y} f̃(x; x^t) + y.        (18)

That is, we only need to replace the differentiable function f(x) by its approximate function f̃(x; x^t). To see this, it is sufficient to verify Assumption (A3) only:

  ∇_x (f̃(x^t; x^t) + y) = ∇_x (f(x^t) + y^t),
  ∇_y (f̃(x^t; x^t) + y) = ∇_y (f(x^t) + y) = 1.

Based on the exact line search, the stepsize γ^t in this case is given as

  γ^t ∈ arg min_{0≤γ≤1} { f(x^t + γ(Bx^t − x^t)) + y^t + γ(y*(x^t) − y^t) },        (19)

where y^t ≥ g(x^t). Then the variables x^{t+1} and y^{t+1} are defined as follows:

  x^{t+1} = x^t + γ^t (Bx^t − x^t),        (20a)
  y^{t+1} = y^t + γ^t (y*(x^t) − y^t).        (20b)

The convergence of Algorithm 1 with (Bx^t, y*(x^t)) and γ^t given by (18)-(19) directly follows from Theorem 2.

The point y^{t+1} given in (20b) can be further refined:

  f(x^{t+1}) + y^{t+1} = f(x^{t+1}) + y^t + γ^t (y*(x^t) − y^t)
                       ≥ f(x^{t+1}) + g(x^t) + γ^t (g(Bx^t) − g(x^t))
                       ≥ f(x^{t+1}) + g((1 − γ^t) x^t + γ^t Bx^t)
                       = f(x^{t+1}) + g(x^{t+1}),

where the first and the second inequality come from the facts that y^t ≥ g(x^t) and y*(x^t) = g(Bx^t), and from Jensen's inequality for the convex function g(x) [24], respectively. Since y^{t+1} ≥ g(x^{t+1}) by definition, the point (x^{t+1}, g(x^{t+1})) always yields a lower value of f(x) + y than (x^{t+1}, y^{t+1}), while (x^{t+1}, g(x^{t+1})) is still a feasible point of problem (3). The update (20b) is then replaced by the following enhanced rule:

  y^{t+1} = g(x^{t+1}).        (21)

Algorithm 1 with x^{t+1} given in (20a) and y^{t+1} given in (21) still converges to a stationary point of (3).

The notation in (18)-(19) can be simplified by removing the auxiliary variable y: Bx^t in (18) can be equivalently written as

  Bx^t = arg min_{x∈X} { f̃(x; x^t) + g(x) },        (22)

and combining (19) and (21) yields

  γ^t ∈ arg min_{0≤γ≤1} { f(x^t + γ(Bx^t − x^t)) + γ(g(Bx^t) − g(x^t)) }.        (23)

In the context of the successive line search, customizing the general definition (14) for problem (2) yields the choice γ^t = β^{m_t}, with m_t being the smallest nonnegative integer m that satisfies the inequality

  f(x^t + β^m(Bx^t − x^t)) − f(x^t) ≤ α β^m ∇f(x^t)^T (Bx^t − x^t) + (α − 1) β^m (g(Bx^t) − g(x^t)).        (24)

Based on the derivations above, the proposed algorithm for the nondifferentiable problem (2) is formally summarized in Algorithm 2.

It is much easier to calculate γ^t according to (23) than in state-of-the-art techniques that directly carry out the exact line

search over the original nondifferentiable objective function in (2) [26, Rule E], i.e., the update

  min_{0≤γ≤1} f(x^t + γ(Bx^t − x^t)) + g(x^t + γ(Bx^t − x^t)).

This is because the objective function in (23) is differentiable in γ, while state-of-the-art techniques involve the minimization of a nondifferentiable function. If f(x) exhibits a specific structure such as in quadratic functions, γ^t can even be calculated in closed-form. This property will be exploited to develop a fast and easily implementable algorithm for the popular LASSO problem in Sec. IV-C.

In the proposed successive line search, the left hand side of (24) depends on f(x) while the right hand side is linear in β^m. The proposed variation of the successive line search thus involves only the evaluation of the differentiable function f(x), and its computational complexity and signaling exchange (when implemented in a distributed manner) are thus lower than in state-of-the-art techniques (for example [26, Rule A'], [27, Equations (9)-(10)], [19, Remark 4] and [28, Algorithm 2.1]), in which the whole nondifferentiable function f(x) + g(x) must be repeatedly evaluated (for different m) and compared with a certain benchmark before m_t is found.

B. Special Cases and New Algorithms

In this subsection, we interpret some existing methods in the context of Algorithm 1 and show that they can be considered as special cases of the proposed algorithm.

Conditional gradient method: In this iterative algorithm for problem (1), the approximate function is given as the first-order approximation of f(x) at x = x^t [13, Sec. 2.2.2], i.e.,

  f̃(x; x^t) = ∇f(x^t)^T (x − x^t).        (25)

Then the stepsize is selected by either the exact line search or the successive line search.

Gradient projection method: In this iterative algorithm for problem (1), Bx^t is given by [13, Sec. 2.3]

  Bx^t = [x^t − s^t ∇f(x^t)]_X,

where s^t > 0 and [x]_X denotes the projection of x onto X. This is equivalent to defining f̃(x; x^t) in (9) as follows:

  f̃(x; x^t) = ∇f(x^t)^T (x − x^t) + (1/(2s^t)) ‖x − x^t‖₂²,        (26)

which is the first-order approximation of f(x) augmented by a quadratic regularization term that is introduced to improve the numerical stability [17]. A generalization of (26) is to replace the quadratic term by (x − x^t)^T H^t (x − x^t), where H^t ≻ 0 [27].

Proximal gradient method: If f(x) is convex and has a Lipschitz continuous gradient with constant L, the proximal gradient method for problem (2) has the following form [14, Sec. 4.2]:

  x^{t+1} = arg min_x { s^t g(x) + (1/2) ‖x − (x^t − s^t ∇f(x^t))‖₂² }
          = arg min_x { ∇f(x^t)^T (x − x^t) + (1/(2s^t)) ‖x − x^t‖₂² + g(x) },        (27)

where s^t > 0. In the context of the proposed framework (22), the update (27) is equivalent to defining f̃(x; x^t) as follows:

  f̃(x; x^t) = ∇f(x^t)^T (x − x^t) + (1/(2s^t)) ‖x − x^t‖₂²,        (28)

and setting the stepsize γ^t = 1 for all t. According to Theorem 2 and the discussion following Assumption (A6), the proposed algorithm converges under a constant unit stepsize if f̃(x; x^t) is a global upper bound of f(x), which is indeed the case when s^t ≤ 1/L in view of the descent lemma [13, Prop. A.24].

Jacobi algorithm: In problem (1), if f(x) is convex in each x_k, where k = 1, ..., K (but not necessarily jointly convex in (x_1, ..., x_K)), the approximate function is defined as [8]

  f̃(x; x^t) = Σ_{k=1}^K ( f(x_k, x_{−k}^t) + (τ_k/2) ‖x_k − x_k^t‖₂² ),        (29)

where τ_k ≥ 0 for k = 1, ..., K. The k-th component function f(x_k, x_{−k}^t) + (τ_k/2) ‖x_k − x_k^t‖₂² in (29) is obtained from the original function f(x) by fixing all variables except x_k, i.e., x_{−k} = x_{−k}^t, and further adding a quadratic regularization term. Since f̃(x; x^t) in (29) is convex, Assumption (A1) is satisfied. Based on the observation that

  ∇_{x_k} f̃(x^t; x^t) = [ ∇_{x_k} f(x_k, x_{−k}^t) + τ_k (x_k − x_k^t) ]|_{x_k = x_k^t} = ∇_{x_k} f(x^t),

we conclude that Assumption (A3) is satisfied by the choice of the approximate function in (29). The resulting approximate problem is given by

  minimize_{x=(x_k)_{k=1}^K} Σ_{k=1}^K ( f(x_k, x_{−k}^t) + (τ_k/2) ‖x_k − x_k^t‖₂² )
  subject to x ∈ X.        (30)

This is commonly known as the Jacobi algorithm. The structure inside the constraint set X, if any, may be exploited to solve (30) even more efficiently, for example when the constraint set X consists of separable constraints of the form Σ_{k=1}^K h_k(x_k) ≤ 0 for some convex functions h_k(x_k). Since subproblem (30) is convex, primal and dual decomposition techniques can readily be used to solve (30) efficiently [29] (such an example is studied in Sec. IV-A).

To guarantee convergence, the condition proposed in [9] is that τ_k > 0 for all k in (29), unless f(x) is strongly convex in each x_k. However, the strong convexity of f(x) in each x_k is a strong assumption that cannot always be satisfied, and the additional quadratic regularization term that is otherwise required may destroy the convenient structure that could otherwise be exploited, as we will show through an example application in the MIMO BC in Sec. IV-A. In the case τ_k = 0, convergence of the Jacobi algorithm (30) is only proved when f(x) is jointly convex in (x_1, ..., x_K) and the stepsize is inversely proportional to the number of variables K [2, 10, 13], namely, γ^t = 1/K. However, the resulting convergence speed is usually slow when K is large, as we will later demonstrate numerically in Sec. IV-A.

With the technical assumptions specified in Theorem 2, the convergence of the Jacobi algorithm with the approximate problem (30) and the successive line search is guaranteed even

t t T Algorithm 3 The Jacobi algorithm for problem (4) Since f2(x) is convex and f2(x) ≥ f2(x ) + ∇f2(x ) (x − 0 t Data: t = 0 and xk ∈ Xk for all k = 1,...,K. x ), Assumption (A6) is satisfied and the constant unit stepsize Repeat the following steps until convergence: can be chosen [30]. t S1: For k = 1,...,K, compute Bkx using (31). t S2: Compute γ by the exact line search (13) or the succes- IV. EXAMPLE APPLICATIONS sive line search (14). S3: Update xt+1 according to A. MIMO Broadcast Channel Capacity Computation In this subsection, we study the MIMO BC capacity com- xt+1 = xt + γt( xt − xt ), ∀k = 1,...,K. k k Bk k putation problem to illustrate the advantage of the proposed Set t ← t + 1. approximate function. Consider a MIMO BC where the channel matrix charac- terizing the transmission from the base station to user k is denoted by Hk, the transmit covariance matrix of the signal ˜ t from the base station to user k is denoted as Qk, and the when τk = 0. This is because f(x; x ) in (29) is already noise at each user k is an additive independent and identically convex when τk = 0 for all k and it naturally satisfies the pseudo-convexity assumption specified by Assumption (A1). distributed Gaussian vector with unit variance on each of its In the case that the constraint set X has a Cartesian product elements. Then the sum capacity of the MIMO BC is [31] structure (4), the subproblem (30) is naturally decomposed into PK H maximize log I + k=1HkQkHk K sub-problems, one for each variable, which are then solved {Qk} in parallel. In this case, the requirement in the convexity of PK subject to Qk  0, k = 1,...,K, k=1tr(Qk) ≤ P, (32) f(x) in each xk can even be relaxed to pseudo-convexity PK t P only (although the sum function k=1 f(xk, x−k) is not where is the power budget at the base station. 
necessarily pseudo-convex in x as pseudo-convexity is not Problem (32) is a convex problem whose solution cannot be preserved under nonnegative weighted sum operator), and this expressed in closed-form and can only be found iteratively. To t t K leads to the following update: Bx = (Bkx )k=1 and apply Algorithm 1, we invoke (29)-(30) and the approximate problem at the t-th iteration is t t Bkx ∈ arg min f(xk, x−k), k = 1,...,K, (31) PK t H xk∈Xk maximize k=1 log Rk(Q−k) + HkQkHk {Qk} t PK where Bkx can be interpreted as variable xk’s best-response subject to Qk  0, k = 1,...,K, tr(Qk) ≤ P, t k=1 to other variables x−k = (xj)j6=k when x−k = x−k. (33) The proposed Jacobi algorithm is formally summarized in t P t H Algorithm 3 and its convergence is proved in Theorem 3. where Rk(Q−k) , I + j6=k HjQjHj . The approximate function is concave in Q and differentiable in both Q and Theorem 3. Consider the sequence {xt} generated by Algo- Qt, and thus Assumptions (A1)-(A3) are satisfied. Since the rithm 3. Provided that f(x) is pseudo-convex in x for all k constraint set in (33) is compact, the approximate problem k = 1,...,K and Assumptions (A4)-(A5) are satisfied. Then (33) has a solution and Assumptions (A4)-(A5) are satisfied. any limit point of the sequence generated by Algorithm 3 is a Problem (33) is convex and the sum-power constraint stationary point of (4). coupling Q1,..., QK is separable, so dual decomposition Proof: See Appendix C. techniques can be used [29]. In particular, the constraint set The convergence condition specified in Theorem 3 relaxes has a nonempty interior, so strong holds and (33) can those in [8, 19]: f(x) only needs to be pseudo-convex in each be solved from the dual domain by relaxing the sum-power variable xk and no regularization term is needed (i.e., τk = 0). 
constraint into the Lagrangian [24]: To the best of our knowledge, this is the weakest convergence ( K ) P log R (Qt ) + H Q HH condition on Jacobi algorithms available in the literature. We t k=1 k −k k k k BQ = arg max K . K ? P will show in Sec. IV-B by an example application of the energy (Qk0)k=1 −λ ( k=1tr(Qk) − P ) efficiency maximization problem in massive MIMO systems (34) t t K ? how the weak assumption on the approximate function’s where BQ = (BkQ )k=1 and λ is the optimal Lagrange convexity proposed in Theorem 2 can be exploited to the multiplier that satisfies the following conditions: λ? ≥ 0, PK t ? PK t largest extent. Besides this, the line search usually yields much k=1 tr(BkQ ) − P ≤ 0, λ ( k=1 tr(BkQ ) − P ) = 0, and faster convergence than the fixed stepsize adopted in [2] even can be found efficiently using the bisection method. under the same approximate problem, cf. Sec. IV-A. The problem in (34) is uncoupled among different variables DC algorithm: If the objective function in (1) is the Qk in both the objective function and the constraint set, so it difference of two convex functions f1(x) and f2(x): can be decomposed into a set of smaller subproblems which t t K are solved in parallel: BQ = (BkQ )k=1 and f(x) = f1(x) − f2(x), t  t H ? the following approximate function can be used: BkQ = arg max log Rk(Q−k) + HkQkHk −λ tr(Qk) , Q 0 ˜ t t t T t k f(x; x ) = f1(x) − (f2(x ) + ∇f2(x ) (x − x )). (35) 8
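The interplay between the per-user subproblems (35) and the bisection search for λ* can be illustrated on a deliberately simplified scalar-channel instance; the true subproblem (35) is a matrix waterfilling, while the channel gains h and budget P below are hypothetical scalars:

```python
import numpy as np

# scalar-channel toy version of (34)-(35): maximize sum_k log(1 + h_k p_k)
# subject to sum_k p_k <= P, p_k >= 0; the gains h and budget P are hypothetical
h = np.array([2.0, 1.0, 0.5, 0.1])
P = 4.0

def p_of(lam):
    # per-user closed form ("waterfilling"): argmax_p log(1 + h*p) - lam*p
    return np.maximum(1.0 / lam - 1.0 / h, 0.0)

# bisection on the multiplier: the total power sum_k p_of(lam) decreases in lam
lo, hi = 1e-9, float(h.max())
for _ in range(100):
    lam = 0.5 * (lo + hi)
    if p_of(lam).sum() > P:
        lo = lam
    else:
        hi = lam
p = p_of(lam)

# sum-power constraint is met with equality, and all active users
# share the same water level 1/lam
assert abs(p.sum() - P) < 1e-6
assert np.allclose((p + 1.0 / h)[p > 1e-9], 1.0 / lam)
```

Bisection is applicable because the total transmit power is monotonically decreasing in the multiplier; the same monotonicity argument underlies the search for λ* after (34).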


Thus problem (33) also has a closed-form solution, up to a Lagrange multiplier that can be found efficiently using the bisection method. With the update direction BQ^t − Q^t, the base station can implement the exact line search to determine the stepsize using the bisection method described after (13) in Sec. III.

We remark that when the channel matrices H_k are rank deficient, problem (33) is convex but not strongly convex; the proposed algorithm with the approximate problem (33) nevertheless still converges. However, if the approximate function in [8] is used [cf. (29)], an additional quadratic regularization term must be included in (33) (and thus (35)) to make the approximate problem strongly convex, but the resulting approximate problem no longer exhibits a closed-form solution and is thus much more difficult to solve.

Simulations. The parameters are set as follows. The number of users is K = 20 and K = 100, the number of transmit and receive antennas is (5,4), and P = 10 dB. The simulation results are averaged over 20 instances.

We apply Algorithm 1 with the approximate problem (33) and the stepsize based on the exact line search, and compare it with the iterative algorithm proposed in [2, 18], which uses the same approximate problem (33) but with a fixed stepsize γ^t = 1/K (K is the number of users). It is easy to see from Figure 2 that the proposed method converges very fast (in fewer than 10 iterations) to the sum capacity, while the method of [2] requires many more iterations. This is due to the benefit of the exact line search applied in our algorithm over the fixed stepsize, which tends to be overly conservative. Employing the exact line search adds complexity as compared to the simple choice of a fixed stepsize; however, since the objective function of (32) is concave, the exact line search consists in maximizing a differentiable concave function of a scalar variable, and it can be carried out efficiently by the bisection method at affordable cost. More specifically, it takes 0.0023 seconds to solve problem (33) and 0.0018 seconds to perform the exact line search (the software/hardware environment is further specified in Sec. IV-C). Therefore, the overall CPU time (time per iteration × number of iterations) is still dramatically decreased due to the notable reduction in the number of iterations. Besides, in contrast to the method of [2], increasing the number of users K does not slow down the convergence, so the proposed algorithm is scalable in large networks.

Figure 2. MIMO BC: sum-rate versus the number of iterations.

We also compare the proposed algorithm with the iterative algorithm of [20], which uses the approximate problem (33) but with an additional quadratic regularization term, cf. (29), where τ_k = 10^{−5} for all k, and decreasing stepsizes γ^{t+1} = γ^t (1 − d γ^t), where d = 0.01 is the so-called decreasing rate that controls the rate of decrease of the stepsize. We can see from Figure 3 that the convergence behavior of [20] is rather sensitive to the decreasing rate d. The choice d = 0.01 performs well when the number of transmit and receive antennas is 5 and 4, respectively, but it is no longer a good choice when the number of transmit and receive antennas increases to 10 and 8, respectively. A good decreasing rate d usually depends on the problem parameters, and no general rule performs equally well for all choices of parameters.

Figure 3. MIMO BC: error e(Q^t) = ℜ(tr(∇f(Q^t)(BQ^t − Q^t))) versus the number of iterations.

We remark once again that the complexity of each iteration of the proposed algorithm is very low because of the existence of a closed-form solution to the approximate problem (33), while the approximate problem proposed in [20] does not exhibit a closed-form solution and can only be solved iteratively. Specifically, it takes CVX (version 2.0 [32]) 21.1785 seconds to solve the approximate problem of [20], while the proposed approximate problem (33) is solved in closed form (based on the dual approach (35), where λ* is found by bisection). Therefore, the overall complexity per iteration of the proposed algorithm is much lower than that of [20].

B. Energy Efficiency Maximization in Massive MIMO Systems

In this subsection, we study the energy efficiency maximization problem in massive MIMO systems to illustrate the advantage of the relaxed convexity requirement on the approximate function f̃(x; x^t) in the proposed iterative optimization approach: according to Assumption (A1), f̃(x; x^t) only needs to exhibit the pseudo-convexity property, rather than the convexity or strong convexity property that is conventionally required.

Consider a massive MIMO network with K cells, where each cell serves one user. The achievable transmission rate of user k in the uplink can be formulated in the following general form¹:

r_k(p) ≜ log(1 + w_kk p_k / (σ_k² + φ_k p_k + Σ_{j≠k} w_kj p_j)),  (36)

where p_k is the transmission power of user k, σ_k² is the covariance of the additive noise at the receiver of user k, while φ_k and {w_kj}_{k,j} are positive constants that depend on the channel conditions only. In particular, φ_k p_k accounts for the hardware impairments, and Σ_{j≠k} w_kj p_j accounts for the interference from other users [33].

¹ We assume a single resource block. However, the extension to multiple resource blocks is straightforward.

In 5G wireless communication networks, the energy efficiency is a key performance indicator. To address this issue, we look for the optimal power allocation that maximizes the energy efficiency:

maximize_p  Σ_{k=1}^K r_k(p) / (P_c + Σ_{k=1}^K p_k)
subject to  p ≤ p ≤ p̄,  (37)

where P_c is a positive constant representing the total circuit power dissipated in the network, and p = (p_k)_{k=1}^K and p̄ = (p̄_k)_{k=1}^K specify the lower and upper bound constraints, respectively.

Problem (37) is nonconvex and it is NP-hard to find a globally optimal point [5]. Therefore, we aim at finding a stationary point of (37) using the proposed algorithm. To begin with, we propose the following approximate function at p = p^t:

f̃(p; p^t) = Σ_{k=1}^K r̃_k(p_k; p^t) / (P_c + p_k + Σ_{j≠k} p_j^t),  (38)

where

r̃_k(p_k; p^t) ≜ r_k(p_k, p^t_{−k}) + Σ_{j≠k} (r_j(p^t) + (p_k − p_k^t) ∇_{p_k} r_j(p^t))  (39)

and

∇_{p_k} r_j(p^t) = w_jk / (σ_j² + φ_j p_j^t + Σ_{l=1}^K w_jl p_l^t) − w_jk / (σ_j² + φ_j p_j^t + Σ_{l≠j} w_jl p_l^t),  ∀ j ≠ k.

Note that the approximate function f̃(p; p^t) consists of K component functions, one for each variable p_k, and the k-th component function is constructed as follows: since r_k(p) is concave in p_k (as shown in Step 1 below) but r_j(p), j ≠ k, is not concave in p_k (as a matter of fact, it is convex in p_k), the concave function r_k(p_k, p^t_{−k}) is preserved in r̃_k(p_k; p^t) in (39), with p_{−k} fixed to p^t_{−k}, while the nonconcave functions {r_j(p)}_{j≠k} are linearized w.r.t. p_k at p = p^t. In this way, the partial concavity in the nonconcave function Σ_{j=1}^K r_j(p) is preserved in f̃(p; p^t). Similarly, since P_c + Σ_{j=1}^K p_j in the denominator is linear in p_k, we only set p_{−k} = p^t_{−k} there. Note that the division operator in the original problem (37) is kept in the approximate function f̃(p; p^t) in (38). Although it destroys the concavity (recall that a concave function divided by a linear function is no longer a concave function), the pseudo-concavity of r̃_k(p_k; p^t)/(P_c + p_k + Σ_{j≠k} p_j^t) is still preserved, as we show in two steps.

Step 1: The function r_k(p_k, p^t_{−k}) is concave in p_k. For simplicity of notation, we define the two constants c_1 ≜ w_kk/φ_k > 0 and c_2 ≜ (σ_k² + Σ_{j≠k} w_kj p_j^t)/φ_k > 0. The first-order and second-order derivatives of r_k(p_k, p^t_{−k}) w.r.t. p_k are

∇_{p_k} r_k(p_k, p^t_{−k}) = (1 + c_1)/((1 + c_1) p_k + c_2) − 1/(p_k + c_2),

∇²_{p_k} r_k(p_k, p^t_{−k}) = −(1 + c_1)²/((1 + c_1) p_k + c_2)² + 1/(p_k + c_2)²
 = (−2 c_1 c_2 p_k (1 + c_1) − (c_1² + 2 c_1) c_2²) / (((1 + c_1) p_k + c_2)² (p_k + c_2)²).

Since ∇²_{p_k} r_k(p_k, p^t_{−k}) < 0 when p_k ≥ 0, r_k(p_k, p^t_{−k}) is a concave function of p_k in the nonnegative orthant p_k ≥ 0 [24].

Step 2: Given the concavity of r_k(p_k, p^t_{−k}), the function r̃_k(p_k; p^t) is concave in p_k, as the remaining terms in (39) are linear in p_k. Since the denominator function in (38) is a linear (and thus convex) function of p_k, it follows from [34, Lemma 3.8] that r̃_k(p_k; p^t)/(P_c + p_k + Σ_{j≠k} p_j^t) is pseudo-concave.

Then we verify that the gradient of the approximate function and that of the original objective function are identical at p = p^t:

∇_{p_k} f̃(p; p^t)|_{p=p^t} = ∇_{p_k} [ r̃_k(p_k; p^t) / (P_c + p_k + Σ_{j≠k} p_j^t) ]|_{p_k = p_k^t}
 = Σ_{j=1}^K ( ∇_{p_k} r_j(p^t) (P_c + 1^T p^t) − r_j(p^t) ) / (P_c + 1^T p^t)²
 = ∇_{p_k} [ Σ_{j=1}^K r_j(p) / (P_c + Σ_{j=1}^K p_j) ]|_{p=p^t},  ∀ k.

Therefore Assumption (A3) is satisfied. Assumption (A2) is also satisfied, because both r̃_k(p_k; p^t) and P_c + p_k + Σ_{j≠k} p_j^t are continuously differentiable for any p^t ≥ 0.

Given the approximate function (38), the approximate problem at iteration t is thus

Bp^t = arg max_{p ≤ p ≤ p̄}  Σ_{k=1}^K r̃_k(p_k; p^t) / (P_c + p_k + Σ_{j≠k} p_j^t).  (40)

Assumptions (A4) and (A5) can be shown to hold by a procedure similar to that in the previous example application. Since the objective function in (37) is nonconcave, it may not be computationally affordable to perform the exact line search. Instead, the successive line search can be applied to calculate the stepsize. As a result, the convergence of the proposed algorithm with the approximate problem (40) and the successive line search follows from the same line of analysis as in the proof of Theorem 3.

The optimization problem in (40) can be decomposed into independent subproblems that are solved in parallel: Bp^t = (B_k p^t)_{k=1}^K with

B_k p^t = arg max_{p_k ≤ p_k ≤ p̄_k}  r̃_k(p_k; p^t) / (P_c + p_k + Σ_{j≠k} p_j^t),  k = 1, ..., K.  (41)

As we have just shown, the numerator and the denominator in (41) are concave and linear, respectively, so the optimization problem in (41) is a fractional programming problem and can be solved by Dinkelbach's algorithm [33, Algorithm 5]: given λ_k^{t,τ} (λ_k^{t,0} can be set to 0), the following optimization problem is solved in iteration τ + 1:

p_k(λ_k^{t,τ}) ≜ arg max_{p_k ≤ p_k ≤ p̄_k} { r̃_k(p_k; p^t) − λ_k^{t,τ} (P_c + p_k + Σ_{j≠k} p_j^t) },  (42a)

where r̃_k(p_k; p^t) is the numerator function in (41). The variable λ_k^{t,τ} is then updated in iteration τ + 1 as

λ_k^{t,τ+1} = r̃_k(p_k(λ_k^{t,τ}); p^t) / (P_c + p_k(λ_k^{t,τ}) + Σ_{j≠k} p_j^t).  (42b)

It follows from the convergence properties of Dinkelbach's algorithm that

lim_{τ→∞} p_k(λ_k^{t,τ}) = B_k p^t

at a superlinear convergence rate. Note that problem (42a) can be solved in closed form, as p_k(λ_k^{t,τ}) is simply the projection onto the interval [p_k, p̄_k] of the point that sets the gradient of the objective function in (42a) to zero. Finding that point is equivalent to finding the root of a polynomial of order 2, which admits a closed-form expression. Omitting the detailed derivations, the expression of p_k(λ_k^{t,τ}) is

p_k(λ_k^{t,τ}) = [ int_k(p^t) ( √( (2φ_k + w_kk)² − 4φ_k(φ_k + w_kk)(1 + w_kk/((π_k(p^t) − λ_k^{t,τ}) int_k(p^t))) ) − (2φ_k + w_kk) ) / (2φ_k(φ_k + w_kk)) ]_{p_k}^{p̄_k},  (43)

where π_k(p^t) ≜ Σ_{j≠k} ∇_{p_k} r_j(p^t) and int_k(p^t) ≜ σ_k² + Σ_{j≠k} w_kj p_j^t.

We finally remark that the approximate function in (38) is constructed in the same spirit as [8, 9, 35], by keeping as much concavity as possible, namely, r_k(p) in the numerator and P_c + Σ_{j=1}^K p_j in the denominator, and linearizing the nonconcave functions only, namely, Σ_{j≠k} r_j(p) in the numerator. Besides this, the denominator function is also kept. Therefore, the proposed algorithm is of a best-response nature and is expected to converge faster than gradient-based algorithms, which linearize the objective function Σ_{j=1}^K r_j(p)/(P_c + Σ_{j=1}^K p_j) in (37) completely. However, the convergence of the proposed algorithm with the approximate problem given in (40) cannot be derived from existing works, since the approximate function presents only a weak form of convexity, namely, pseudo-convexity, which is much weaker than those required in state-of-the-art convergence analyses, e.g., the uniform strong convexity in [8].

Simulations. The number of antennas at the BS in each cell is M = 50, and the channel from user j to cell k is h_kj ∈ C^{M×1}. We assume a setup similar to [33]: w_kk = |h_kk^H h_kk|², w_kj = |h_kk^H h_kj|² + ε h_kk^H D_j h_kk for j ≠ k, and φ_k = ε h_kk^H D_k h_kk, where ε = 0.01 is the error magnitude of the hardware impairments at the BS and D_j = diag({|h_jj(m)|²}_{m=1}^M). The noise covariance is σ_k² = 1, the hardware dissipated power P_c is 10 dBm, while p_k is −10 dBm and p̄_k is 10 dBm for all users. The benchmark algorithm is [33, Algorithm 1], which successively maximizes the following lower bound function of the objective function in (37), which is tight at p = p^t:

maximize_q  [ Σ_{k=1}^K (b_k^t + a_k^t log w_kk) + Σ_{k=1}^K a_k^t (q_k − log(σ_k² + φ_k e^{q_k} + Σ_{j≠k} w_kj e^{q_j})) ] / (P_c + Σ_{k=1}^K e^{q_k})
subject to  log(p_k) ≤ q_k ≤ log(p̄_k),  k = 1, ..., K,  (44)

where

a_k^t ≜ sinr_k(p^t)/(1 + sinr_k(p^t)),
b_k^t ≜ log(1 + sinr_k(p^t)) − a_k^t log(sinr_k(p^t)),

and

sinr_k(p) ≜ w_kk p_k / (σ_k² + φ_k p_k + Σ_{j≠k} w_kj p_j).

Denote the optimal variable of (44) as q^t (which can be found by Dinkelbach's algorithm); the variable p is then updated as p_k^{t+1} = e^{q_k^t} for all k = 1, ..., K. We thus refer to [33, Algorithm 1] as the successive lower bound maximization (SLBM) method.

In Figure 4, we compare the convergence behavior of the proposed method and the SLBM method in terms of both the number of iterations (the upper subplots) and the CPU time (the lower subplots), for two different numbers of users: K = 10 in Figure 4 (a) and K = 50 in Figure 4 (b). It is obvious that the convergence speed of the proposed algorithm in terms of the number of iterations is comparable to that of the SLBM method. However, we remark that the approximate problem (40) of the proposed algorithm is superior to that of the SLBM method in the following aspects.

Firstly, the approximate problem of the proposed algorithm consists of independent subproblems that can be solved in parallel, cf. (41), while each subproblem has a closed-form solution, cf. (42)-(43).
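As an illustration of (41)-(43), the following sketch runs Dinkelbach's iteration (42a)-(42b) for a single user, with the inner problem (42a) solved by the quadratic-root-plus-projection step behind (43); all numerical constants are hypothetical stand-ins for w_kk, φ_k, int_k(p^t), π_k(p^t) and P_c + Σ_{j≠k} p_j^t:

```python
import numpy as np

# hypothetical stand-ins for the constants in (41)-(43)
W, phi, I = 1.5, 0.05, 1.0   # w_kk, phi_k, int_k(p^t)
pi_k = -0.02                 # pi_k(p^t): slope of the linearized rates in (39)
offset = 0.3                 # constant part of (39) collected in the numerator
D0 = 1.2                     # P_c + sum_{j != k} p_j^t
lb, ub = 0.0, 10.0           # power limits

def numerator(p):            # \tilde r_k(p_k; p^t), cf. (39)
    return np.log(1.0 + W * p / (I + phi * p)) + offset + pi_k * p

def inner_max(lam):
    """Closed-form solution of (42a): the objective is concave in p and its
    stationary point solves a quadratic equation, cf. (43)."""
    mu = lam - pi_k          # stationarity: r_k'(p) = mu
    if mu <= 0:              # objective nondecreasing on [lb, ub]
        return ub
    disc = (I * (2 * phi + W)) ** 2 - 4 * phi * (phi + W) * (I ** 2 - W * I / mu)
    p = (-I * (2 * phi + W) + np.sqrt(disc)) / (2 * phi * (phi + W))
    return float(np.clip(p, lb, ub))

lam = 0.0                    # lambda_k^{t,0} = 0
for _ in range(30):          # Dinkelbach iterations (42a)-(42b)
    p = inner_max(lam)
    lam = numerator(p) / (D0 + p)

# cross-check: the fixed point maximizes the fractional objective (41)
grid = np.linspace(lb, ub, 200001)
p_grid = grid[np.argmax(numerator(grid) / (D0 + grid))]
assert abs(p - p_grid) < 1e-3
```

The multiplier λ converges to the maximal value of the fractional objective, and the agreement with the dense grid search confirms that the closed-form inner solution is exact.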

Figure 4. Energy efficiency maximization: achieved energy efficiency versus the number of iterations (upper subplots) and versus the CPU time (lower subplots); (a) number of users K = 10, (b) number of users K = 50.

However, the optimization variable in the approximate problem of the SLBM method (44) is a vector q ∈ R^{K×1}, and the approximate problem can only be solved by a general-purpose solver. In the simulations, we use the Matlab optimization toolbox to solve (44) and the iterative update specified in (42)-(43) to solve (40), where the stopping criterion for (42) is ‖λ^{t,τ+1} − λ^{t,τ}‖_∞ ≤ 10^{−5}. The upper subplots in Figure 4 show that the numbers of iterations required for convergence are approximately the same for the SLBM method when K = 10 in Figure 4 (a) and when K = 50 in Figure 4 (b). However, we see from the lower subplots in Figure 4 that the CPU time of each iteration of the SLBM method increases dramatically when K is increased from 10 to 50. On the other hand, the CPU time of the proposed algorithm does not change notably, because the operations are parallelizable² and the required CPU time is thus not affected by the problem size.

² By stacking the p_k(λ_k^{t,τ})'s into the vector form p(λ^{t,τ}) = (p_k(λ_k^{t,τ}))_{k=1}^K, we can see that only element-wise operations between vectors and matrix-vector multiplications are involved. The simulations on which Figure 4 is based are not performed in a real parallel computing environment with K processors, but only make use of the efficient linear algebraic implementations available in Matlab, which already implicitly admits a certain level of parallelism.

Secondly, since the variable substitution p_k = e^{q_k} is adopted in the SLBM method (we refer to [33] for more details), the lower bound constraint p_k = 0 (which corresponds to q_k = −∞) cannot be handled by the SLBM method numerically. This limitation impairs the applicability of the SLBM method in many practical scenarios.

C. LASSO

In this subsection, we study the LASSO problem to illustrate the advantage of the proposed line search method for nondifferentiable optimization problems. LASSO is an important and widely studied problem in sparse signal recovery [11, 12, 36, 37]:

minimize_x  (1/2)‖Ax − b‖₂² + µ‖x‖₁,  (45)

where A ∈ R^{N×K} (with N ≪ K), b ∈ R^{N×1} and µ > 0 are given parameters. Problem (45) is an instance of the general problem structure defined in (2) with the following decomposition:

f(x) ≜ (1/2)‖Ax − b‖₂²,  and  g(x) ≜ µ‖x‖₁.  (46)

Problem (45) is convex, but its objective function is nondifferentiable and it does not have a closed-form solution. To apply Algorithm 2, the scalar decomposition x = (x_k)_{k=1}^K is adopted. Recalling (22) and (29), the approximate problem is

Bx^t = arg min_x { Σ_{k=1}^K f(x_k, x^t_{−k}) + g(x) }.  (47)

Note that g(x) decomposes among the different components of x, i.e., g(x) = Σ_{k=1}^K g(x_k), so the vector problem (47) reduces to K independent scalar subproblems that can be solved in parallel:

B_k x^t = arg min_{x_k} { f(x_k, x^t_{−k}) + g(x_k) } = d_k(A^T A)^{−1} S_µ(r_k(x^t)),  k = 1, ..., K,

where d_k(A^T A) is the k-th diagonal element of A^T A, S_a(b) ≜ [b − a]⁺ − [−b − a]⁺ is the so-called soft-thresholding operator [37], and

r(x) ≜ d(A^T A) ◦ x − A^T (Ax − b),  (48)

or, more compactly:

Bx^t = (B_k x^t)_{k=1}^K = d(A^T A)^{−1} ◦ S_{µ1}(r(x^t)).  (49)
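As a concrete illustration, the operator S_a and the parallel best response (48)-(49) can be sketched in a few lines of numpy (the problem sizes, data and regularization gain below are arbitrary illustrations, not the simulation setup of this paper):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, mu = 20, 40, 0.5              # illustrative sizes and regularization gain
A = rng.standard_normal((N, K))
b = rng.standard_normal(N)
d = np.sum(A * A, axis=0)           # diag(A^T A), without forming A^T A

def soft(v, a):
    """Soft-thresholding S_a(v) = [v - a]^+ - [-v - a]^+ (element-wise)."""
    return np.maximum(v - a, 0.0) - np.maximum(-v - a, 0.0)

def best_response(x):
    """Parallel best response, eqs. (48)-(49)."""
    r = d * x - A.T @ (A @ x - b)   # eq. (48)
    return soft(r, mu) / d          # eq. (49)

# each B_k x solves its scalar subproblem exactly: perturbing a coordinate
# of the best response cannot decrease the coordinate-wise objective
x0 = rng.standard_normal(K)
Bx = best_response(x0)
for k in (0, 7, 23):
    def coord_obj(z):
        y = x0.copy(); y[k] = z
        return 0.5 * np.sum((A @ y - b) ** 2) + mu * abs(z)
    assert coord_obj(Bx[k]) <= coord_obj(Bx[k] + 1e-4)
    assert coord_obj(Bx[k]) <= coord_obj(Bx[k] - 1e-4)
```

Both S and the division by d(A^T A) act element-wise, so all K coordinates are updated simultaneously.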

Thus the update direction exhibits a closed-form expression. The stepsize based on the proposed exact line search (23) is

γ^t = arg min_{0≤γ≤1} { f(x^t + γ(Bx^t − x^t)) + γ (g(Bx^t) − g(x^t)) }
 = arg min_{0≤γ≤1} { (1/2)‖A(x^t + γ(Bx^t − x^t)) − b‖₂² + γ µ (‖Bx^t‖₁ − ‖x^t‖₁) }
 = [ − ( (Ax^t − b)^T A(Bx^t − x^t) + µ(‖Bx^t‖₁ − ‖x^t‖₁) ) / ( (A(Bx^t − x^t))^T (A(Bx^t − x^t)) ) ]₀¹.  (50)

The exact line search thus consists in solving a convex quadratic optimization problem with a scalar variable and a bound constraint, and it exhibits the closed-form solution (50). Therefore, both the update direction and the stepsize can be calculated in closed form. We name the proposed update (49)-(50) the Soft-Thresholding with Exact Line search Algorithm (STELA).

The proposed update (49)-(50) has several desirable features that make it appealing in practice. Firstly, in each iteration, all elements are updated in parallel based on the nonlinear best-response (49). This is in the same spirit as [19, 38], and the convergence speed is generally faster than that of BCD [39] or the gradient-based update [40]. Secondly, the proposed exact line search (50) not only yields notable progress in each iteration but also enjoys an easy implementation given the closed-form expression. The convergence speed is thus further enhanced as compared to the procedures proposed in [19, 27, 38], where either decreasing stepsizes are used [19] or the line search is performed over the original nondifferentiable objective function in (45) [27, 38]:

min_{0≤γ≤1} { (1/2)‖A(x^t + γ(Bx^t − x^t)) − b‖₂² + µ‖x^t + γ(Bx^t − x^t)‖₁ }.

Computational complexity. The computational overhead associated with the proposed exact line search (50) can be significantly reduced if (50) is carefully implemented, as outlined in the following. The most complex operations in (50) are the matrix-vector multiplications, namely, Ax^t − b in the numerator and A(Bx^t − x^t) in the denominator. On the one hand, the term Ax^t − b is already available from r(x^t), which is computed in order to determine the best-response in (49). On the other hand, the matrix-vector multiplication A(Bx^t − x^t) is also required for the computation of Ax^{t+1} − b, as the latter can alternatively be computed as

Ax^{t+1} − b = A(x^t + γ^t(Bx^t − x^t)) − b = (Ax^t − b) + γ^t A(Bx^t − x^t),  (51)

where then only an additional vector addition is involved. As a result, the stepsize (50) does not incur any additional matrix-vector multiplications, but only affordable vector-vector multiplications.

Signaling exchange. When A cannot be stored and processed by a centralized processing unit, a parallel architecture can be employed. Assume there are P + 1 (P ≥ 2) processors. We label the first P processors as local processors and the last one as the central processor, and partition A as

A = [A_1, A_2, ..., A_P],

where A_p ∈ R^{N×K_p} and Σ_{p=1}^P K_p = K. Matrix A_p is stored and processed by local processor p, and the following computations are decomposed among the local processors:

Ax = Σ_{p=1}^P A_p x_p,  (52a)
A^T (Ax − b) = (A_p^T (Ax − b))_{p=1}^P,  (52b)
d(A^T A) = (d(A_p^T A_p))_{p=1}^P,  (52c)

where x_p ∈ R^{K_p}. The central processor computes the best-response Bx^t in (49) and the stepsize γ^t in (50). The decomposition in (52) enables us to analyze the signaling exchange between local processor p and the central processor involved in (49) and (50)³.

³ Updates (49) and (50) can also be implemented by a parallel architecture without a central processor. In this case, the signaling is exchanged mutually between every two of the local processors, but the analysis is similar and the conclusion to be drawn remains the same: the proposed exact line search (50) does not incur additional signaling compared with predetermined stepsizes.

Figure 5. Operation flow and signaling exchange between local processor p and the central processor. A solid line indicates a computation that is locally performed by the central/local processor, and a solid line with an arrow indicates a signaling exchange between the central and local processor and the direction of the signaling exchange.

The signaling exchange is summarized in Figure 5. Firstly, the central processor sends Ax^t − b to the local processors (S1.1)⁴, and each local processor p, for p = 1, ..., P, first computes A_p^T(Ax^t − b) and then sends it back to the central processor (S1.2), which forms A^T(Ax^t − b) (S1.3) as in (52b), calculates r(x^t) as in (48) (S1.4) and then Bx^t as in (49) (S1.5). Then the central processor sends Bx_p^t − x_p^t to local processor p, for p = 1, ..., P (S2.1), and each local processor first computes A_p(Bx_p^t − x_p^t) and then sends it back to the central processor (S2.2), which forms A(Bx^t − x^t) (S2.3) as in (52a), calculates γ^t as in (50) (S2.4), and updates x^{t+1} (S3.1) and Ax^{t+1} − b (S3.2) according to (51). From Figure 5 we observe that the exact line search (50) does not incur any additional signaling compared with that of predetermined stepsizes (e.g., constant and decreasing stepsizes), because the signaling exchange in S2.1-S2.2 also has to be carried out for the computation of Ax^{t+1} − b in S3.2, cf. (51).

⁴ x⁰ is set to x⁰ = 0, so Ax⁰ − b = −b.

We finally remark that the proposed successive line search can also be applied here, and it exhibits a closed-form expression as well. However, since the exact line search yields faster convergence, we omit the details at this point.

Simulations. We first compare in Figure 6 the proposed algorithm STELA with FLEXA [19] in terms of the error criterion e(x^t), defined as

e(x^t) ≜ ‖ ∇f(x^t) − [∇f(x^t) − x^t]_{−µ1}^{µ1} ‖₂.  (53)

Note that x* is a solution of (45) if and only if e(x*) = 0 [28]. FLEXA is implemented as outlined in [19]; however, the selective update scheme of [19] is not implemented in FLEXA, because it is also applicable to STELA and it cannot eliminate the slow convergence and sensitivity of the decreasing stepsize. We also remark that the stepsize rule for FLEXA is γ^{t+1} = γ^t (1 − min(1, 10^{−4}/e(x^t)) d γ^t) [19], where d is the decreasing rate and γ⁰ = 0.9. The code and the data generating the figure can be downloaded online [41].

Note that the error e(x^t) plotted in Figure 6 does not necessarily decrease monotonically, while the objective function f(x^t) + g(x^t) always does, because STELA and FLEXA are descent direction methods. For FLEXA, when the decreasing rate is low (d = 10^{−4}), no improvement is observed after 100 iterations. As a matter of fact, the stepsize in those iterations is so large that the function value is actually dramatically increased, and thus the associated iterations are discarded in Figure 6. A similar behavior is also observed for d = 10^{−3}, until the stepsize becomes sufficiently small. When the stepsize decreases quickly (d = 10^{−1}), although improvement is made in all iterations, the asymptotic convergence speed is slow because the stepsize is too small to make notable progress. For this example, the choice d = 10^{−2} performs well, but the value of a good decreasing rate depends on the parameter setup (e.g., A, b and µ), and no general rule performs equally well for all choices of parameters. By comparison, the proposed algorithm STELA converges fast and exhibits stable performance without requiring any parameter tuning.

Figure 6. Convergence of STELA (proposed) and FLEXA (state-of-the-art) for LASSO: error versus the number of iterations.

We also compare in Figure 7 the proposed algorithm STELA with other competitive algorithms in the literature: FISTA [37], ADMM [12], GreedyBCD [42] and SpaRSA [43]. We simulate GreedyBCD of [42] because it exhibits guaranteed convergence. The dimension of A is 2000 × 4000 (the left column of Figure 7) and 5000 × 10000 (the right column). It is generated by the Matlab command randn, with each row being normalized to unity. The density (the proportion of nonzero elements) of the sparse vector x_true is 0.1 (the upper row of Figure 7), 0.2 (the middle row) and 0.4 (the lower row). The vector b is generated as b = A x_true + e, where e is drawn from an i.i.d. Gaussian distribution with variance 10^{−4}. The regularization gain µ is set to µ = 0.1‖A^T b‖_∞, which allows x_true to be recovered to a high accuracy [43].

The simulations are carried out under Matlab R2012a on a PC equipped with a Windows 7 64-bit Home Premium Edition operating system, an Intel i5-3210 2.50GHz CPU, and 8GB of RAM. All of the Matlab codes are available online [41]. The comparison is made in terms of the CPU time required until either a given error bound e(x^t) ≤ 10^{−6} or the maximum number of iterations, namely 2000, is reached. The running time consists of both the initialization stage required for preprocessing (represented by a flat curve) and the formal stage in which the iterations are carried out. For example, in the proposed algorithm STELA, d(A^T A) is computed⁵ in the initialization stage, since it is required in the iterative variable update of the formal stage, cf. (49). The simulation results are averaged over 20 instances.

⁵ The Matlab command is sum(A.^2,1), so the matrix-matrix multiplication between A^T and A is not required.

We observe from Figure 7 that the proposed algorithm STELA converges faster than all competing algorithms. Some further observations are in order.
• The proposed algorithm STELA is not sensitive to the density of the true signal x_true. When the density is increased from 0.1 (upper row) to 0.2 (middle row) and then to 0.4 (lower row), the CPU time increases only negligibly.
• The proposed algorithm STELA scales relatively well with the problem dimension. When the dimension of A is increased from 2000 × 4000 (the left column) to 5000 × 10000 (the right column), the CPU time is only marginally increased.
• The initialization stage of ADMM is time consuming because of some expensive matrix operations, e.g., AA^T, (I + (1/c)AA^T)^{−1} and A^T(I + (1/c)AA^T)^{−1}A (c is a given positive constant). More details can be found in [12, Sec. 6.4]. Furthermore, the CPU time of the initialization stage of ADMM increases dramatically when the dimension of A is increased from 2000 × 4000 to 5000 × 10000.
• SpaRSA performs better when the density of x_true is small, e.g., 0.1, than when it is large, e.g., 0.2 and 0.4.
• The asymptotic convergence speed of GreedyBCD is slow, because only one variable is updated in each iteration.

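The augmented Lagrangian approach used below to adapt STELA to basis pursuit can be illustrated on a scalar toy instance (minimize |x| subject to ax = b), where the inner l1 subproblem has a closed-form soft-thresholding solution; in the paper the inner problem is solved by STELA itself, so this Python sketch is only an illustration of the outer-loop bookkeeping, with function names and the toy setup of our own choosing:

```python
def soft(v, t):
    # soft-thresholding operator: S_t(v) = sign(v) * max(|v| - t, 0)
    return (abs(v) - t if abs(v) > t else 0.0) * (1.0 if v >= 0 else -1.0)

def bp_augmented_lagrangian(a, b, outer_iters=20):
    # Outer loop of the augmented Lagrangian scheme for basis pursuit,
    # specialized to the scalar problem: minimize |x| s.t. a*x = b:
    #   x^{t+1}   = argmin_x |x| + lam^t*(a*x - b) + (c^t/2)*(a*x - b)^2
    #   lam^{t+1} = lam^t + c^t*(a*x^{t+1} - b)
    #   c^{t+1}   = min(2*c^t, 10^2),  with c^0 = 10/||A^T b||_inf
    c = 10.0 / abs(a * b)
    x, lam = 0.0, 0.0
    for _ in range(outer_iters):
        # closed-form inner solution by soft-thresholding; in the paper
        # this l1 subproblem is solved iteratively by STELA instead
        x = soft(b / a - lam / (c * a), 1.0 / (c * a * a))
        lam += c * (a * x - b)
        c = min(2.0 * c, 100.0)
    return x, lam
```

On the toy instance a = 2, b = 4 the iterates settle at the solution x = 2 with multiplier λ = −1/2 after a couple of outer iterations; the doubling of the penalty parameter mirrors the rule c^{t+1} = min(2c^t, 10^2) used in the experiments.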
[Figure 7 appears here: six panels plotting the error e(x^t) against time (sec) for STELA (proposed), ADMM, FISTA, GreedyBCD and SpaRSA.]

Figure 7. Time versus error of different algorithms for LASSO. In the left and right column, the dimension of A is 2000×4000 and 5000×10000, respectively. In the upper, middle and lower row, the density of x_true is 0.1, 0.2 and 0.4, respectively.

To further evaluate the performance of the proposed algorithm STELA, we test it on the benchmarking platform developed by the Optimization Group from the Department of Mathematics at the Darmstadt University of Technology^6 and compare it with different algorithms in various setups (data set, problem dimension, etc.) for the basis pursuit (BP) problem [44]:

    minimize_x ‖x‖_1   subject to Ax = b.

To adapt STELA for the BP problem, we use the augmented Lagrangian approach [13, 45]:

    x^{t+1} = arg min_x { ‖x‖_1 + (λ^t)^T(Ax − b) + (c^t/2)‖Ax − b‖_2^2 },
    λ^{t+1} = λ^t + c^t(Ax^{t+1} − b),

where c^{t+1} = min(2c^t, 10^2) (c^0 = 10/‖A^T b‖_∞), x^{t+1} is computed by STELA, and this process is repeated until λ^t converges. The numerical results summarized in [46] show that, although STELA must be called multiple times before the Lagrange multiplier λ converges, the proposed algorithm for BP based on STELA is very competitive in terms of running time and robust in the sense that it solved all problem instances in the test platform database.

^6 Project website: http://wwwopt.mathematik.tu-darmstadt.de/spear/

V. CONCLUDING REMARKS

In this paper, we have proposed a novel iterative algorithm based on convex approximation. The most critical requirement on the approximate function is that it is pseudo-convex. On the one hand, the relaxation of the assumptions on the approximate functions can make the approximate problems much easier to solve. We show by a counter-example that the assumption on pseudo-convexity is tight in the sense that when it is violated, the algorithm may not converge. On the other hand, the stepsize based on the exact/successive line search yields notable progress in each iteration. Additional structures can be exploited to assist with the selection of the stepsize, so that the algorithm can be further accelerated. The advantages and benefits of the proposed algorithm have been demonstrated using prominent applications in communication networks and signal processing, and they are also numerically consolidated. The proposed algorithm can readily be applied to solve other problems as well, such as portfolio optimization [10].

APPENDIX A
PROOF OF PROPOSITION 1

Proof: i) Firstly, suppose y is a stationary point of (1); it satisfies the first-order optimality condition:

    ∇f(y)^T(x − y) ≥ 0, ∀ x ∈ X.

Using Assumption (A3), we get

    ∇f˜(y; y)^T(x − y) ≥ 0, ∀ x ∈ X.

Since f˜(•; y) is pseudo-convex, the above condition implies

    f˜(x; y) ≥ f˜(y; y), ∀ x ∈ X.

That is, f˜(y; y) = min_{x∈X} f˜(x; y) and y ∈ S(y).

Secondly, suppose y ∈ S(y). We readily get

    ∇f(y)^T(x − y) = ∇f˜(y; y)^T(x − y) ≥ 0, ∀ x ∈ X,    (54)

where the equality and the inequality come from Assumption (A3) and the first-order optimality condition, respectively, so y is a stationary point of (1).

ii) From the definition of By, it is either

    f˜(By; y) = f˜(y; y),    (55a)

or

    f˜(By; y) < f˜(y; y).    (55b)

If (55a) is true, then y ∈ S(y) and, as we have just shown, it is a stationary point of (1). So only (55b) can be true. We know from the pseudo-convexity of f˜(x; y) in x (cf. Assumption (A1)) and (55b) that By ≠ y and

    ∇f˜(y; y)^T(By − y) = ∇f(y)^T(By − y) < 0,    (56)

where the equality comes from Assumption (A3).

APPENDIX B
PROOF OF THEOREM 2

Proof: Since Bx^t is the optimal point of (8), it satisfies the first-order optimality condition:

    ∇f˜(Bx^t; x^t)^T(x − Bx^t) ≥ 0, ∀ x ∈ X.    (57)

If (55a) is true, then x^t ∈ S(x^t) and it is a stationary point of (1) according to Proposition 1 (i). Besides, it follows from

(54) (with x = Bx^t and y = x^t) that ∇f(x^t)^T(Bx^t − x^t) ≥ 0. Note that equality is actually achieved, i.e.,

    ∇f(x^t)^T(Bx^t − x^t) = 0,

because otherwise Bx^t − x^t would be an ascent direction of f˜(x; x^t) at x = x^t and the definition of Bx^t would be contradicted. Then from the definition of the successive line search, we can readily infer that

    f(x^{t+1}) ≤ f(x^t).    (58)

It is easy to see that (58) holds for the exact line search as well.

If (55b) is true, x^t is not a stationary point and Bx^t − x^t is a strict descent direction of f(x) at x = x^t according to Proposition 1 (ii): f(x) is strictly decreased compared with f(x^t) if x is updated at x^t along the direction Bx^t − x^t. From the definition of the successive line search, there always exists a γ^t such that 0 < γ^t ≤ 1 and

    f(x^{t+1}) = f(x^t + γ^t(Bx^t − x^t)) < f(x^t).    (59)

This strict decreasing property also holds for the exact line search because it is the stepsize that yields the largest decrease, which is always larger than or equal to that of the successive line search.

We know from (58) and (59) that {f(x^t)} is a monotonically decreasing sequence and it thus converges. Besides, for any two (possibly different) convergent subsequences {x^t}_{t∈T1} and {x^t}_{t∈T2}, the following holds:

    lim_{t→∞} f(x^t) = lim_{T1∋t→∞} f(x^t) = lim_{T2∋t→∞} f(x^t).

Since f(x) is a continuous function, we infer from the preceding equation that

    f(lim_{T1∋t→∞} x^t) = f(lim_{T2∋t→∞} x^t).    (60)

Now consider any convergent subsequence {x^t}_{t∈T} with limit point y, i.e., lim_{T∋t→∞} x^t = y. To show that y is a stationary point, we first assume the contrary: y is not a stationary point. Since f˜(x; x^t) is continuous in both x and x^t by Assumption (A2) and {Bx^t}_{t∈T} is bounded by Assumption (A5), it follows from [25, Th. 1] that there exists a sequence {Bx^t}_{t∈Ts} with Ts ⊆ T such that it converges and lim_{Ts∋t→∞} Bx^t ∈ S(y). Since both f(x) and ∇f(x) are continuous, applying [25, Th. 1] again implies that there is a Ts′ such that Ts′ ⊆ Ts (⊆ T) and {x^{t+1}}_{t∈Ts′} converges to y′ defined as:

    y′ ≜ y + ρ(By − y),

where ρ is the stepsize when either the exact or successive line search is applied to f(y) along the direction By − y. Since y is not a stationary point, it follows from (59) that f(y′) < f(y), but this would contradict (60). Therefore y is a stationary point, and the proof is completed.

APPENDIX C
PROOF OF THEOREM 3

Proof: We first need to show that Proposition 1 still holds.

(i) We prove that y is a stationary point of (4) if and only if y_k ∈ arg min_{x_k∈X_k} f(x_k, y_{−k}) for all k.

Suppose y is a stationary point of (4); it satisfies the first-order optimality condition:

    ∇f(y)^T(x − y) = Σ_{k=1}^K ∇_k f(y)^T(x_k − y_k) ≥ 0, ∀ x ∈ X,

and it is equivalent to

    ∇_k f(y)^T(x_k − y_k) ≥ 0, ∀ x_k ∈ X_k.

Since f(x) is pseudo-convex in x_k, the above condition implies f(y_k, y_{−k}) = min_{x_k∈X_k} f(x_k, y_{−k}) for all k = 1, ..., K.

Suppose y_k ∈ arg min_{x_k∈X_k} f(x_k, y_{−k}) for all k = 1, ..., K. The first-order optimality condition yields

    ∇_k f(y)^T(x_k − y_k) ≥ 0, ∀ x_k ∈ X_k.

Adding the above inequality for all k = 1, ..., K yields

    ∇f(y)^T(x − y) ≥ 0, ∀ x ∈ X.

Therefore, y is a stationary point of (4).

(ii) We prove that if y is not a stationary point of (4), then ∇f(y)^T(By − y) < 0. It follows from the optimality of B_k y that

    f(B_k y, y_{−k}) ≤ f(y_k, y_{−k}),

and

    ∇_k f(B_k y, y_{−k})^T(x_k − B_k y) ≥ 0, ∀ x_k ∈ X_k.    (61)

Firstly, there must exist an index j such that

    f(B_j y, y_{−j}) < f(y_j, y_{−j}),    (62)

otherwise y would be a stationary point of (4). Since f(x) is pseudo-convex in x_k for k = 1, ..., K, it follows from (62) that

    ∇_j f(y)^T(B_j y − y_j) < 0.    (63)

Secondly, for any index k such that f(B_k y, y_{−k}) = f(y_k, y_{−k}), y_k minimizes f(x_k, y_{−k}) over x_k ∈ X_k and ∇_k f(y_k, y_{−k})^T(x_k − y_k) ≥ 0 for any x_k ∈ X_k. Setting x_k = B_k y yields

    ∇_k f(y_k, y_{−k})^T(B_k y − y_k) ≥ 0.    (64)

Similarly, setting x_k = y_k in (61) yields

    ∇_k f(B_k y, y_{−k})^T(y_k − B_k y) ≥ 0.    (65)

Adding (64) and (65), we can infer that (∇_k f(y) − ∇_k f(B_k y, y_{−k}))^T(y_k − B_k y) ≥ 0. Therefore, we can rewrite (65) as follows:

    0 ≤ ∇_k f(B_k y, y_{−k})^T(y_k − B_k y)
      = (∇_k f(B_k y, y_{−k}) − ∇_k f(y) + ∇_k f(y))^T(y_k − B_k y),

and thus

    ∇_k f(y)^T(B_k y − y_k) ≤ −(∇_k f(B_k y, y_{−k}) − ∇_k f(y))^T(B_k y − y_k) ≤ 0.    (66)

Adding (63) and (66) over all k = 1, ..., K yields

    ∇f(y)^T(By − y) = Σ_{k=1}^K ∇_k f(y)^T(B_k y − y_k) < 0.

That is, By − y is a descent direction of f(x) at the point y.

The proof of Theorem 2 can then be used verbatim to prove the convergence of the algorithm with the approximate problem (31) and the exact/successive line search.

REFERENCES

[1] W. Yu, W. Rhee, S. Boyd, and J. Cioffi, "Iterative Water-Filling for Gaussian Vector Multiple-Access Channels," IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 145–152, Jan. 2004.
[2] N. Jindal, W. Rhee, S. Vishwanath, S. Jafar, and A. Goldsmith, "Sum Power Iterative Water-Filling for Multi-Antenna Gaussian Broadcast Channels," IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1570–1580, Apr. 2005.
[3] W. Yu, "Sum-capacity computation for the Gaussian vector broadcast channel via dual decomposition," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 754–759, Feb. 2006.
[4] S. Ye and R. Blum, "Optimized signaling for MIMO interference systems with feedback," IEEE Transactions on Signal Processing, vol. 51, no. 11, pp. 2839–2848, Nov. 2003.
[5] Z.-Q. Luo and S. Zhang, "Dynamic Spectrum Management: Complexity and Duality," IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 1, pp. 57–73, Feb. 2008.
[6] S.-J. Kim and G. B. Giannakis, "Optimal Resource Allocation for MIMO Ad Hoc Cognitive Radio Networks," IEEE Transactions on Information Theory, vol. 57, no. 5, pp. 3117–3131, May 2011.
[7] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, "An Iteratively Weighted MMSE Approach to Distributed Sum-Utility Maximization for a MIMO Interfering Broadcast Channel," IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4331–4340, Sep. 2011.
[8] G. Scutari, F. Facchinei, P. Song, D. P. Palomar, and J.-S. Pang, "Decomposition by Partial Linearization: Parallel Optimization of Multi-Agent Systems," IEEE Transactions on Signal Processing, vol. 62, no. 3, pp. 641–656, Feb. 2014.
[9] Y. Yang, G. Scutari, P. Song, and D. P. Palomar, "Robust MIMO Cognitive Radio Systems Under Interference Temperature Constraints," IEEE Journal on Selected Areas in Communications, vol. 31, no. 11, pp. 2465–2482, Nov. 2013.
[10] Y. Yang, F. Rubio, G. Scutari, and D. P. Palomar, "Multi-Portfolio Optimization: A Potential Game Approach," IEEE Transactions on Signal Processing, vol. 61, no. 22, pp. 5590–5602, Nov. 2013.
[11] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, "An Interior-Point Method for Large-Scale l1-Regularized Least Squares," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 606–617, Dec. 2007.
[12] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, 2010.
[13] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
[14] N. Parikh and S. Boyd, "Proximal Algorithms," Foundations and Trends in Optimization, vol. 1, no. 3, pp. 127–239, 2014.
[15] P. L. Combettes and J.-C. Pesquet, "Proximal Splitting Methods in Signal Processing," in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, ser. Springer Optimization and Its Applications, H. H. Bauschke, R. S. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds. New York, NY: Springer New York, 2011, vol. 49, pp. 185–212.
[16] M. Razaviyayn, M. Hong, and Z.-Q. Luo, "A unified convergence analysis of block successive minimization methods for nonsmooth optimization," SIAM Journal on Optimization, vol. 23, no. 2, pp. 1126–1153, 2013.
[17] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Prentice Hall, 1989.
[18] P. He, "Correction of Convergence Proof for Iterative Water-Filling in Gaussian MIMO Broadcast Channels," IEEE Transactions on Information Theory, vol. 57, no. 4, pp. 2539–2543, Apr. 2011.
[19] F. Facchinei, G. Scutari, and S. Sagratella, "Parallel Selective Algorithms for Nonconvex Big Data Optimization," IEEE Transactions on Signal Processing, vol. 63, no. 7, pp. 1874–1889, Apr. 2015.
[20] G. Scutari, F. Facchinei, L. Lampariello, and P. Song, "Distributed Methods for Constrained Nonconvex Multi-Agent Optimization—Part I: Theory," Oct. 2014, submitted to IEEE Transactions on Signal Processing. [Online]. Available: http://arxiv.org/abs/1410.4754
[21] R. Horst and N. Thoai, "DC programming: overview," Journal of Optimization Theory and Applications, vol. 103, no. 1, pp. 1–43, 1999.
[22] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables. Academic, New York, 1970.
[23] O. L. Mangasarian, Nonlinear Programming. McGraw-Hill, 1969.
[24] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[25] S. M. Robinson and R. H. Day, "A sufficient condition for continuity of optimal sets in mathematical programming," Journal of Mathematical Analysis and Applications, vol. 45, no. 2, pp. 506–511, Feb. 1974.
[26] M. Patriksson, "Cost Approximation: A Unified Framework of Descent Algorithms for Nonlinear Programs," SIAM Journal on Optimization, vol. 8, no. 2, pp. 561–582, May 1998.
[27] P. Tseng and S. Yun, "A coordinate gradient descent method for nonsmooth separable minimization," Mathematical Programming, vol. 117, no. 1-2, 2009.
[28] R. H. Byrd, J. Nocedal, and F. Oztoprak, "An Inexact Successive Quadratic Approximation Method for Convex L-1 Regularized Optimization," Sep. 2013. [Online]. Available: http://arxiv.org/abs/1309.3529
[29] D. P. Palomar and M. Chiang, "A tutorial on decomposition methods for network utility maximization," IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1439–1451, Aug. 2006.
[30] R. Horst and N. V. Thoai, "DC programming: Overview," Journal of Optimization Theory and Applications, vol. 103, no. 1, pp. 1–43, 1999.
[31] W. Yu and J. M. Cioffi, "Sum Capacity of Gaussian Vector Broadcast Channels," IEEE Transactions on Information Theory, vol. 50, no. 9, pp. 1875–1892, Sep. 2004.
[32] M. Grant and S. Boyd, "CVX: Matlab Software for Disciplined Convex Programming, version 2.0 beta," http://cvxr.com/, 2012.
[33] A. Zappone, L. Sanguinetti, G. Bacci, E. Jorswieck, and M. Debbah, "Energy-efficient power control: A look at 5G wireless technologies," IEEE Transactions on Signal Processing, vol. 64, no. 7, pp. 1668–1683, Apr. 2016.
[34] C. Olsson, A. P. Eriksson, and F. Kahl, "Efficient optimization for L∞-problems using pseudoconvexity," in 2007 IEEE 11th International Conference on Computer Vision. IEEE, 2007, pp. 1–8.
[35] Y. Yang, G. Scutari, D. P. Palomar, and M. Pesavento, "A Parallel Decomposition Method for Nonconvex Stochastic Multi-Agent Optimization Problems," Dec. 2015, to appear in IEEE Transactions on Signal Processing.
[36] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society: Series B, vol. 58, no. 1, pp. 267–288, 1996.
[37] A. Beck and M. Teboulle, "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, Jan. 2009.
[38] M. Elad, "Why simple shrinkage is still relevant for redundant representations?" IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5559–5569, 2006.
[39] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani, "Pathwise coordinate optimization," The Annals of Applied Statistics, vol. 1, no. 2, pp. 302–332, Dec. 2007.
[40] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, "Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586–597, Dec. 2007.
[41] Y. Yang, http://www.nts.tu-darmstadt.de/home_nts/staff_nts/mitarbeiterdetails_32448.en.jsp.
[42] Z. Peng, M. Yan, and W. Yin, "Parallel and distributed sparse optimization," 2013 Asilomar Conference on Signals, Systems and Computers, pp. 659–646, Nov. 2013.
[43] S. Wright, R. Nowak, and M. Figueiredo, "Sparse Reconstruction by Separable Approximation," IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2479–2493, Jul. 2009.
[44] D. Lorenz, M. Pfetsch, and A. Tillmann, "Solving Basis Pursuit: Heuristic Optimality Check and Solver Comparison," ACM Transactions on Mathematical Software, vol. 41, no. 2, pp. 1–29, 2015.
[45] R. T. Rockafellar, "Augmented Lagrangians and Applications of the Proximal Point Algorithm in Convex Programming," Mathematics of Operations Research, vol. 1, no. 2, pp. 97–116, 1976.
[46] J. Kuske and A. Tillmann, "Solving basis pursuit: Update," Tech. Rep., April 2016. [Online]. Available: http://www.mathematik.tu-darmstadt.de/~tillmann/docs/SolvingBPupdateApr2016.pdf