arXiv:1803.01621v4 [eess.SP] 27 Jan 2020
Total pages: 16
File type: PDF, size: 1020 KB
Recommended publications
UCLA Electronic Theses and Dissertations
UCLA Electronic Theses and Dissertations. Title: Projection algorithms for large scale optimization and genomic data analysis. Permalink: https://escholarship.org/uc/item/95v3t2nk. Author: Keys, Kevin Lawrence. Publication date: 2016. Peer reviewed thesis/dissertation. eScholarship.org, powered by the California Digital Library, University of California. A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Biomathematics, University of California, Los Angeles, 2016. © Copyright by Kevin Lawrence Keys 2016. Chair: Professor Kenneth L. Lange.
Abstract of the dissertation: The advent of the Big Data era has spawned intense interest in scalable mathematical optimization methods. Traditional approaches such as Newton's method fall apart whenever the features outnumber the examples in a data set. Consequently, researchers have intensely developed first-order methods that rely only on gradients and subgradients of a cost function. In this dissertation we focus on projected gradient methods for large-scale constrained optimization. We develop a particular case of a proximal gradient method called the proximal distance algorithm. Proximal distance algorithms combine the classical penalty method of constrained minimization with distance majorization. To optimize the loss function $f(x)$ over a constraint set $C$, the proximal distance principle mandates minimizing the penalized loss $f(x) + \frac{\rho}{2} \operatorname{dist}(x, C)^2$ and following the solution $x_\rho$ to its limit as $\rho \to \infty$.
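The penalized loss in the abstract above can be illustrated with a minimal sketch. Everything specific here is an assumption chosen for illustration and is not taken from the dissertation: the loss is $f(x) = \frac{1}{2}\|x - y\|^2$, the constraint set $C$ is the nonnegative orthant, and the names `project_nonneg` and `proximal_distance` are hypothetical. Distance majorization replaces $\operatorname{dist}(x, C)^2$ by $\|x - P_C(x_k)\|^2$, where $P_C$ is the projection onto $C$, so each inner step minimizes $f(x) + \frac{\rho}{2}\|x - P_C(x_k)\|^2$, which has a closed form for this quadratic loss.

```python
def project_nonneg(x):
    """Euclidean projection onto C = {x : x >= 0} (illustrative choice of C)."""
    return [max(v, 0.0) for v in x]


def proximal_distance(y, rho_values, inner_iters=50):
    """Sketch of a proximal distance iteration for f(x) = 0.5 * ||x - y||^2.

    For each penalty rho, repeat the majorization step
        x <- argmin_x 0.5 * ||x - y||^2 + (rho / 2) * ||x - P_C(x_k)||^2
           = (y + rho * P_C(x_k)) / (1 + rho),
    then increase rho and warm-start from the previous solution.
    """
    x = list(y)
    for rho in rho_values:
        for _ in range(inner_iters):
            p = project_nonneg(x)
            x = [(yi + rho * pi) / (1.0 + rho) for yi, pi in zip(y, p)]
    return x


y = [1.5, -2.0, 0.3, -0.1]
x_rho = proximal_distance(y, rho_values=[1.0, 10.0, 100.0, 1e4, 1e6])
print(x_rho)
```

As the penalty grows, the iterate approaches the projection of `y` onto the nonnegative orthant, which is the constrained minimizer for this particular loss.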
Sparse Matrix Linear Models for Structured High-Throughput Data
Submitted to the Annals of Applied Statistics. SPARSE MATRIX LINEAR MODELS FOR STRUCTURED HIGH-THROUGHPUT DATA. By Jane W. Liang*,† (Department of Biostatistics, Harvard T.H. Chan School of Public Health, 655 Huntington Ave, Boston, MA 02115-6028, [email protected]) and Śaunak Sen‡,§ (Department of Preventive Medicine, University of Tennessee Health Science Center, 66 N. Pauline St, Memphis, TN 38163-2181, [email protected]). † Harvard T.H. Chan School of Public Health; ‡ University of Tennessee Health Science Center; § University of California, San Francisco. Recent technological advancements have led to the rapid generation of high-throughput biological data, which can be used to address novel scientific questions in broad areas of research. These data can be thought of as a large matrix with covariates annotating both rows and columns of this matrix. Matrix linear models provide a convenient way for modeling such data. In many situations, sparse estimation of these models is desired. We present fast, general methods for fitting sparse matrix linear models to structured high-throughput data. We induce model sparsity using an L1 penalty and consider the case when the response matrix and the covariate matrices are large. Due to data size, standard methods for estimation of these penalized regression models fail if the problem is converted to the corresponding univariate regression scenario. By leveraging matrix properties in the structure of our model, we develop several fast estimation algorithms (coordinate descent, FISTA, and ADMM) and discuss their trade-offs. We evaluate our method's performance on simulated data, E. coli chemical genetic screening data, and two Arabidopsis genetic datasets with multivariate responses.
Learning with Stochastic Proximal Gradient
Learning with stochastic proximal gradient. Lorenzo Rosasco*, DIBRIS, Università di Genova, Via Dodecaneso, 35, 16146 Genova, Italy, [email protected]. Silvia Villa, Bằng Công Vũ, Laboratory for Computational and Statistical Learning, Istituto Italiano di Tecnologia and Massachusetts Institute of Technology, Bldg. 46-5155, 77 Massachusetts Avenue, Cambridge, MA 02139, USA, fsilvia.villa,[email protected]. Abstract: We consider composite function minimization problems where only a stochastic estimate of the gradient of the smooth term is available, and in particular regularized online learning. We derive novel finite sample bounds for the natural extension of the classical proximal gradient algorithm. Our approach allows one to avoid averaging, a feature which is critical when considering sparsity based methods. Moreover, our results match those obtained by the stochastic extension of accelerated methods, hence suggesting that there is no advantage in considering these variants in a stochastic setting.
1 Stochastic proximal gradient algorithm for composite optimization and learning. We consider the following general class of minimization problems
$$\min_{w \in \mathcal{H}} L(w) + R(w), \qquad (1)$$
under the assumptions
• $L : \mathcal{H} \to \mathbb{R}$ is a differentiable convex function with $\beta$-Lipschitz continuous gradient, i.e. for every $v$ and $w$ in $\mathcal{H}$, $\|\nabla L(v) - \nabla L(w)\| \le \beta \|v - w\|$
• $R : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is a lower semicontinuous convex function (possibly nonsmooth)
• Problem 1 has at least one solution.
This class of optimization problems arises naturally in regularization schemes where one component is a data fitting term and the other a regularizer, see for example [6, 16]. To solve Problem 1, first order methods have recently been widely applied.
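A minimal sketch of a stochastic proximal gradient iteration for Problem 1 may help fix ideas. It is not the authors' code, and every concrete choice is an assumption made for illustration: $L(w)$ is an expected squared loss $\mathbb{E}\,\frac{1}{2}(\langle a, w\rangle - b)^2$, $R(w) = \lambda \|w\|_1$ (whose proximal operator is soft-thresholding), and the step size $\gamma_k = \gamma_0 / \sqrt{k+1}$ is a generic decaying schedule rather than the one analyzed in the paper. The names `soft_threshold`, `stochastic_proximal_gradient`, and `sample` are hypothetical. The update is $w_{k+1} = \operatorname{prox}_{\gamma_k R}(w_k - \gamma_k \hat g_k)$, where $\hat g_k$ is a stochastic estimate of $\nabla L(w_k)$, and the last iterate is returned without averaging.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def stochastic_proximal_gradient(sample, lam, dim, steps, step0):
    """Last-iterate stochastic proximal gradient for squared loss + l1 penalty.

    `sample()` returns one pair (a, b); the stochastic gradient of
    0.5 * (<a, w> - b)^2 at w is (<a, w> - b) * a.
    """
    w = [0.0] * dim
    for k in range(steps):
        a, b = sample()
        gamma = step0 / (k + 1) ** 0.5  # assumed decaying step size
        residual = sum(ai * wi for ai, wi in zip(a, w)) - b
        # Gradient step on the smooth term, then prox of gamma * lam * ||.||_1.
        w = [soft_threshold(wi - gamma * residual * ai, gamma * lam)
             for wi, ai in zip(w, a)]
    return w


rng = random.Random(0)
w_true = [2.0, 0.0, -1.0]  # synthetic ground truth, for illustration only


def sample():
    a = [rng.gauss(0.0, 1.0) for _ in w_true]
    b = sum(ai * wi for ai, wi in zip(a, w_true)) + 0.1 * rng.gauss(0.0, 1.0)
    return a, b


w_hat = stochastic_proximal_gradient(sample, lam=0.1, dim=3, steps=5000, step0=0.1)
print(w_hat)
```

Because the proximal step is applied to the iterate itself rather than to a running average, coordinates that the penalty shrinks to zero can remain exactly zero, which is the reason avoiding averaging matters for sparsity based methods.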
arXiv:1912.02039v3 [math.OC] 2 Oct 2020 P1-1.1-PD-2019-1123, within PNCDI III
Stochastic proximal splitting algorithm for composite minimization. Andrei Patrascu · Paul Irofti. Abstract: Supported by the recent contributions in multiple branches, the first-order splitting algorithms became central for structured nonsmooth optimization. In the large-scale or noisy contexts, when only stochastic information on the objective function is available, the extension of proximal gradient schemes to stochastic oracles is heavily based on the tractability of the proximal operator corresponding to the nonsmooth component, which has been deeply analyzed in the literature. However, there remained some questions about the difficulty of the composite models where the nonsmooth term is not proximally tractable anymore. Therefore, in this paper we tackle composite optimization problems, where the access only to stochastic information on both smooth and nonsmooth components is assumed, using a stochastic proximal first-order scheme with stochastic proximal updates. We provide sublinear $\mathcal{O}(1/k)$ convergence rates (in expectation of squared distance to the optimal set) under the strong convexity assumption on the objective function. Also, linear convergence is achieved for convex feasibility problems. The empirical behavior is illustrated by numerical tests on parametric sparse representation models.
1 Introduction. In this paper we consider the following convex composite optimization problem:
$$\min_{x \in \mathbb{R}^n} f(x) + h(x).$$
A. Patrascu and P. Irofti are with the Research Center for Logic, Optimization and Security (LOS), Department of Computer Science, Faculty of Mathematics and Computer Science, University of Bucharest (e-mails: [email protected], [email protected]). The research of A.
Smoothing Proximal Gradient Method for General Structured Sparse Regression
The Annals of Applied Statistics 2012, Vol. 6, No. 2, 719–752. DOI: 10.1214/11-AOAS514. © Institute of Mathematical Statistics, 2012. arXiv:1005.4717v4 [stat.ML] 29 Jun 2012. SMOOTHING PROXIMAL GRADIENT METHOD FOR GENERAL STRUCTURED SPARSE REGRESSION. By Xi Chen, Qihang Lin, Seyoung Kim, Jaime G. Carbonell and Eric P. Xing, Carnegie Mellon University. We study the problem of estimating high-dimensional regression models regularized by a structured sparsity-inducing penalty that encodes prior structural information on either the input or output variables. We consider two widely adopted types of penalties of this kind as motivating examples: (1) the general overlapping-group-lasso penalty, generalized from the group-lasso penalty; and (2) the graph-guided-fused-lasso penalty, generalized from the fused-lasso penalty. For both types of penalties, due to their nonseparability and nonsmoothness, developing an efficient optimization method remains a challenging problem. In this paper we propose a general optimization approach, the smoothing proximal gradient (SPG) method, which can solve structured sparse regression problems with any smooth convex loss under a wide spectrum of structured sparsity-inducing penalties. Our approach combines a smoothing technique with an effective proximal gradient method. It achieves a convergence rate significantly faster than that of the standard first-order methods (subgradient methods), and it is much more scalable than the most widely used interior-point methods. The efficiency and scalability of our method are demonstrated on both simulation experiments and real genetic data sets.
1. Introduction. The problem of high-dimensional sparse feature learning arises in many areas in science and engineering. In a typical setting such as linear regression, the input signal leading to a response (i.e., the output) lies in a high-dimensional space, and one is interested in selecting a small number of truly relevant variables in the input that influence the output.
A Policy-Based Optimization Library
POLO: a POLicy-based Optimization library. Arda Aytekin, Martin Biel, and Mikael Johansson. KTH Royal Institute of Technology, School of Electrical Engineering and Computer Science, SE-100 44 Stockholm. October 10, 2018. Abstract: We present POLO, a C++ library for large-scale parallel optimization research that emphasizes ease-of-use, flexibility and efficiency in algorithm design. It uses multiple inheritance and template programming to decompose algorithms into essential policies and facilitate code reuse. With its clear separation between algorithm and execution policies, it provides researchers with a simple and powerful platform for prototyping ideas, evaluating them on different parallel computing architectures and hardware platforms, and generating compact and efficient production code. A C-API is included for customization and data loading in high-level languages. POLO enables users to move seamlessly from serial to multi-threaded shared-memory and multi-node distributed-memory executors. We demonstrate how POLO allows users to implement state-of-the-art asynchronous parallel optimization algorithms in just a few lines of code and report experiment results from shared and distributed-memory computing architectures. We provide both POLO and POLO.jl, a wrapper around POLO written in the Julia language, at https://github.com/pologrp under the permissive MIT license.
1 Introduction. Wide adoption of Internet of Things (IoT) enabled devices, as well as developments in communication and data storage technologies, have made the collection, transmission, and storage of bulks of data more accessible than ever. Commercial cloud-computing providers, which traditionally offer their available computing resources at data centers to customers, have also started supporting low-power IoT-enabled devices available at the customers' site in their cloud ecosystem [1]–[3].
Proximal Gradient Algorithms: Applications in Signal Processing
Proximal Gradient Algorithms: Applications in Signal Processing. Niccolò Antonello, Lorenzo Stella, Panos Patrinos, Toon van Waterschoot. [email protected], [email protected], [email protected], [email protected]. EUSIPCO 2019. IDIAP, Amazon Research, KU Leuven ESAT-STADIUS.
Part I: Introduction. Toon van Waterschoot, [email protected], KU Leuven, ESAT-STADIUS.
Optimization in Signal Processing. Inference problems such as
• Signal estimation,
• Parameter estimation,
• Signal detection,
• Data classification
naturally lead to optimization problems
$$x^\star = \operatorname*{argmin}_x \; \underbrace{\mathrm{loss}(x) + \mathrm{prior}(x)}_{\mathrm{cost}(x)}$$
Variables $x$ could be signal samples, model parameters, algorithm tuning parameters, etc.
Modeling and Inverse Problems. Many inference problems lead to optimization problems in which signal models need to be inverted, i.e. inverse problems: given a set of observations $y$, infer unknown signal or model parameters $x$.
• Inverse problems are often ill-conditioned or underdetermined
• Large-scale problems may suffer more easily from ill-conditioning
• Including a prior in the cost function then becomes crucial (e.g. regularization)
Choice of a suitable cost function often depends on adoption of an application-specific signal model, e.g.
• Dictionary model, e.g. sum of sinusoids $y = Dx$ with DFT matrix $D$
• Filter model, e.g. linear FIR filter $y = Hx$ with convolution matrix $H$
• Black-box model, e.g.
A Unified Formulation and Fast Accelerated Proximal Gradient Method for Classification
Journal of Machine Learning Research 18 (2017) 1-49. Submitted 5/16; Revised 1/17; Published 3/17. A Unified Formulation and Fast Accelerated Proximal Gradient Method for Classification. Naoki Ito (naoki [email protected]), Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan. Akiko Takeda ([email protected]), Department of Mathematical Analysis and Statistical Inference, The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan. Kim-Chuan Toh ([email protected]), Department of Mathematics, National University of Singapore, Blk S17, 10 Lower Kent Ridge Road, Singapore 119076, Singapore. Editor: Moritz Hardt. Abstract: Binary classification is the problem of predicting the class a given sample belongs to. To achieve a good prediction performance, it is important to find a suitable model for a given dataset. However, it is often time consuming and impractical for practitioners to try various classification models because each model employs a different formulation and algorithm. The difficulty can be mitigated if we have a unified formulation and an efficient universal algorithmic framework for various classification models to expedite the comparison of performance of different models for a given dataset. In this paper, we present a unified formulation of various classification models (including C-SVM, $\ell_2$-SVM, ν-SVM, MM-FDA, MM-MPM, logistic regression, distance weighted discrimination) and develop a general optimization algorithm based on an accelerated proximal gradient (APG) method for the formulation. We design various techniques such as backtracking line search and adaptive restarting strategy in order to speed up the practical convergence of our method.
A Proximal Stochastic Gradient Method with Progressive Variance Reduction∗
SIAM J. OPTIM., Vol. 24, No. 4, pp. 2057–2075. © 2014 Society for Industrial and Applied Mathematics. A PROXIMAL STOCHASTIC GRADIENT METHOD WITH PROGRESSIVE VARIANCE REDUCTION. LIN XIAO AND TONG ZHANG. Abstract. We consider the problem of minimizing the sum of two convex functions: one is the average of a large number of smooth component functions, and the other is a general convex function that admits a simple proximal mapping. We assume the whole objective function is strongly convex. Such problems often arise in machine learning, known as regularized empirical risk minimization. We propose and analyze a new proximal stochastic gradient method, which uses a multistage scheme to progressively reduce the variance of the stochastic gradient. While each iteration of this algorithm has similar cost as the classical stochastic gradient method (or incremental gradient method), we show that the expected objective value converges to the optimum at a geometric rate. The overall complexity of this method is much lower than both the proximal full gradient method and the standard proximal stochastic gradient method. Key words. stochastic gradient method, proximal mapping, variance reduction. AMS subject classifications. 65C60, 65Y20, 90C25. DOI. 10.1137/140961791.
1. Introduction. We consider the problem of minimizing the sum of two convex functions:
$$\operatorname*{minimize}_{x \in \mathbb{R}^d} \; \left\{ P(x) \stackrel{\mathrm{def}}{=} F(x) + R(x) \right\}, \qquad (1.1)$$
where $F(x)$ is the average of many smooth component functions $f_i(x)$, i.e.,
$$F(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x), \qquad (1.2)$$
and $R(x)$ is relatively simple but can be nondifferentiable. We are especially interested in the case where the number of components $n$ is very large, and it can be advantageous to use incremental methods (such as the stochastic gradient method) that operate on a single component $f_i$ at each iteration, rather than on the entire cost function.
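The multistage variance-reduction idea described in the abstract can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the components are $f_i(x) = \frac{1}{2}(\langle a_i, x\rangle - b_i)^2$ on synthetic data, $R(x) = \lambda\|x\|_1$, the step size $\eta = 0.1 / L_{\max}$ and the stage length are ad hoc, and the function names are hypothetical. Each stage computes the full gradient at a reference point $\tilde x$, then takes proximal steps along the corrected direction $v = \nabla f_i(x) - \nabla f_i(\tilde x) + \nabla F(\tilde x)$, whose variance shrinks as both $x$ and $\tilde x$ approach the minimizer. The next reference point is taken to be the average of the stage's iterates.

```python
import random


def soft_threshold(v, t):
    """Proximal mapping of t * |.| applied to a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def grad_component(x, a, b):
    """Gradient of f_i(x) = 0.5 * (<a, x> - b)^2."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    return [r * ai for ai in a]


def full_gradient(x, A, B):
    """Gradient of F(x) = (1/n) * sum_i f_i(x)."""
    n = len(A)
    g = [0.0] * len(x)
    for a, b in zip(A, B):
        g = [gj + gij / n for gj, gij in zip(g, grad_component(x, a, b))]
    return g


def objective(x, A, B, lam):
    """P(x) = F(x) + lam * ||x||_1."""
    n = len(A)
    loss = sum(0.5 * (sum(ai * xi for ai, xi in zip(a, x)) - b) ** 2
               for a, b in zip(A, B)) / n
    return loss + lam * sum(abs(xi) for xi in x)


def prox_svrg(A, B, lam, eta, stages, m, rng):
    """Multistage proximal stochastic gradient with variance reduction (sketch)."""
    d = len(A[0])
    x_tilde = [0.0] * d
    for _ in range(stages):
        mu = full_gradient(x_tilde, A, B)  # one full pass per stage
        x = list(x_tilde)
        running_sum = [0.0] * d
        for _ in range(m):
            i = rng.randrange(len(A))
            g_x = grad_component(x, A[i], B[i])
            g_ref = grad_component(x_tilde, A[i], B[i])
            # Variance-reduced gradient estimate.
            v = [gx - gr + mj for gx, gr, mj in zip(g_x, g_ref, mu)]
            x = [soft_threshold(xj - eta * vj, eta * lam) for xj, vj in zip(x, v)]
            running_sum = [s + xj for s, xj in zip(running_sum, x)]
        x_tilde = [s / m for s in running_sum]  # average of the stage iterates
    return x_tilde


rng = random.Random(0)
x_true = [2.0, 0.0, -1.0]  # synthetic ground truth, for illustration only
A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
B = [sum(ai * xi for ai, xi in zip(a, x_true)) + 0.1 * rng.gauss(0.0, 1.0) for a in A]
L_max = max(sum(ai * ai for ai in a) for a in A)  # largest component smoothness constant
x_hat = prox_svrg(A, B, lam=0.1, eta=0.1 / L_max, stages=30, m=200, rng=rng)
print(x_hat, objective(x_hat, A, B, 0.1))
```

Each inner step touches a single component, as in the plain stochastic gradient method, while the one full-gradient pass per stage is what pays for the reduced variance.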