Trust Region Newton Method for Large-Scale Logistic Regression
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Advances in Interior Point Methods and Column Generation
Advances in Interior Point Methods and Column Generation Pablo Gonz´alezBrevis Doctor of Philosophy University of Edinburgh 2013 Declaration I declare that this thesis was composed by myself and that the work contained therein is my own, except where explicitly stated otherwise in the text. (Pablo Gonz´alezBrevis) ii To my endless inspiration and angels on earth: Paulina, Crist´obal and Esteban iii Abstract In this thesis we study how to efficiently combine the column generation technique (CG) and interior point methods (IPMs) for solving the relaxation of a selection of integer programming problems. In order to obtain an efficient method a change in the column generation technique and a new reoptimization strategy for a primal-dual interior point method are proposed. It is well-known that the standard column generation technique suffers from un- stable behaviour due to the use of optimal dual solutions that are extreme points of the restricted master problem (RMP). This unstable behaviour slows down column generation so variations of the standard technique which rely on interior points of the dual feasible set of the RMP have been proposed in the literature. Among these tech- niques, there is the primal-dual column generation method (PDCGM) which relies on sub-optimal and well-centred dual solutions. This technique dynamically adjusts the column generation tolerance as the method approaches optimality. Also, it relies on the notion of the symmetric neighbourhood of the central path so sub-optimal and well-centred solutions are obtained. We provide a thorough theoretical analysis that guarantees the convergence of the primal-dual approach even though sub-optimal solu- tions are used in the course of the algorithm. -
CNC/IUGG: 2019 Quadrennial Report
CNC/IUGG: 2019 Quadrennial Report Geodesy and Geophysics in Canada 2015-2019 Quadrennial Report of the Canadian National Committee for the International Union of Geodesy and Geophysics Prepared on the Occasion of the 27th General Assembly of the IUGG Montreal, Canada July 2019 INTRODUCTION This report summarizes the research carried out in Canada in the fields of geodesy and geophysics during the quadrennial 2015-2019. It was prepared under the direction of the Canadian National Committee for the International Union of Geodesy and Geophysics (CNC/IUGG). The CNC/IUGG is administered by the Canadian Geophysical Union, in consultation with the Canadian Meteorological and Oceanographic Society and other Canadian scientific organizations, including the Canadian Association of Physicists, the Geological Association of Canada, and the Canadian Institute of Geomatics. The IUGG adhering organization for Canada is the National Research Council of Canada. Among other duties, the CNC/IUGG is responsible for: • collecting and reconciling the many views of the constituent Canadian scientific community on relevant issues • identifying, representing, and promoting the capabilities and distinctive competence of the community on the international stage • enhancing the depth and breadth of the participation of the community in the activities and events of the IUGG and related organizations • establishing the mechanisms for communicating to the community the views of the IUGG and information about the activities of the IUGG. The aim of this report is to communicate to both the Canadian and international scientific communities the research areas and research progress that has been achieved in geodesy and geophysics over the last four years. The main body of this report is divided into eight sections: one for each of the eight major scientific disciplines as represented by the eight sister societies of the IUGG. -
The Method of Gauss-Newton to Compute Power Series Solutions of Polynomial Homotopies∗
The Method of Gauss-Newton to Compute Power Series Solutions of Polynomial Homotopies∗ Nathan Bliss Jan Verschelde University of Illinois at Chicago Department of Mathematics, Statistics, and Computer Science 851 S. Morgan Street (m/c 249), Chicago, IL 60607-7045, USA fnbliss2,[email protected] Abstract We consider the extension of the method of Gauss-Newton from com- plex floating-point arithmetic to the field of truncated power series with complex floating-point coefficients. With linearization we formulate a lin- ear system where the coefficient matrix is a series with matrix coefficients, and provide a characterization for when the matrix series is regular based on the algebraic variety of an augmented system. The structure of the lin- ear system leads to a block triangular system. In the regular case, solving the linear system is equivalent to solving a Hermite interpolation problem. In general, we solve a Hermite-Laurent interpolation problem, via a lower triangular echelon form on the coefficient matrix. We show that this solu- tion has cost cubic in the problem size. With a few illustrative examples, we demonstrate the application to polynomial homotopy continuation. Key words and phrases. Linearization, Gauss-Newton, Hermite inter- polation, polynomial homotopy, power series. 1 Introduction 1.1 Preliminaries A polynomial homotopy is a family of polynomial systems which depend on one parameter. Numerical continuation methods to track solution paths defined by a homotopy are classical, see e.g.: [3] and [23]. Our goal is to improve the algorithms to track solution paths in two ways: ∗This material is based upon work supported by the National Science Foundation under Grant No. -
A New Trust Region Method with Simple Model for Large-Scale Optimization ∗
A New Trust Region Method with Simple Model for Large-Scale Optimization ∗ Qunyan Zhou,y Wenyu Sunz and Hongchao Zhangx Abstract In this paper a new trust region method with simple model for solv- ing large-scale unconstrained nonlinear optimization is proposed. By employing generalized weak quasi-Newton equations, we derive several schemes to construct variants of scalar matrices as the Hessian approx- imation used in the trust region subproblem. Under some reasonable conditions, global convergence of the proposed algorithm is established in the trust region framework. The numerical experiments on solving the test problems with dimensions from 50 to 20000 in the CUTEr li- brary are reported to show efficiency of the algorithm. Key words. unconstrained optimization, Barzilai-Borwein method, weak quasi-Newton equation, trust region method, global convergence AMS(2010) 65K05, 90C30 1. Introduction In this paper, we consider the unconstrained optimization problem min f(x); (1.1) x2Rn where f : Rn ! R is continuously differentiable with dimension n relatively large. Trust region methods are a class of powerful and robust globalization ∗This work was supported by National Natural Science Foundation of China (11171159, 11401308), the Natural Science Fund for Universities of Jiangsu Province (13KJB110007), the Fund of Jiangsu University of Technology (KYY13012). ySchool of Mathematics and Physics, Jiangsu University of Technology, Changzhou 213001, Jiangsu, China. Email: [email protected] zCorresponding author. School of Mathematical Sciences, Jiangsu Key Laboratory for NSLSCS, Nanjing Normal University, Nanjing 210023, China. Email: [email protected] xDepartment of Mathematics, Louisiana State University, USA. Email: [email protected] 1 methods for solving (1.1). -
Approximate Newton's Method for Tomographic Image Reconstruction
Approximate Newton's method for Tomographic Image Reconstruction Yao Xie Reports for EE 391 Fall 2007 { 2008 December 14, 2007 Abstract In this report we solved a regularized maximum likelihood (ML) image reconstruction problem (with Poisson noise). We considered the gradient descent method, exact Newton's method, pre-conditioned conjugate gradient (CG) method, and an approximate Newton's method, with which we can solve a million variable problem. However, the current version of approximate Newton's method converges slow. We could possibly improve by ¯nding a better approximate to the Hessian matrix, and still easy to compute its inverse. 1 Introduction Deterministic Filtered backprojection (FBP) and regularized Maximum-Likelihood(ML) es- timation are two major approaches to transmission tomography image reconstruction, such as CT [KS88]. Though the latter method typically yields higher image quality, its practical application is yet limited by its high computational cost (0.02 second per image for FBP versus several hours per image for ML-based method). This study works towards solving large scale ML imaging problem. We consider several versions of the Newton's algorithm, including an approximate Newton's method. We tested our methods on one simulated example and one real data set. Our method works, but converges relatively slow. One possible remedy is to ¯nd a better approximation to the Hessian matrix. 2 Problem Formulation We consider a computed tomographic (CT) system. The object to be imaged is a square region, which we divide into an n £n array of square pixels. The pixels are indexded column 1 ¯rst, by a single index i ranging from 1 to n2. -
On the Use of Neural Networks to Solve Differential Equations Alberto García Molina
Máster Universitario en Modelización e Investigación Matemática, Estadística y Computación 2019/2020 Trabajo Fin de Máster On the use of Neural Networks to solve Differential Equations Alberto García Molina Tutor/es Carlos Gorria Corres Lugar y fecha de presentación prevista 12 de Octubre del 2020 Abstract English. Artificial neural networks are parametric models, generally adjusted to solve regression and classification problem. For a long time, a question has laid around regarding the possibility of using these types of models to approximate the solutions of initial and boundary value problems, as a means for numerical integration. Recent improvements in deep-learning have made this approach much attainable, and integration methods based on training (fitting) artificial neural networks have begin to spring, motivated mostly by their mesh-free natureand scalability to high dimensions. In this work, we go all the way from the most basic elements, such as the definition of artificial neural networks and well-posedness of the problems, to solving several linear and quasi-linear PDEs using this approach. Throughout this work we explain general theory concerning artificial neural networks, including topics such as vanishing gradients, non-convex optimization or regularization, and we adapt them to better suite the initial and boundary value problems nature. Some of the original contributions in this work include: an analysis of the vanishing gradient problem with respect to the input derivatives, a custom regularization technique based on the network’s parameters derivatives, and a method to rescale the subgradients of the multi-objective of the loss function used to optimize the network. Spanish. Las redes neuronales son modelos paramétricos generalmente usados para resolver problemas de regresiones y clasificación. -
Constrained Multifidelity Optimization Using Model Calibration
Struct Multidisc Optim (2012) 46:93–109 DOI 10.1007/s00158-011-0749-1 RESEARCH PAPER Constrained multifidelity optimization using model calibration Andrew March · Karen Willcox Received: 4 April 2011 / Revised: 22 November 2011 / Accepted: 28 November 2011 / Published online: 8 January 2012 c Springer-Verlag 2012 Abstract Multifidelity optimization approaches seek to derivative-free and sequential quadratic programming meth- bring higher-fidelity analyses earlier into the design process ods. The method uses approximately the same number of by using performance estimates from lower-fidelity models high-fidelity analyses as a multifidelity trust-region algo- to accelerate convergence towards the optimum of a high- rithm that estimates the high-fidelity gradient using finite fidelity design problem. Current multifidelity optimization differences. methods generally fall into two broad categories: provably convergent methods that use either the high-fidelity gradient Keywords Multifidelity · Derivative-free · Optimization · or a high-fidelity pattern-search, and heuristic model cali- Multidisciplinary · Aerodynamic design bration approaches, such as interpolating high-fidelity data or adding a Kriging error model to a lower-fidelity function. This paper presents a multifidelity optimization method that bridges these two ideas; our method iteratively calibrates Nomenclature lower-fidelity information to the high-fidelity function in order to find an optimum of the high-fidelity design prob- A Active and violated constraint Jacobian lem. The -
Computing a Trust Region Step
Distribution Category: Mathematics and Computers ANL--81-83 (UC-32) DE82 005656 ANIr81483 ARGONNE NATIONAL LABORATORY 9700 South Cass Avenue Argonne, Illinois 60439 COMPUTING A TRUST REGION STEP Jorge J. Mord and D. C. Sorensen Applied Mathematics Division DISCLAIMER - -. ".,.,... December 1981 Computing a Trust Region Step Jorge J. Mord and D. C. Sorensen Applied Mathematics Division Argonne National Laboratory Argonne Illinois 60439 1. Introduction. In an important class of minimization algorithms called "trust region methods" (see, for example, Sorensen [1981]), the calculation of the step between iterates requires the solution of a problem of the form (1.1) minij(w): 1|w |isAl where A is a positive parameter, I- II is the Euclidean norm in R", and (1.2) #(w) = g w + Mw7Bw, with g E R", and B E R a symmetric matrix. The quadratic function ' generally represents a local model to the objective function defined by interpolatory data at an iterate and thus it is important to be able to solve (1.1) for any symmetric matrix B; in particular, for a matrix B with negative eigenvalues. In trust region methods it is sometimes helpful to include a scaling matrix for the variables. In this case, problem (1.1) is replaced by (1.3) mint%(v):IIDu Is A where D E R""" is a nonsingular matrix. The change of variables Du = w shows that problem (1.3) is equivalent to (1.4) minif(w):1|w| Is Al where f(w) _%f(D-'w), and that the solutions of problems (1.3) and (1.4) are related by Dv = w. -
Welcome to the 2013 MOPTA Conference!
Welcome to the 2013 MOPTA Conference! Mission Statement The Modeling and Optimization: Theory and Applications (MOPTA) conference is an annual event aiming to bring together a diverse group of people from both discrete and continuous optimization, working on both theoretical and applied aspects. The format consists of invited talks from distinguished speakers and selected contributed talks, spread over three days. The goal is to present a diverse set of exciting new developments from different optimization areas while at the same time providing a setting that will allow increased interaction among the participants. We aim to bring together researchers from both the theoretical and applied communities who do not usually have the chance to interact in the framework of a medium- scale event. MOPTA 2013 is hosted by the Department of Industrial and Systems Engineering at Lehigh University. Organization Committee Frank E. Curtis - Chair [email protected] Tamás Terlaky [email protected] Ted Ralphs [email protected] Katya Scheinberg [email protected] Lawrence V. Snyder [email protected] Robert H. Storer [email protected] Aurélie Thiele [email protected] Luis Zuluaga [email protected] Staff Kathy Rambo We thank our sponsors! 1 Program Wednesday, August 14 – Rauch Business Center 7:30-8:10 - Registration and continental breakfast - Perella Auditorium Lobby 8:10-8:20 - Welcome: Tamás Terlaky, Department Chair, Lehigh ISE - Perella Auditorium (RBC 184) 8:20-8:30 - Opening remarks: Patrick Farrell, Provost, Lehigh University -
Tutorial: PART 2
Tutorial: PART 2 Optimization for Machine Learning Elad Hazan Princeton University + help from Sanjeev Arora & Yoram Singer Agenda 1. Learning as mathematical optimization • Stochastic optimization, ERM, online regret minimization • Online gradient descent 2. Regularization • AdaGrad and optimal regularization 3. Gradient Descent++ • Frank-Wolfe, acceleration, variance reduction, second order methods, non-convex optimization Accelerating gradient descent? 1. Adaptive regularization (AdaGrad) works for non-smooth&non-convex 2. Variance reduction uses special ERM structure very effective for smooth&convex 3. Acceleration/momentum smooth convex only, general purpose optimization since 80’s Condition number of convex functions # defined as � = , where (simplified) $ 0 ≺ �� ≼ �+� � ≼ �� � = strong convexity, � = smoothness Non-convex smooth functions: (simplified) −�� ≼ �+� � ≼ �� Why do we care? well-conditioned functions exhibit much faster optimization! (but equivalent via reductions) Examples Smooth gradient descent The descent lemma, �-smooth functions: (algorithm: �012 = �0 − ��4 ) + � �012 − � �0 ≤ −�0 �012 − �0 + � �0 − �012 1 = − � + ��+ � + = − � + 0 4� 0 Thus, for M-bounded functions: ( � �0 ≤ �) 1 −2� ≤ � � − �(� ) = > �(� ) − � � ≤ − > � + ; 2 012 0 4� 0 0 0 Thus, exists a t for which, 8�� � + ≤ 0 � Smooth gradient descent 2 Conclusions: for � = � − �� and T = Ω , finds 012 0 4 C + �0 ≤ � 1. Holds even for non-convex functions ∗ 2. For convex functions implies � �0 − � � ≤ �(�) (faster for smooth!) Non-convex stochastic gradient descent The descent lemma, �-smooth functions: (algorithm: �012 = �0 − ��N0) + � � �012 − � �0 ≤ � −�4 �012 − �0 + � �0 − �012 + + + + = � −�P0 ⋅ ��4 + � �N4 = −��4 + � �� �N4 + + + = −��4 + � �(�4 + ��� �P0 ) Thus, for M-bounded functions: ( � �0 ≤ �) M� T = � ⇒ ∃ . � + ≤ � �+ 0Y; 0 Controlling the variance: Interpolating GD and SGD Model: both full and stochastic gradients. Estimator combines both into lower variance RV: �012 = �0 − � �P� �0 − �P� �[ + ��(�[) Every so oFten, compute Full gradient and restart at new �[. -
An Active–Set Trust-Region Method for Bound-Constrained Nonlinear Optimization Without Derivatives Applied to Noisy Aerodynamic Design Problems Anke Troltzsch
An active–set trust-region method for bound-constrained nonlinear optimization without derivatives applied to noisy aerodynamic design problems Anke Troltzsch To cite this version: Anke Troltzsch. An active–set trust-region method for bound-constrained nonlinear optimization with- out derivatives applied to noisy aerodynamic design problems. Optimization and Control [math.OC]. Institut National Polytechnique de Toulouse - INPT, 2011. English. tel-00639257 HAL Id: tel-00639257 https://tel.archives-ouvertes.fr/tel-00639257 Submitted on 8 Nov 2011 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. %0$503"5%&%0$503"5%&-6/*7&34*5²-6/*7&34*5²%&506-064&%&506-064& Institut National Polytechnique de Toulouse (INP Toulouse) Mathématiques Informatique Télécommunications Anke TRÖLTZSCH mardi 7 juin 2011 An active-set trust-region method for bound-constrained nonlinear optimization without derivatives applied to noisy aerodynamic design problems Mathématiques Informatique Télécommunications (MITT) CERFACS -
Petsc's Software Strategy for the Design Space of Composable
PETSc's Software Strategy for the Design Space of Composable Extreme-Scale SolversI Barry Smitha, Lois Curfman McInnesa, Emil Constantinescua, Mark Adamsb, Satish Balaya, Jed Browna, Matthew Knepleyc, Hong Zhangd aMathematics and Computer Science Division, Argonne National Laboratory bApplied Physics and Applied Mathematics Department, Columbia University cComputation Institute, University of Chicago dDepartment of Computer Science, Illinois Institute of Technology Abstract Emerging extreme-scale architectures present new opportunities for broader scope of simulations as well as new challenges in algorithms and software to exploit unprecedented levels of parallelism. Composable, hierarchical solver algorithms and carefully designed portable software are crucial to the success of extreme- scale simulations, because solver phases often dominate overall simulation time. This paper presents the PETSc design philogophy and recent advances in the library that enable application scientists to investi- gate the design space of composable linear, nonlinear, and timestepping solvers. In particular, separation of the control logic of the algorithms from the computational kernels of the solvers is crucial to allow in- jecting new hardware-specific computational kernels without having to rewrite the entire solver software library. Progress in this direction has already begun in PETSc, with new support for pthreads, OpenMP, and GPUs as a first step toward hardware-independent, high-level control logic for computational kernels. This multipronged software strategy for composable extreme-scale solvers will help exploit unprecedented extreme-scale computational power in two important ways: by facilitating the injection of newly developed scalable algorithms and data structures into fundamental components, and by providing the underlying foundation for a paradigm shift that raises the level of abstraction from simulation of complex systems to the design and uncertainty quantification of these systems.