
Computational Optimization for Economists

Sven Leyffer, Jorge Moré, and Todd Munson
Mathematics and Computer Science Division, Argonne National Laboratory
{leyffer,more,tmunson}@mcs.anl.gov

Institute for Computational Economics, University of Chicago, July 30 – August 9, 2007

Overview
1. Introduction to Optimization [Moré]
2. Continuous Optimization in AMPL [Munson]
3. Optimization Software [Leyffer]
4. Complementarity & Games [Munson]

Part I
Introduction, Applications, and Formulations

Outline: Six Topics
• Introduction
• Unconstrained optimization
  - Limited-memory variable metric methods
• Systems of nonlinear equations
  - Sparsity and Newton's method
• Automatic differentiation
  - Computing sparse Jacobians via graph coloring
• Constrained optimization
  - All that you need to know about KKT conditions
• Solving optimization problems
  - Modeling languages: AMPL and GAMS
  - NEOS

Topic 1: The Optimization Viewpoint

• Modeling
• Software
• Automatic differentiation tools
• Application-specific languages
• High-performance architectures

View of Optimization from Applications

(Figure: surface plot of a computed application solution.)

Classification of Constrained Optimization Problems

• Formulation
    min {f(x) : xl ≤ x ≤ xu, cl ≤ c(x) ≤ cu}
• Number of variables n
• Number of constraints m
• Number of linear constraints
• Number of equality constraints ne
• Number of degrees of freedom n − ne
• Sparsity of c'(x) = (∂i cj(x))
• Sparsity of ∇²x L(x, λ) = ∇²f(x) + Σ_{k=1}^m ∇²ck(x) λk

Classification of Constrained Optimization Software

• Interfaces: MATLAB, AMPL, GAMS
• Second-order information options:
  - Differences
  - Limited memory
  - Hessian-vector products
• Linear solvers
  - Direct solvers
  - Iterative solvers
  - Preconditioners
• Partially separable problem formulation
• Documentation
• License

Life-Cycles Saving Problem

Maximize the utility
    Σ_{t=1}^T β^t u(ct)
where St are the savings, ct is consumption, wt are wages, and
    S_{t+1} = (1 + r) St + w_{t+1} − c_{t+1},   0 ≤ t < T
with interest rate r = 0.2, β = 0.9, S0 = ST = 0, and
    u(c) = −exp(−c)
Assume that wt = 1 for t < R and wt = 0 for t ≥ R.

Question. What are the characteristics of the life-cycle problem?

Constrained Optimization Software: IPOPT

• Formulation
    min {f(x) : xl ≤ x ≤ xu, c(x) = 0}
• Interfaces: AMPL
• Second-order information options:
  - Differences
  - Limited memory
  - Hessian-vector products
• Direct solvers: MA27, MA57
• Partially separable problem formulation: None
• Documentation
• License

Life-Cycles Saving Problem: Results

(Figures: solutions for (R, T) = (30, 50) and (R, T) = (60, 100).)

Question. Problem formulation to results: How long?

Topic 2: Unconstrained Optimization

Augustin Louis Cauchy (August 21, 1789 – May 23, 1857)
Additional information at MacTutor: www-history.mcs.st-andrews.ac.uk

Unconstrained Optimization: Background

Given a continuously differentiable f : R^n → R and
    min {f(x) : x ∈ R^n}
generate a sequence of iterates {xk} such that the termination test
    ||∇f(xk)|| ≤ τ
is eventually satisfied.

Theorem. If f : R^n → R is continuously differentiable and bounded below, then there is a sequence {xk} such that
    lim_{k→∞} ||∇f(xk)|| = 0.
Exercise. Prove this result.

Ginzburg-Landau Model

Minimize the Gibbs free energy for a homogeneous superconductor
    ∫_D { −|v(x)|² + ½ |v(x)|⁴ + ||[∇ − iA(x)] v(x)||² + κ² ||(∇ × A)(x)||² } dx
where
    v : R² → C (order parameter)
    A : R² → R² (vector potential)

Unconstrained problem. Non-convex function. Hessian is singular. Unique minimizer, but there is a saddle point.

Unconstrained Optimization

What can I use if the gradient ∇f(x) is not available?
• Geometry-based methods: pattern search, Nelder-Mead, ...
• Model-based methods: quadratic, radial-basis models, ...

What can I use if the gradient ∇f(x) is available?
• Conjugate gradient methods
• Limited-memory variable metric methods
• Variable metric methods

Computing the Gradient

Hand-coded gradients
• Generally efficient
• Error prone
• The cost is usually less than 5 function evaluations

Difference approximations
    ∂i f(x) ≈ [f(x + hi ei) − f(x)] / hi
• Choice of hi may be problematic in the presence of noise
• Costs n function evaluations
• Accuracy is about εf^{1/2}, where εf is the noise level of f
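A minimal sketch (mine, not the course materials') of the forward-difference approximation above, in Python with NumPy. The per-coordinate step h = sqrt(εf)·max(1, |xi|) is an assumed default motivated by the accuracy remark.

import numpy as np

def fd_gradient(f, x, noise_level=1e-16):
    # Forward differences: grad_i ~ (f(x + h*e_i) - f(x)) / h, costing n extra evaluations.
    x = np.asarray(x, dtype=float)
    h = np.sqrt(noise_level) * np.maximum(1.0, np.abs(x))
    g = np.empty_like(x)
    fx = f(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += h[i]
        g[i] = (f(xp) - fx) / h[i]
    return g

# Example: f(x) = 0.5*||x||^2 has gradient x
print(fd_gradient(lambda x: 0.5 * np.dot(x, x), np.array([1.0, -2.0, 3.0])))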

Cheap Gradient via Automatic Differentiation

Code generated by automatic differentiation tools:
• Accurate to full precision
• For the reverse mode the cost is ΩT T{f(x)}
• In theory, ΩT ≤ 5
• For the reverse mode the memory is proportional to the number of intermediate variables

Exercise. Develop an order-n code for computing the gradient of
    f(x) = Π_{k=1}^n xk

Methods

A sequence of iterates {xk} is generated via
    x_{k+1} = xk + αk pk,
where pk is a descent direction at xk, that is,
    ∇f(xk)^T pk < 0,
and αk is determined by a line search along pk.

Line searches
• Geometry-based: Armijo, ...
• Model-based: quadratics, cubic models, ...
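One possible answer to the exercise on the automatic-differentiation slide above, sketched in Python (not part of the slides): prefix and suffix products give all n partial derivatives of f(x) = x1···xn in O(n) operations, mirroring what reverse-mode AD achieves.

import numpy as np

def prod_and_gradient(x):
    # f(x) = x[0]*...*x[n-1]; df/dx_i = product of all x_j with j != i, computed in O(n).
    n = len(x)
    prefix = np.ones(n)   # prefix[i] = x[0]*...*x[i-1]
    suffix = np.ones(n)   # suffix[i] = x[i+1]*...*x[n-1]
    for i in range(1, n):
        prefix[i] = prefix[i - 1] * x[i - 1]
    for i in range(n - 2, -1, -1):
        suffix[i] = suffix[i + 1] * x[i + 1]
    f = prefix[-1] * x[-1]
    grad = prefix * suffix
    return f, grad

print(prod_and_gradient(np.array([1.0, 2.0, 3.0, 4.0])))   # f = 24, grad = [24, 12, 8, 6]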

Powell-Wolfe Conditions on the Line Search

Given 0 ≤ µ < η ≤ 1, require that
    f(x + αp) ≤ f(x) + µ α ∇f(x)^T p      (sufficient decrease)
    |∇f(x + αp)^T p| ≤ η |∇f(x)^T p|      (curvature condition)

Conjugate Gradient Algorithms

Given a starting vector x0, generate iterates via
    x_{k+1} = xk + αk pk
    p_{k+1} = −∇f(x_{k+1}) + βk pk
where αk is determined by a line search.

Three reasonable choices of βk are (gk = ∇f(xk)):
    βk^{FR} = ||g_{k+1}||² / ||gk||²                 (Fletcher-Reeves)
    βk^{PR} = ⟨g_{k+1}, g_{k+1} − gk⟩ / ||gk||²       (Polak-Ribière)
    βk^{PR+} = max{βk^{PR}, 0}                        (PR-plus)
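A compact sketch of the PR-plus variant above in Python (my illustration, not the authors' code). It substitutes a crude backtracking Armijo step for a Wolfe line search and adds a steepest-descent restart; both are assumptions made to keep the example short.

import numpy as np

def cg_prplus(f, grad, x, iters=500, tol=1e-6):
    # Nonlinear conjugate gradient with the PR+ choice of beta.
    g = grad(x)
    p = -g
    for _ in range(iters):
        if np.linalg.norm(g) <= tol:
            break
        alpha, fx = 1.0, f(x)                 # backtracking Armijo step along p
        while f(x + alpha * p) > fx + 1e-4 * alpha * np.dot(g, p) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * p
        g_new = grad(x_new)
        beta = max(np.dot(g_new, g_new - g) / np.dot(g, g), 0.0)   # PR+
        p = -g_new + beta * p
        if np.dot(g_new, p) >= 0:             # safeguard: restart with steepest descent
            p = -g_new
        x, g = x_new, g_new
    return x

# Example: the 2-d Rosenbrock function
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                           200*(x[1] - x[0]**2)])
print(cg_prplus(f, grad, np.array([-1.2, 1.0])))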

Limited-Memory Variable-Metric Algorithms

Given a starting vector x0, generate iterates via
    x_{k+1} = xk − αk Hk ∇f(xk)
where αk is determined by a line search.

The matrix Hk is defined in terms of information gathered during the previous m iterations.
• Hk is positive definite.
• Storage of Hk requires 2mn locations.
• Computation of Hk ∇f(xk) costs (8m + 1)n flops.

Recommendations

But what should I use?
• If the gradient ∇f(x) is not available, then a model-based method is a reasonable choice. Methods based on quadratic interpolation are currently the best choice.
• If the gradient ∇f(x) is available, then a limited-memory variable metric method is likely to produce an approximate minimizer in the least number of gradient evaluations.
• If the Hessian is also available, then a state-of-the-art implementation of Newton's method is likely to produce the best results if the problem is large and sparse.
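A sketch (mine, not from the slides) of the standard two-loop recursion that applies the limited-memory matrix Hk to ∇f(xk) implicitly from the m most recent pairs (s, y); scaling the initial matrix by s'y/y'y is a common convention and an assumption here.

import numpy as np

def lbfgs_direction(grad_k, s_list, y_list):
    # Two-loop recursion: returns H_k * grad_k using the last m pairs (s_i, y_i).
    q = grad_k.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):   # most recent pair first
        rho = 1.0 / np.dot(y, s)
        a = rho * np.dot(s, q)
        q -= a * y
        alphas.append((a, rho, s, y))
    if s_list:                                             # initial Hessian approximation
        s, y = s_list[-1], y_list[-1]
        q *= np.dot(s, y) / np.dot(y, y)
    for a, rho, s, y in reversed(alphas):                  # oldest pair first
        b = rho * np.dot(y, q)
        q += (a - b) * s
    return q                                               # search direction is -alpha * q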

Topic 3: Newton's Method

Sir Isaac Newton (January 4, 1643 – March 31, 1727)
Additional information at MacTutor: www-history.mcs.st-andrews.ac.uk

Motivation

Given a continuously differentiable f : R^n → R^n, solve
    f(x) = (f1(x), ..., fn(x))^T = 0

Linear models. The mapping defined by
    Lk(s) = f(xk) + f'(xk) s
is a linear model of f near xk, and thus it is sensible to choose sk such that Lk(sk) = 0, provided xk + sk is near xk.

Newton's Method

Given a starting point x0, Newton's method generates iterates via
    f'(xk) sk = −f(xk),   x_{k+1} = xk + sk.

Computational Issues
• How do we solve for sk?
• How do we handle a (nearly) singular f'(xk)?
• How do we enforce convergence if x0 is not near a solution?
• How do we compute/approximate f'(xk)?
• How accurately do we solve for sk?
• Is the algorithm scale invariant?
• Is the algorithm mesh invariant?

Flow in a Channel Problem

Analyze the flow of a fluid during injection into a long vertical channel, assuming that the flow is modeled by the boundary value problem below, where u is the potential function and R is the Reynolds number.
    u'''' = R (u' u'' − u u''')
    u(0) = 0, u(1) = 1
    u'(0) = u'(1) = 0
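A bare-bones sketch of the Newton iteration above for a small dense system, in Python (illustration only; it ignores the computational issues listed, such as singular Jacobians and globalization, and the test problem is my own).

import numpy as np

def newton_system(F, J, x0, tol=1e-10, max_iter=50):
    # Solve F(x) = 0: at each step solve J(x_k) s_k = -F(x_k) and set x_{k+1} = x_k + s_k.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= tol:
            break
        s = np.linalg.solve(J(x), -Fx)   # dense solve; large sparse problems need sparse solvers
        x = x + s
    return x

# Example: intersection of a circle and a line
F = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
J = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
print(newton_system(F, J, [2.0, 0.5]))   # converges to (1, 1)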

Sparsity

Assume that the Jacobian matrix is sparse, and let ρi be the number of non-zeros in the i-th row of f'(x).
• Sparse linear solvers can solve f'(x)s = −f(x) in order ρA operations, where ρA = avg{ρi²}.
• Graph coloring techniques (see Topic 4) can compute or approximate the Jacobian matrix with ρM function evaluations, where ρM = max{ρi}.

Topic 4: Automatic Differentiation

Gottfried Wilhelm Leibniz (July 1, 1646 – November 14, 1716)
Additional information at MacTutor: www-history.mcs.st-andrews.ac.uk

Computing Gradients and Sparse Jacobians

Theorem. Given f : R^n → R^m, automatic differentiation tools compute f'(x)v at a cost comparable to f(x).

Tasks
• Given f : R^n → R^m with a sparse Jacobian, compute f'(x) with p ≪ n evaluations of f'(x)v
• Given a partially separable f : R^n → R, compute ∇f(x) with p ≪ n evaluations of ⟨∇f(x), v⟩

Requirements:
    T{∇f(x)} ≤ ΩT T{f(x)},   M{∇f(x)} ≤ ΩM M{f(x)},
where T{·} is computing time and M{·} is memory.

Structurally Orthogonal Columns

Structurally orthogonal columns do not have a nonzero in the same row position.

Observation. We can compute the columns in a group of structurally orthogonal columns with an evaluation of f'(x)v. For example, if columns 1 and 4 of f'(x) are structurally orthogonal, then the single product f'(x)v with
    v = (1, 0, 0, 1, 0, 0)^T
delivers the nonzero entries of both columns.

Coloring the Jacobian Matrix f'(x)

Partitioning the columns of f'(x) into p groups of structurally orthogonal columns is equivalent to a graph coloring problem.

For each group of structurally orthogonal columns, define v ∈ R^n with vi = 1 if column i is in the group, and vi = 0 otherwise. Set
    V = (v1, v2, ..., vp)
Compute f'(x) from the compressed Jacobian matrix f'(x)V.

Observation. In practice p ≈ ρM, where
    ρM ≡ max{ρi},
and ρi is the number of non-zeros in the i-th row of f'(x).

Coloring the Jacobian Matrix with p = 17 Colors

(Figure: sparsity pattern of a Jacobian matrix colored with 17 colors.)
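A small sketch of the compressed-Jacobian idea above in Python (my illustration): columns are grouped greedily into structurally orthogonal sets, and each group is recovered with one Jacobian-vector product, here approximated by a forward difference. The greedy grouping is a simple stand-in for the graph-coloring heuristics the slides refer to.

import numpy as np

def column_groups(pattern):
    # Greedy partition of columns into structurally orthogonal groups.
    # pattern: boolean m-by-n array, True where f'(x) may be nonzero.
    m, n = pattern.shape
    groups, used_rows = [], []
    for j in range(n):
        for g, rows in enumerate(used_rows):
            if not np.any(pattern[:, j] & rows):      # no shared nonzero row
                groups[g].append(j)
                used_rows[g] = rows | pattern[:, j]
                break
        else:
            groups.append([j])
            used_rows.append(pattern[:, j].copy())
    return groups

def jacobian_by_groups(F, x, pattern, h=1e-7):
    # Recover a sparse Jacobian from one difference per group of columns.
    m, n = pattern.shape
    J, Fx = np.zeros((m, n)), F(x)
    for group in column_groups(pattern):
        v = np.zeros(n)
        v[group] = 1.0
        col = (F(x + h * v) - Fx) / h                 # compressed column ~ f'(x)v
        for j in group:
            rows = pattern[:, j]
            J[rows, j] = col[rows]
    return J

# Tridiagonal example: 3 groups suffice regardless of n
n = 6
F = lambda x: np.array([x[i]**2 + (x[i-1] if i > 0 else 0) + (x[i+1] if i < n-1 else 0)
                        for i in range(n)])
pattern = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 1
print(jacobian_by_groups(F, np.ones(n), pattern).round(3))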

Sparsity Pattern of the Jacobian Matrix

Optimization software tends to require the closure of the sparsity pattern
    ∪ { S(f'(x)) : x ∈ D }
in a region D of interest. In our case,
    D = {x ∈ R^n : xl ≤ x ≤ xu}
Given x0 ∈ D, we evaluate the sparsity pattern of f_E'(x̄0), where x̄0 is a random, small perturbation of x0, for example,
    x̄0 = (1 + ε) x0 + ε,   ε ∈ [10⁻⁶, 10⁻⁴]

Partially Separable Functions

The mapping f : R^n → R is partially separable if
    f(x) = Σ_{i=1}^m fi(x),
and fi only depends on pi ≪ n variables.

Theorem (Griewank and Toint [1981]). If f : R^n → R has a sparse Hessian, then f is partially separable.

Optimization problems with a finite element formulation usually associate fi with each element.

Partially Separable Functions: The Trick

If f : R^n → R is partially separable, the extended function
    f_E(x) = (f1(x), ..., fm(x))^T
has a sparse Jacobian matrix f_E'(x). Moreover,
    f(x) = f_E(x)^T e   ⟹   ∇f(x) = f_E'(x)^T e
Observation. We can compute the dense gradient by computing the sparse Jacobian matrix f_E'(x).

Computational Experiments

Experiments based on the MINPACK-2 collection of large-scale problems show that gradients of partially separable functions can be computed efficiently:
    T{∇f(x)} = κ ρM T{f(x)},   ρM = max{ρi}

Quartiles of κ for 2,500 ≤ n ≤ 40,000:
    1.3   2.9   5.0   8.2   22.2

Topic 5: Constrained Optimization

Joseph-Louis Lagrange (January 25, 1736 – April 10, 1813)
Additional information at MacTutor: www-history.mcs.st-andrews.ac.uk

Geometric Viewpoint of the KKT Conditions

For any closed set Ω, consider the abstract problem
    min {f(x) : x ∈ Ω}

The tangent cone
    T(x*) = { v : v = lim_{k→∞} (xk − x*) / αk,  xk ∈ Ω,  αk ≥ 0 }

The normal cone
    N(x*) = { w : ⟨w, v⟩ ≤ 0,  v ∈ T(x*) }

First-order conditions
    −∇f(x*) ∈ N(x*)

Computational Viewpoint of the KKT Conditions

In the case Ω = {x ∈ R^n : c(x) ≥ 0}, define
    C(x*) = { w : w = Σ_{i=1}^m λi (−∇ci(x*)),  λi ≥ 0 }
In general C(x*) ⊂ N(x*), and under a constraint qualification
    C(x*) = N(x*).
Hence, for some multipliers λi ≥ 0,
    ∇f(x) = Σ_{i=1}^m λi ∇ci(x),   λi ≥ 0.

Constraint Qualifications

In the case where
    Ω = {x ∈ R^n : l ≤ c(x) ≤ u}
the two main constraint qualifications are:

Linear independence. The active constraint normals are linearly independent, that is, if
    CA = (∇ci(x) : ci(x) ∈ {li, ui})
then CA has full rank.

Mangasarian-Fromovitz. The active constraint normals are positively linearly independent.

Lagrange Multipliers

For the general problem with two-sided constraints
    min {f(x) : l ≤ c(x) ≤ u}
the KKT conditions for a local minimizer are
    ∇f(x) = Σ_{i=1}^m λi ∇ci(x),   l ≤ c(x) ≤ u,
where the multipliers satisfy the complementarity conditions
• λi is unrestricted if li = ui
• λi = 0 if ci(x) ∉ {li, ui}
• λi ≥ 0 if ci(x) = li
• λi ≤ 0 if ci(x) = ui

Lagrangians

The KKT conditions for the problem with constraints l ≤ c(x) ≤ u can be written in terms of the Lagrangian
    L(x, λ) = f(x) − Σ_{i=1}^m λi ci(x).

Examples.
The KKT conditions for the equality-constrained case c(x) = 0 are
    ∇x L(x, λ) = 0,   c(x) = 0.
The KKT conditions for the inequality-constrained case c(x) ≥ 0 are
    ∇x L(x, λ) = 0,   c(x) ≥ 0,   λ ≥ 0,   λ ⊥ c(x),
where λ ⊥ c(x) means that λi ci(x) = 0.

Newton's Method: Equality-Constrained Problems

The KKT conditions for the equality-constrained problem c(x) = 0,
    ∇x L(x, λ) = ∇f(x) − Σ_{i=1}^m λi ∇ci(x) = 0,   c(x) = 0,
are a system of n + m nonlinear equations.

Newton's method for this system can be written as
    x+ = x + sx,   λ+ = λ + sλ,
where
    [ ∇²x L(x, λ)   −∇c(x) ] [ sx ]      [ ∇x L(x, λ) ]
    [ ∇c(x)^T          0   ] [ sλ ]  = − [ c(x)       ]

Saddle Point Problems

Given a symmetric n × n matrix H and an n × m matrix C, under what conditions is
    A = [ H    C ]
        [ C^T  0 ]
nonsingular?

Lemma. If C has full rank and
    C^T u = 0, u ≠ 0   ⟹   u^T H u > 0,
then A is nonsingular.
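A sketch (not from the slides) of one Newton step on the equality-constrained KKT system above, assembled densely in Python; the test problem, minimize x1 + x2 subject to x1² + x2² = 2, is a hypothetical example of my own.

import numpy as np

def kkt_newton_step(grad_f, jac_c, hess_lag, c, x, lam):
    # One Newton step on the KKT system of  min f(x)  s.t.  c(x) = 0.
    A = jac_c(x)                                  # m-by-n Jacobian of c
    H = hess_lag(x, lam)                          # Hessian of the Lagrangian
    n, m = x.size, lam.size
    K = np.block([[H, -A.T], [A, np.zeros((m, m))]])
    rhs = -np.concatenate([grad_f(x) - A.T @ lam, c(x)])
    step = np.linalg.solve(K, rhs)
    return x + step[:n], lam + step[n:]

# Hypothetical test problem: min x1 + x2  s.t.  x1^2 + x2^2 = 2   (solution (-1, -1), lambda = -0.5)
grad_f = lambda x: np.array([1.0, 1.0])
c = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0])
jac_c = lambda x: np.array([[2.0*x[0], 2.0*x[1]]])
hess_lag = lambda x, lam: -2.0 * lam[0] * np.eye(2)   # Hessian of f is zero; minus lambda times Hessian of c

x, lam = np.array([-1.2, -0.8]), np.array([-0.4])
for _ in range(8):
    x, lam = kkt_newton_step(grad_f, jac_c, hess_lag, c, x, lam)
print(x, lam)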

Topic 6: Solving Optimization Problems

Environments
• Modeling languages: AMPL, GAMS
• NEOS

The Classical Model

Fortran, C, Matlab, NWChem
(Image: emperor penguin; source: http://www.cadburylearningzone.co.uk/environment/images/pictures/emporer.jpg)

The NEOS Model

A collaborative research project that represents the efforts of the optimization community by providing access to 50+ solvers from both academic and commercial researchers.

NEOS: Under the Hood

• Modeling languages for optimization: AMPL, GAMS
• Automatic differentiation tools: ADIFOR, ADOL-C, ADIC
• Python
• Optimization solvers (50+)
  - Benchmark, GAMS/AMPL (Multi-Solvers)
  - MINLP, FortMP, GLPK, Xpress-MP, ...
  - CONOPT, FILTER, IPOPT, KNITRO, LANCELOT, LOQO, MINOS, MOSEK, PATHNLP, PENNON, SNOPT
  - BPMPD, FortMP, MOSEK, OOQP, Xpress-MP, ...
  - CSDP, DSDP, PENSDP, SDPA, SeDuMi, ...
  - BLMVM, L-BFGS-B, TRON, ...
  - MILES, PATH
  - Concorde

Research Issues for NEOS

• How do we add solvers?
• How are problems specified?
• How are problems submitted?
• How are problems scheduled for solution?
• How are the problems solved?
• Where are the problems solved?
  - Arizona State University
  - Lehigh University
  - Universidade do Minho, Portugal
  - Technical University Aachen, Germany
  - National Taiwan University, Taiwan
  - Northwestern University
  - Università di Roma La Sapienza, Italy
  - Wisconsin University

Solving Optimization Problems: NEOS Interfaces

Interfaces
• Kestrel
• NEOS Submit
• Web browser
• Email

Pressure in a Journal Bearing

    min { ∫_D [ ½ wq(x) ||∇v(x)||² − wl(x) v(x) ] dx : v ≥ 0 }

    wq(ξ1, ξ2) = (1 + ε cos ξ1)³
    wl(ξ1, ξ2) = ε sin ξ1
    D = (0, 2π) × (0, 2b)

Number of active constraints depends on the choice of ε in (0, 1). Nearly degenerate problem. Solution v ∉ C².
(Figure: finite element triangulation and surface plot of the computed pressure.)

AMPL Model for the Journal Bearing: Parameters

param nx > 0, integer;    # grid points in 1st direction
param ny > 0, integer;    # grid points in 2nd direction

param b;                  # grid is (0,2*pi)x(0,2*b)
param e;                  # eccentricity

param pi := 4*atan(1);
param hx := 2*pi/(nx+1);  # grid spacing
param hy := 2*b/(ny+1);   # grid spacing
param area := 0.5*hx*hy;  # area of triangle
param wq {i in 0..nx+1} := (1+e*cos(i*hx))^3;

AMPL Model for the Journal Bearing

var v {i in 0..nx+1, 0..ny+1} >= 0;

minimize q:
  0.5*(hx*hy/6)*sum {i in 0..nx, j in 0..ny} (wq[i] + 2*wq[i+1])*
    (((v[i+1,j]-v[i,j])/hx)^2 + ((v[i,j+1]-v[i,j])/hy)^2) +
  0.5*(hx*hy/6)*sum {i in 1..nx+1, j in 1..ny+1} (wq[i] + 2*wq[i-1])*
    (((v[i-1,j]-v[i,j])/hx)^2 + ((v[i,j-1]-v[i,j])/hy)^2) -
  hx*hy*sum {i in 0..nx+1, j in 0..ny+1} (e*sin(i*hx)*v[i,j]);

subject to c1 {i in 0..nx+1}: v[i,0] = 0;
subject to c2 {i in 0..nx+1}: v[i,ny+1] = 0;
subject to c3 {j in 0..ny+1}: v[0,j] = 0;
subject to c4 {j in 0..ny+1}: v[nx+1,j] = 0;

AMPL Model for the Journal Bearing: Data

# Set the design parameters
param b := 10;
param e := 0.1;

# Set parameter choices
let nx := 50;
let ny := 50;

# Set the starting point.
let {i in 0..nx+1, j in 0..ny+1} v[i,j] := max(sin(i*hx),0);

AMPL Model for the Journal Bearing: Commands

option show_stats 1;
option solver "knitro";
option solver "";
option solver "loqo";
option loqo_options "outlev=2 timing=1 iterlim=500";

model; include bearing.mod;
data; include bearing.dat;

solve;

printf {i in 0..nx+1, j in 0..ny+1}: "%21.15e\n", v[i,j] > cops.dat;
printf "%10d\n %10d\n", nx, ny > cops.dat;

Life-Cycles Saving Problem

Maximize the utility
    Σ_{t=1}^T β^t u(ct)
where St are the savings, ct is consumption, wt are wages, and
    S_{t+1} = (1 + r) St + w_{t+1} − c_{t+1},   0 ≤ t < T
with interest rate r = 0.2, β = 0.9, S0 = ST = 0, and
    u(c) = −exp(−c)
Assume that wt = 1 for t < R and wt = 0 for t ≥ R.

Life-Cycles Saving Problem: Model

param T integer;    # Number of periods
param R integer;    # Retirement
param beta;         # Discount rate
param r;            # Interest rate
param S0;           # Initial savings
param ST;           # Final savings
param w{1..T};      # Wages

var S{0..T};        # Savings
var c{0..T};        # Consumption

maximize utility: sum{t in 1..T} beta^t*(-exp(-c[t]));

subject to budget {t in 0..T-1}: S[t+1] = (1+r)*S[t] + w[t+1] - c[t+1];
subject to savings {t in 0..T}: S[t] >= 0.0;
subject to consumption {t in 1..T}: c[t] >= 0.0;
subject to bc1: S[0] = S0;
subject to bc2: S[T] = ST;
subject to bc3: c[0] = 0.0;

Life-Cycles Saving Problem: Data

param T := 100;
param R := 60;
param beta := 0.9;
param r := 0.2;
param S0 := 0.0;
param ST := 0.0;

# Wages
let {i in 1..R} w[i] := 1.0;
let {i in R..T} w[i] := 0.0;

let {i in 1..R} w[i] := (i/R);
let {i in R..T} w[i] := (i - T)/(R - T);


Life-Cycles Saving Problem: Commands

option show_stats 1;
option solver "filter";
option solver "";
option solver "knitro";
option solver "loqo";

model; include life.mod;
data; include life.dat;

solve;

printf {t in 0..T}: "%21.15e %21.15e\n", c[t], S[t] > cops.dat;

Part II
Continuous Optimization in AMPL

Modeling Languages

• Portable language for optimization problems
  - Algebraic description
  - Models easily modified and solved
  - Large problems can be processed
  - Programming language features
• Many available optimization algorithms
  - No need to compile C/FORTRAN code
  - Derivatives automatically calculated
  - Algorithm-specific options can be set
• Communication with other tools
  - Relational databases and spreadsheets
  - MATLAB interface for function evaluations
• Excellent documentation
• Large user communities

Model Declaration

• Sets
  - Unordered, ordered, and circular sets
  - Cross products and point-to-set mappings
  - Set manipulation
• Parameters and variables
  - Initial and default values
  - Lower and upper bounds
  - Check statements
  - Defined variables
• Objective function and constraints
  - Equality, inequality, and range constraints
  - Complementarity constraints
  - Multiple objectives
• Problem statement

Data and Commands

• Data declaration
  - Set definitions
  - Explicit list of elements
  - Implicit list in parameter statements

Social Planning for Endowment Economy: Model Formulation

• Economy with n agents and m commodities
• e: endowment of each agent in each commodity

Model: social1.mod

param n > 0, integer;                    # Agents
param m > 0, integer;                    # Commodities

param e {1..n, 1..m} >= 0, default 1;    # Endowment
param lambda {1..n} > 0;                 # Social weights
param alpha {1..n, 1..m} > 0;            # Utility parameters
param beta {1..n, 1..m} > 0;

var x{1..n, 1..m} >= 0;                  # Consumption
var u{i in 1..n} =                       # Utility
  sum {k in 1..m} alpha[i,k] * (1 + x[i,k])^(1 - beta[i,k]) / (1 - beta[i,k]);

maximize welfare:
  sum {i in 1..n} lambda[i] * u[i];

subject to
consumption {k in 1..m}:
  sum {i in 1..n} x[i,k] <= sum {i in 1..n} e[i,k];

Data: social1.dat

param n := 3;    # Agents
param m := 4;    # Commodities

param alpha : 1 2 3 4 :=
  1  1 1 1 1
  2  1 2 3 4
  3  2 1 1 5;

param beta (tr) : 1 2 3 :=
  1  1.5 2 0.6
  2  1.6 3 0.7
  3  1.7 2 2.0
  4  1.8 2 2.5;

param : lambda :=
  1 1
  2 1
  3 1;

Commands: social1.cmd

# Load model and data
model social1.mod;
data social1.dat;

# Specify solver and options
option solver "minos";
option minos_options "outlev=1";

# Solve the instance
solve;

# Output results
display x;
printf {i in 1..n} "%2d: % 5.4e\n", i, u[i];

Output

ampl: include social1.cmd;
MINOS 5.5: outlev=1
MINOS 5.5: optimal solution found.
25 iterations, objective 2.252422003
Nonlin evals: obj = 44, grad = 43.
x :=
1 1   0.0811471
1 2   0.574164
1 3   0.703454
1 4   0.267241
2 1   0.060263
2 2   0.604858
2 3   1.7239
2 4   1.47516
3 1   2.85859
3 2   1.82098
3 3   0.572645
3 4   1.2576
;
 1: -5.2111e+00
 2: -4.0488e+00
 3:  1.1512e+01
ampl: quit;

Model: social2.mod

set AGENTS;                                       # Agents
set COMMODITIES;                                  # Commodities

param e {AGENTS, COMMODITIES} >= 0, default 1;    # Endowment
param lambda {AGENTS} > 0;                        # Social weights
param alpha {AGENTS, COMMODITIES} > 0;            # Utility parameters
param beta {AGENTS, COMMODITIES} > 0;
param gamma {i in AGENTS, k in COMMODITIES} := 1 - beta[i,k];

var x{AGENTS, COMMODITIES} >= 0;                  # Consumption
var u{i in AGENTS} =                              # Utility
  sum {k in COMMODITIES} alpha[i,k] * (1 + x[i,k])^gamma[i,k] / gamma[i,k];

maximize welfare:
  sum {i in AGENTS} lambda[i] * u[i];

subject to
consumption {k in COMMODITIES}:
  sum {i in AGENTS} x[i,k] <= sum {i in AGENTS} e[i,k];

Data: social2.dat

set COMMODITIES := Books, Cars, Food, Pens;

param: AGENTS : lambda :=
  Jorge 1
  Sven  1
  Todd  1;

param alpha : Books Cars Food Pens :=
  Jorge 1 1 1 1
  Sven  1 2 3 4
  Todd  2 1 1 5;

param beta (tr): Jorge Sven Todd :=
  Books 1.5 2 0.6
  Cars  1.6 3 0.7
  Food  1.7 2 2.0
  Pens  1.8 2 2.5;

Commands: social2.cmd

# Load model and data
model social2.mod;
data social2.dat;

# Specify solver and options
option solver "minos";
option minos_options "outlev=1";

# Solve the instance
solve;

# Output results
display x;
printf {i in AGENTS} "%5s: % 5.4e\n", i, u[i];

Output

ampl: include social2.cmd
MINOS 5.5: outlev=1
MINOS 5.5: optimal solution found.
25 iterations, objective 2.252422003
Nonlin evals: obj = 44, grad = 43.
x :=
Jorge Books   0.0811471
Jorge Cars    0.574164
Jorge Food    0.703454
Jorge Pens    0.267241
Sven  Books   0.060263
Sven  Cars    0.604858
Sven  Food    1.7239
Sven  Pens    1.47516
Todd  Books   2.85859
Todd  Cars    1.82098
Todd  Food    0.572645
Todd  Pens    1.2576
;
Jorge: -5.2111e+00
Sven:  -4.0488e+00
Todd:   1.1512e+01
ampl: quit;

Traffic Routing with Congestion: Model Formulation

• Route commodities through a network
  - N is the set of nodes
  - A ⊆ N × N is the set of arcs
  - K is the set of commodities
  - α and β are the congestion parameters
  - b denotes the supply and demand
• Multicommodity network flow problem

    min_{x≥0, f≥0}  Σ_{(i,j)∈A} ( α_{i,j} f_{i,j} + β_{i,j} f_{i,j}⁴ )
    subject to  Σ_{(i,j)∈A} x_{i,j,k} ≤ Σ_{(j,i)∈A} x_{j,i,k} + b_{i,k}   ∀ i ∈ N, k ∈ K
                f_{i,j} = Σ_{k∈K} x_{i,j,k}   ∀ (i,j) ∈ A

Model: network.mod

set NODES;                                # Nodes in network
set ARCS within NODES cross NODES;        # Arcs in network
set COMMODITIES := 1..3;                  # Commodities

param b {NODES, COMMODITIES} default 0;   # Supply/demand
check {k in COMMODITIES}:                 # Supply exceeds demand
  sum{i in NODES} b[i,k] >= 0;

param alpha{ARCS} >= 0;                   # Linear part
param beta{ARCS} >= 0;                    # Nonlinear part

var x{ARCS, COMMODITIES} >= 0;            # Flow on arcs
var f{(i,j) in ARCS} =                    # Total flow
  sum {k in COMMODITIES} x[i,j,k];

minimize time:
  sum {(i,j) in ARCS} (alpha[i,j]*f[i,j] + beta[i,j]*f[i,j]^4);

subject to conserve {i in NODES, k in COMMODITIES}:
  sum {(i,j) in ARCS} x[i,j,k] <= sum{(j,i) in ARCS} x[j,i,k] + b[i,k];

Data: network.dat

set NODES := 1 2 3 4 5;

param: ARCS : alpha beta :=
  1 2   1   0.5
  1 3   1   0.4
  2 3   2   0.7
  2 4   3   0.1
  3 2   1   0.0
  3 4   4   0.5
  4 1   5   0.0
  4 5   2   0.1
  5 2   0   1.0;

let b[1,1] :=  7;   # Node 1, Commodity 1 supply
let b[4,1] := -7;   # Node 4, Commodity 1 demand
let b[2,2] :=  3;   # Node 2, Commodity 2 supply
let b[5,2] := -3;   # Node 5, Commodity 2 demand
let b[3,3] :=  5;   # Node 3, Commodity 3 supply
let b[1,3] := -5;   # Node 1, Commodity 3 demand

fix {i in NODES, k in COMMODITIES: (i,i) in ARCS} x[i,i,k] := 0;

Commands: network.cmd

# Load model and data
model network.mod;
data network.dat;

# Specify solver and options
option solver "minos";
option minos_options "outlev=1";

# Solve the instance
solve;

# Output results
for {k in COMMODITIES} {
  printf "Commodity: %d\n", k > network.out;
  printf {(i,j) in ARCS: x[i,j,k] > 0} "%d.%d = % 5.4e\n", i, j, x[i,j,k] > network.out;
  printf "\n" > network.out;
}

Output

ampl: include network.cmd;
MINOS 5.5: outlev=1
MINOS 5.5: optimal solution found.
12 iterations, objective 1505.526478
Nonlin evals: obj = 14, grad = 13.
ampl: quit;

Results: network.out

Commodity: 1
1.2 = 3.3775e+00
1.3 = 3.6225e+00
2.4 = 6.4649e+00
3.2 = 3.0874e+00
3.4 = 5.3510e-01

Commodity: 2
2.4 = 3.0000e+00
4.5 = 3.0000e+00

Commodity: 3
3.4 = 5.0000e+00
4.1 = 5.0000e+00

Initial Coordinate Descent: wardrop0.cmd

# Load model and data
model network.mod;
data network.dat;

option solver "minos";
option minos_options "outlev=1";

# Coordinate descent method
fix {(i,j) in ARCS, k in COMMODITIES} x[i,j,k];
drop {i in NODES, k in COMMODITIES} conserve[i,k];

for {iter in 1..100} {
  for {k in COMMODITIES} {
    unfix {(i,j) in ARCS} x[i,j,k];
    restore {i in NODES} conserve[i,k];
    solve;
    fix {(i,j) in ARCS} x[i,j,k];
    drop {i in NODES} conserve[i,k];
  }
}

# Output results
for {k in COMMODITIES} {
  printf "\nCommodity: %d\n", k > network.out;
  printf {(i,j) in ARCS: x[i,j,k] > 0} "%d.%d = % 5.4e\n", i, j, x[i,j,k] > network.out;
}

Improved Coordinate Descent: wardrop.mod

set NODES;                                # Nodes in network
set ARCS within NODES cross NODES;        # Arcs in network
set COMMODITIES := 1..3;                  # Commodities

param b {NODES, COMMODITIES} default 0;   # Supply/demand
param alpha {ARCS} >= 0;                  # Linear part
param beta {ARCS} >= 0;                   # Nonlinear part

var x {ARCS, COMMODITIES} >= 0;           # Flow on arcs
var f {(i,j) in ARCS} =                   # Total flow
  sum {k in COMMODITIES} x[i,j,k];

minimize time {k in COMMODITIES}:
  sum {(i,j) in ARCS} (alpha[i,j]*f[i,j] + beta[i,j]*f[i,j]^4);

subject to conserve {i in NODES, k in COMMODITIES}:
  sum {(i,j) in ARCS} x[i,j,k] <= sum{(j,i) in ARCS} x[j,i,k] + b[i,k];

problem subprob {k in COMMODITIES}: time[k],
  {i in NODES} conserve[i,k], {(i,j) in ARCS} x[i,j,k], f;

Improved Coordinate Descent: wardrop1.cmd

# Load model and data
model wardrop.mod;
data wardrop.dat;

# Specify solver and options
option solver "minos";
option minos_options "outlev=1";

# Coordinate descent method
for {iter in 1..100} {
  for {k in COMMODITIES} {
    solve subprob[k];
  }
}

# Output results
for {k in COMMODITIES} {
  printf "Commodity: %d\n", k > wardrop.out;
  printf {(i,j) in ARCS: x[i,j,k] > 0} "%d.%d = % 5.4e\n", i, j, x[i,j,k] > wardrop.out;
  printf "\n" > wardrop.out;
}

Final Coordinate Descent: wardrop2.cmd

# Load model and data
model wardrop.mod;
data wardrop.dat;

# Specify solver and options
option solver "minos";
option minos_options "outlev=1";

# Coordinate descent method
param xold{ARCS, COMMODITIES};
param xnew{ARCS, COMMODITIES};

repeat {
  for {k in COMMODITIES} {
    problem subprob[k];
    let {(i,j) in ARCS} xold[i,j,k] := x[i,j,k];
    solve;
    let {(i,j) in ARCS} xnew[i,j,k] := x[i,j,k];
  }
} until (sum {(i,j) in ARCS, k in COMMODITIES} abs(xold[i,j,k] - xnew[i,j,k]) <= 1e-6);

# Output results
for {k in COMMODITIES} {
  printf "Commodity: %d\n", k > wardrop.out;
  printf {(i,j) in ARCS: x[i,j,k] > 0} "%d.%d = % 5.4e\n", i, j, x[i,j,k] > wardrop.out;
  printf "\n" > wardrop.out;
}

Finite Element Method: Ordered Sets

param V, integer;                      # Number of vertices
param E, integer;                      # Number of elements

set VERTICES := {1..V};                # Vertex indices
set ELEMENTS := {1..E};                # Element indices
set COORDS := {1..3} ordered;          # Spatial coordinates

param T{ELEMENTS, 1..4} in VERTICES;   # Tetrahedral elements
var x{VERTICES, COORDS};               # Position of vertices

var norm{e in ELEMENTS} = sum{i in COORDS, j in 1..4}
  (x[T[e,j], i] - x[T[e,1], i])^2;
var area{e in ELEMENTS} = sum{i in COORDS}
  (x[T[e,2], i] - x[T[e,1], i]) *
  ((x[T[e,3], nextw(i)] - x[T[e,1], nextw(i)]) *
   (x[T[e,4], prevw(i)] - x[T[e,1], prevw(i)]) -
   (x[T[e,3], prevw(i)] - x[T[e,1], prevw(i)]) *
   (x[T[e,4], nextw(i)] - x[T[e,1], nextw(i)]));

minimize f: sum {e in ELEMENTS} norm[e] / max(area[e], 0) ^ (2 / 3);

Finite Element Method: Circular Sets

param V, integer;                      # Number of vertices
param E, integer;                      # Number of elements

set VERTICES := {1..V};                # Vertex indices
set ELEMENTS := {1..E};                # Element indices
set COORDS := {1..3} circular;         # Spatial coordinates

param T{ELEMENTS, 1..4} in VERTICES;   # Tetrahedral elements
var x{VERTICES, COORDS};               # Position of vertices

var norm{e in ELEMENTS} = sum{i in COORDS, j in 1..4}
  (x[T[e,j], i] - x[T[e,1], i])^2;
var area{e in ELEMENTS} = sum{i in COORDS}
  (x[T[e,2], i] - x[T[e,1], i]) *
  ((x[T[e,3], next(i)] - x[T[e,1], next(i)]) *
   (x[T[e,4], prev(i)] - x[T[e,1], prev(i)]) -
   (x[T[e,3], prev(i)] - x[T[e,1], prev(i)]) *
   (x[T[e,4], next(i)] - x[T[e,1], next(i)]));

minimize f: sum {e in ELEMENTS} norm[e] / max(area[e], 0) ^ (2 / 3);

Part III
Optimization Software

Generic Nonlinear Program

Nonlinear Programming (NLP) problem

    minimize_x  f(x)         (objective)
    subject to  c(x) = 0     (constraints)
                x ≥ 0        (variables)

• f : R^n → R, c : R^n → R^m smooth (typically C²)
• x ∈ R^n finite dimensional (may be large)
• more general l ≤ c(x) ≤ u possible

Optimality Conditions for NLP

Constraint qualification (CQ): linearizations of c(x) = 0 characterize all feasible perturbations ⇒ rules out cusps etc.

x* local minimizer & CQ holds ⇒ ∃ multipliers y*, z*:

    ∇f(x*) − ∇c(x*)^T y* − z* = 0
    c(x*) = 0
    X* z* = 0
    x* ≥ 0, z* ≥ 0

where X* = diag(x*), thus X* z* = 0 ⇔ x*_i z*_i = 0.

Lagrangian: L(x, y, z) := f(x) − y^T c(x) − z^T x

The objective gradient is a linear combination of the constraint gradients:
    g(x) = A(x) y,   where g(x) := ∇f(x), A(x) := ∇c(x)^T

Newton's Method for Nonlinear Equations

Solve F(x) = 0: get the approximation x_{k+1} of a solution of F(x) = 0 by solving the linear model about xk:

    F(xk) + ∇F(xk)^T (x − xk) = 0

for k = 0, 1, ...

Theorem: If F ∈ C², and ∇F(x*) is nonsingular, then Newton's method converges quadratically near x*.

Next: two classes of methods based on Newton ...

Sequential Quadratic Programming (SQP)

Consider the equality constrained NLP

    minimize_x  f(x)   subject to  c(x) = 0

Optimality conditions:
    ∇f(x) − ∇c(x)^T y = 0   and   c(x) = 0
... a system of nonlinear equations: F(w) = 0 for w = (x, y)
... solve using Newton's method

Nonlinear system of equations (KKT conditions):
    ∇f(x) − ∇c(x)^T y = 0   and   c(x) = 0
Apply Newton's method from wk = (xk, yk), with Hk = ∇²L(xk, yk) and Ak = ∇c(xk):

    [ Hk    −Ak ] [ sx ]      [ ∇x L(xk, yk) ]
    [ Ak^T    0 ] [ sy ]  = − [ ck           ]

... set (x_{k+1}, y_{k+1}) = (xk + sx, yk + sy)

... or solve for y_{k+1} = yk + sy directly instead:

    [ Hk    −Ak ] [ s       ]      [ ∇fk ]
    [ Ak^T    0 ] [ y_{k+1} ]  = − [ ck  ]

... set (x_{k+1}, y_{k+1}) = (xk + s, y_{k+1})

Newton's method for the KKT conditions leads to

    [ Hk    −Ak ] [ s       ]      [ ∇fk ]
    [ Ak^T    0 ] [ y_{k+1} ]  = − [ ck  ]

... which are the optimality conditions of the QP

    minimize_s  ∇fk^T s + ½ s^T Hk s
    subject to  ck + Ak^T s = 0

... hence Sequential Quadratic Programming.

SQP for the inequality constrained NLP
    minimize_x  f(x)   subject to  c(x) = 0  &  x ≥ 0

REPEAT
  1. Solve the QP for (s, y_{k+1}, z_{k+1}):
         minimize_s  ∇fk^T s + ½ s^T Hk s
         subject to  ck + Ak^T s = 0,  xk + s ≥ 0
  2. Set x_{k+1} = xk + s

Modern Interior Point Methods (IPM)

General NLP
    minimize_x  f(x)   subject to  c(x) = 0  &  x ≥ 0

Perturbed optimality conditions for µ > 0 (x, z > 0):

                     [ ∇f(x) − ∇c(x)^T y − z ]
    F_µ(x, y, z)  =  [ c(x)                  ]  =  0
                     [ Xz − µe               ]

• Primal-dual formulation, where X = diag(x)
• Central path {x(µ), y(µ), z(µ) : µ > 0}
• Apply Newton's method for a sequence µ ↘ 0

Newton's method applied to the primal-dual system:

    [ ∇²Lk   −Ak   −I  ] [ ∆x ]
    [ Ak^T     0    0  ] [ ∆y ]  =  −F_µ(xk, yk, zk)
    [ Zk       0    Xk ] [ ∆z ]

where Ak = ∇c(xk) and Xk is the diagonal matrix of xk.

Polynomial run-time guarantee for convex problems.
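A sketch (my own, not the slides') of assembling one Newton step for F_µ above with dense linear algebra in Python; the crude step-length cut-back keeping x, z > 0 stands in for the fraction-to-the-boundary rule that practical IPM codes use.

import numpy as np

def ipm_newton_step(grad_f, jac_c, hess_lag, c, x, y, z, mu):
    # One Newton step on the perturbed KKT system F_mu(x, y, z) = 0.
    n, m = x.size, y.size
    A = jac_c(x)                              # m-by-n Jacobian of c
    X, Z = np.diag(x), np.diag(z)
    K = np.block([[hess_lag(x, y), -A.T, -np.eye(n)],
                  [A, np.zeros((m, m)), np.zeros((m, n))],
                  [Z, np.zeros((n, m)), X]])
    F = np.concatenate([grad_f(x) - A.T @ y - z, c(x), x * z - mu])
    d = np.linalg.solve(K, -F)
    dx, dy, dz = d[:n], d[n:n+m], d[n+m:]
    alpha = 1.0                               # crude step length keeping x, z strictly positive
    while np.any(x + alpha * dx <= 0) or np.any(z + alpha * dz <= 0):
        alpha *= 0.5
    return x + alpha * dx, y + alpha * dy, z + alpha * dz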

Classical Interior Point Methods (IPM)

    minimize_x  f(x)   subject to  c(x) = 0  &  x ≥ 0

Related to classical barrier methods [Fiacco & McCormick]:

    minimize_x  f(x) − µ Σ log(xi)
    subject to  c(x) = 0

(Figures: barrier solutions for µ = 0.1 and µ = 0.001 on the example
    minimize x1 + x2  subject to  x1² + x2² ≥ 1.)

First-order conditions of the barrier problem:
    ∇f(x) − µ X⁻¹ e − A(x) y = 0
    c(x) = 0
... apply Newton's method ...

Newton’s method for barrier problem from xk ... Recall: Newton’s method applied to primal-dual system ...

 2 −2     2    ∇ Lk + µXk −Ak ∆x ∇ Lk −Ak −I ∆x T = ... T Ak 0 ∆y  Ak 0 0   ∆y  = −Fµ(xk, yk, zk) Zk 0 Xk ∆z −1 Introduce Z(xk) := µXk ... or ... Z(xk)Xk = µe Eliminate ∆z = −X−1Z∆x − Ze − µX−1e  2 −1    ∇ Lk + Z(xk)Xk −Ak ∆x = ...  ∇2L + Z X−1 −A   ∆x  Ak 0 ∆y k k k k = ... Ak 0 ∆y ... compare to primal-dual system ...

Interior Point Methods (IPM)

Primal-dual system:
    [ ∇²Lk + Zk Xk⁻¹   −Ak ] [ ∆x ]
    [ Ak^T               0 ] [ ∆y ]  =  ...

... compare to the barrier system ...

    [ ∇²Lk + Z(xk)Xk⁻¹   −Ak ] [ ∆x ]
    [ Ak^T                 0 ] [ ∆y ]  =  ...

• Zk is free, not Z(xk) = µXk⁻¹ (primal multiplier)
• avoids difficulties with barrier ill-conditioning

Convergence from Remote Starting Points

    minimize_x  f(x)   subject to  c(x) = 0  &  x ≥ 0

• Newton's method converges quadratically near a solution
• Newton's method may diverge if started far from a solution
• How can we safeguard against this failure?

... motivates penalty or merit functions that
1. monitor progress towards a solution
2. combine the objective f(x) and the constraint violation ||c(x)||

Penalty Functions (i)

Augmented Lagrangian Methods

    minimize_x  L(x, yk, ρk) = f(x) − yk^T c(x) + ½ ρk ||c(x)||²

As yk → y*: xk → x* for ρk > ρ̄
• No ill-conditioning, improves convergence rate
• update ρk based on the reduction in ||c(x)||²
• approximately minimize L(x, yk, ρk)
• first-order multiplier update: y_{k+1} = yk − ρk c(xk) ⇒ dual iteration

Penalty Functions (ii)

Exact Penalty Function

    minimize_x  Φ(x, π) = f(x) + π ||c(x)||

• combines constraints and objective
• equivalence of optimality ⇒ exact for π > ||y*||_D
... now apply unconstrained techniques
• Φ is nonsmooth, but equivalent to a smooth problem (exercise)

... how do we enforce descent in merit functions?

Line Search Methods

SQP/IPM compute a descent direction s: s^T ∇Φ < 0

Backtracking-Armijo line search
Given α0 = 1, β = 0.1, set l = 0
REPEAT
  1. α_{l+1} = αl / 2 & evaluate Φ(x + α_{l+1} s)
  2. l = l + 1
UNTIL Φ(x + αl s) ≤ Φ(x) + αl β s^T ∇Φ

Outcomes: convergence to a stationary point, or an unbounded merit function, or vanishing descent (s^T ∇Φ → 0).
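A direct transcription of the backtracking-Armijo loop above into Python (an illustration; the merit function Φ and its directional derivative are supplied by the caller).

def backtracking_armijo(phi, x, s, slope, beta=0.1, alpha0=1.0, max_halvings=50):
    # Halve alpha until phi(x + alpha*s) <= phi(x) + alpha*beta*slope, where slope = s'grad(phi)(x) < 0.
    phi0 = phi(x)
    alpha = alpha0
    for _ in range(max_halvings):
        if phi(x + alpha * s) <= phi0 + alpha * beta * slope:
            return alpha
        alpha *= 0.5
    return alpha   # fall back to the last (tiny) step

# Example: phi(x) = x^2 in one dimension, s = -grad = -2 at x = 1
print(backtracking_armijo(lambda v: v**2, 1.0, -2.0, slope=-4.0))   # accepts alpha = 0.5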

Trust Region Methods

Globalize SQP (or IPM) by adding a trust region ∆k > 0:

    minimize_s  ∇fk^T s + ½ s^T Hk s
    subject to  ck + Ak^T s = 0,  xk + s ≥ 0,  ||s|| ≤ ∆k

REPEAT
  1. Solve the QP approximation about xk
  2. Compute the actual/predicted reduction, rk
  3. IF rk ≥ 0.75 THEN x_{k+1} = xk + s and increase ∆
     ELSEIF rk ≥ 0.25 THEN x_{k+1} = xk + s
     ELSE x_{k+1} = xk and decrease ∆ (reject the step)
UNTIL convergence
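A skeleton of the radius-update logic above in Python (my paraphrase of the loop; the solve_qp subproblem solver is a placeholder, and the halving/doubling factors are assumed values around the 0.25/0.75 thresholds from the slide).

import numpy as np

def trust_region_loop(f, model_reduction, solve_qp, x, delta=1.0, tol=1e-8, max_iter=100):
    # Generic trust-region acceptance loop around a user-supplied QP solver.
    #   solve_qp(x, delta)    -> step s with ||s|| <= delta
    #   model_reduction(x, s) -> predicted decrease of the model (> 0 for a useful step)
    for _ in range(max_iter):
        s = solve_qp(x, delta)
        if np.linalg.norm(s) <= tol:
            break
        predicted = model_reduction(x, s)
        actual = f(x) - f(x + s)
        r = actual / predicted if predicted > 0 else -np.inf
        if r >= 0.75:          # very successful: accept the step and enlarge the region
            x, delta = x + s, 2.0 * delta
        elif r >= 0.25:        # successful: accept the step, keep the radius
            x = x + s
        else:                  # unsuccessful: reject the step and shrink the region
            delta = 0.5 * delta
    return x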

Filter Methods for NLP

Penalty functions can be inefficient:
• the penalty parameter is not known a priori
• a large penalty parameter ⇒ slow convergence

Two competing aims in optimization:
1. Minimize f(x)
2. Minimize h(x) := ||c(x)|| ... more important

⇒ concept from multi-objective optimization:
    (h_{k+1}, f_{k+1}) dominates (hl, fl)   iff   h_{k+1} ≤ hl  &  f_{k+1} ≤ fl

Filter Methods for NLP

Filter F: a list of non-dominated pairs (hl, fl)
• a new x_{k+1} is acceptable to the filter F iff
  1. h_{k+1} ≤ hl ∀ l ∈ F, or
  2. f_{k+1} ≤ fl ∀ l ∈ F
• remove redundant entries
• reject the new x_{k+1} if h_{k+1} > hl & f_{k+1} > fl
  ... and reduce the trust region radius ∆ = ∆/2
⇒ often accept the new x_{k+1}, even if a penalty function would increase

Sequential Quadratic Programming

• filterSQP
  - trust-region SQP; robust QP solver
  - filter to promote global convergence
• SNOPT
  - line-search SQP; null-space CG option
  - ℓ1 exact penalty function
• SLIQUE (part of KNITRO)
  - SLP-EQP ("SQP" for larger problems)
  - trust-region with ℓ1 penalty
Other methods: CONOPT, a generalized reduced gradient method
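A tiny sketch of the filter acceptance and update rules described above, in Python (my illustration, written as the per-entry non-dominance test; real filter methods add small envelopes/margins around each entry, omitted here).

def acceptable(filter_entries, h_new, f_new):
    # A trial point is acceptable if no stored pair (h, f) dominates it.
    return all(h_new <= h or f_new <= f for (h, f) in filter_entries)

def add_to_filter(filter_entries, h_new, f_new):
    # Insert the new pair and drop entries it dominates (now redundant).
    kept = [(h, f) for (h, f) in filter_entries if not (h_new <= h and f_new <= f)]
    kept.append((h_new, f_new))
    return kept

flt = [(1.0, 5.0), (0.1, 9.0)]
print(acceptable(flt, 0.5, 6.0))        # True: not dominated by either entry
print(add_to_filter(flt, 0.05, 4.0))    # the new pair dominates both stored entries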

Interior Point Methods

• IPOPT (free: part of COIN-OR)
  - line-search filter algorithm
  - 2nd order convergence analysis for filter
• KNITRO
  - trust-region Newton to solve the barrier problem
  - ℓ1 penalty
  - Newton system: direct solves or null-space CG
• LOQO
  - line-search method
  - Cholesky factorization; no convergence analysis
Other solvers: MOSEK (unsuitable for nonconvex problems)

Augmented Lagrangian Methods

• LANCELOT
  - minimize the augmented Lagrangian subject to bounds
  - trust region to force convergence
  - iterative (CG) solves
• MINOS
  - minimize the augmented Lagrangian subject to linear constraints
  - line search; recent convergence analysis
  - direct factorization of linear constraints
• PENNON
  - suitable for semi-definite optimization
  - alternative penalty terms

Automatic Differentiation

How do I get the derivatives ∇c(x), ∇²c(x), etc.?
• hand-coded derivatives are error prone
• finite differences
    ∂ci(x)/∂xj ≈ [ci(x + δej) − ci(x)] / δ
  can be dangerous, where ej = (0, ..., 0, 1, 0, ..., 0) is the j-th unit vector

Automatic Differentiation
• chain rule techniques to differentiate the program
• recursive application ⇒ "exact" derivatives
• suitable for huge problems, see www.autodiff.org
... already done for you in AMPL/GAMS etc.

Something Under the Bed is Drooling

1. floating point (IEEE) exceptions
2. unbounded problems
   2.1 unbounded objective
   2.2 unbounded multipliers
3. (locally) inconsistent problems
4. suboptimal solutions

... identify the problem & suggest remedies

Floating Point (IEEE) Exceptions

Bad example: minimize a barrier function

param mu default 1;
var x{1..2} >= -10, <= 10;
var s;
minimize barrier: x[1]^2 + x[2]^2 - mu*log(s);
subject to cons: s = x[1] + x[2]^2 - 1;

... results in an error message like "Cannot evaluate objective at start"
... change the initialization of s: var s := 1;
... difficult, if the IEEE exception occurs during the solve ...

Unbounded Objective

Penalized MPEC with π = 1:

    minimize_x  x1² + x2² − 4 x1 x2 + π x1 x2
    subject to  x1, x2 ≥ 0

... unbounded below for all π < 2

Locally Inconsistent Problems

NLP may have no feasible point:

var x{1..2} >= -1;
minimize objf: -1000*x[2];
subject to
con1: (x[1]+2)^2 + x[2]^2 <= 1;
con2: (x[1]-2)^2 + x[2]^2 <= 1;

The feasible set is the intersection of the two circles, which is empty.
• not all solvers recognize this ...
• finding a feasible point ⇔ ...

Locally Inconsistent Problems

LOQO output:

                 | Primal                  | Dual
  Iter           | Obj Value     Infeas    | Obj Value      Infeas
    1              -1.000000e+03  4.2e+00    -6.000000e+00   1.0e-00
  [...]
  500               2.312535e-04  7.9e-01     1.715213e+12   1.5e-01
  LOQO 6.06: iteration limit

... fails to converge ... not useful for the user
dual unbounded → ∞ ⇒ primal infeasible

FILTER output:

  iter | rho     | ||d||          | f / hJ      | ||c|| / hJt
  0:0    10.0000   0.00000         -1000.0000     16.000000
  1:1    10.0000   2.00000         -1000.0000     8.0000000
  [...]
  8:2    2.00000   0.320001E-02     7.9999693     0.10240052E-04
  9:2    2.00000   0.512000E-05     8.0000000     0.26214586E-10
  filterSQP: Nonlinear constraints locally infeasible

... fast convergence to minimum infeasibility
... identify "blocking" constraints ... modify model/data

Locally Inconsistent Problems

Remedies for locally infeasible problems:
1. check your model: print constraints & residuals, e.g.
     solve;
     display conname, con.lb, con.body, con.ub;
     display varname, var.lb, var, var.ub;
   ... look at violated and active constraints
2. try different nonlinear solvers (easy with AMPL)
3. build up the model from a few constraints at a time
4. try different starting points ... global optimization

Suboptimal Solution & Multi-start

Problems can have many local minimizers:

param pi := 3.1416;
param n integer, >= 0, default 2;
set N := 1..n;
var x{N} >= 0, <= 32*pi, := 1;
minimize objf:
  - sum{i in N} x[i]*sin(sqrt(x[i]));

The default start point converges to a local minimizer.

Suboptimal Solution & Multi-start

param nD := 5;               # discretization
set D := 1..nD;
param hD := 32*pi/(nD-1);
param optval{D,D};

model schwefel.mod;          # load model

for {i in D}{
  let x[1] := (i-1)*hD;
  for {j in D}{
    let x[2] := (j-1)*hD;
    solve;
    let optval[i,j] := objf;
  }; # end for
}; # end for

display optval;

optval [*,*]
:        1          2          3          4          5       :=
1        0        24.003     -36.29     -50.927     56.909
2       24.003    -7.8906    -67.580    -67.580    -67.580
3      -36.29    -67.5803   -127.27    -127.27    -127.27
4      -50.927   -67.5803   -127.27    -127.27    -127.27
5       56.909   -67.5803   -127.27    -127.27    -127.27
;

... there exist better multi-start procedures

Optimization with Integer Variables

• modeling discrete choices ⇒ 0-1 variables
• modeling integer decisions ⇒ integer variables
  e.g. the number of different stocks in a portfolio (8-10),
  not the number of beers sold at Goose Island (millions)
⇒ Mixed Integer Nonlinear Program (MINLP)

MINLP solvers:
• branch (separate zi = 0 and zi = 1) and cut
  - solve millions of NLP relaxations: MINLPBB, SBB
• outer approximation: iterate MILP and NLP solvers
  - BONMIN soon on COIN-OR

Portfolio Management

• N: universe of assets to purchase
• xi: amount of asset i to hold
• B: budget

    min_{x ∈ R₊^{|N|}}  { u(x) | Σ_{i∈N} xi = B }

• Markowitz: u(x) := −α^T x + λ x^T Q x
  - α: expected returns
  - Q: variance-covariance matrix of expected returns
  - λ: risk aversion parameter

More Realistic Models

• Min Holdings: (xi = 0) ∨ (xi ≥ m)
  - Model the implication: xi > 0 ⇒ xi ≥ m
  - xi > 0 ⇒ yi = 1 ⇒ xi ≥ m
  - xi ≤ B yi,  xi ≥ m yi   ∀ i ∈ N
• Limit Names: |{i ∈ N : xi > 0}| ≤ K
  - use binary indicator variables to model the implication xi > 0 ⇒ yi = 1
  - implication modeled with variable upper bounds: xi ≤ B yi  ∀ i ∈ N
  - Σ_{i∈N} yi ≤ K

Even More Models

• b ∈ R^{|N|} of "benchmark" holdings
  - Benchmark Tracking: u(x) := (x − b)^T Q (x − b)
  - Constraint on E[Return]: α^T x ≥ r
• Round Lots: xi ∈ {k Li, k = 1, 2, ...}
  - xi − zi Li = 0,  zi ∈ Z₊   ∀ i ∈ N
• Vector h of initial holdings
  - Transactions: ti = |xi − hi|
  - Turnover: Σ_{i∈N} ti ≤ ∆
  - Transaction Costs: Σ_{i∈N} ci ti in the objective
  - Market Impact: Σ_{i∈N} γi ti² in the objective



Global Optimization

I need to find the GLOBAL minimum!
• use any NLP solver (often works well!)
• use the multi-start trick from the previous slides
• global optimization based on branch-and-reduce: BARON
  • constructs global underestimators
  • refines the region by branching
  • tightens bounds by solving LPs
  • solves problems with 100s of variables
• "voodoo" solvers: genetic algorithms & simulated annealing
  no convergence theory ... usually worse than deterministic

Derivative-Free Optimization

My model does not have derivatives!
• Change your model ... good models have derivatives!
• pattern-search methods for min f(x)
  • evaluate f(x) at stencil xk + ∆M
  • move to the new best point
  • extend to NLP; some convergence theory
  • implementations: NOMADm.m; parallel APPSPACK
• solvers based on building quadratic models
• "voodoo" solvers: genetic algorithms & simulated annealing
  no convergence theory ... usually worse than deterministic


COIN-OR

http://www.coin-or.org
• COmputational INfrastructure for Operations Research
• A library of (interoperable) software tools for optimization
• A development platform for open source projects in the OR community
• Possibly Relevant Modules:
  • OSI: Open Solver Interface
  • CGL: Cut Generation Library
  • CLP: Coin Linear Programming solver
  • CBC: Coin Branch and Cut
  • IPOPT: Interior Point OPTimizer for NLP
  • NLPAPI: Nonlinear Programming API

Part IV

Complementarity Problems in AMPL

Definition

• Non-cooperative game played by n individuals
• Each player selects a strategy to optimize their objective
• Strategies for the other players are fixed
• Equilibrium reached when no improvement is possible
• Characterization of a two-player equilibrium (x*, y*):

   x* ∈ arg min_{x ≥ 0} { f1(x, y*) : c1(x) ≤ 0 }
   y* ∈ arg min_{y ≥ 0} { f2(x*, y) : c2(y) ≤ 0 }

• Many applications in economics
  • Bimatrix games
  • Cournot duopoly models
  • General equilibrium models
  • Arrow-Debreu models

Complementarity Formulation

• Assume each optimization problem is convex
  • f1(·, y) is convex for each y
  • f2(x, ·) is convex for each x
  • c1(·) and c2(·) satisfy a constraint qualification
• Then the first-order conditions are necessary and sufficient

   min_{x ≥ 0} f1(x, y*) s.t. c1(x) ≤ 0   ⇔   0 ≤ x ⊥ ∇x f1(x, y*) + λ1^T ∇x c1(x) ≥ 0
                                               0 ≤ λ1 ⊥ −c1(x) ≥ 0

   min_{y ≥ 0} f2(x*, y) s.t. c2(y) ≤ 0   ⇔   0 ≤ y ⊥ ∇y f2(x*, y) + λ2^T ∇y c2(y) ≥ 0
                                               0 ≤ λ2 ⊥ −c2(y) ≥ 0

   0 ≤ x ⊥ ∇x f1(x, y) + λ1^T ∇x c1(x) ≥ 0
   0 ≤ y ⊥ ∇y f2(x, y) + λ2^T ∇y c2(y) ≥ 0
   0 ≤ λ1 ⊥ −c1(x) ≥ 0
   0 ≤ λ2 ⊥ −c2(y) ≥ 0

• Nonlinear complementarity problem
• Square system – the number of variables and constraints is the same
• Each solution is an equilibrium for the Nash game

Formulation

• Firm f ∈ F chooses output xf to maximize profit
• u is the utility function

   u = ( 1 + Σ_{f∈F} xf^α )^{η/α}

• α and η are parameters
• cf is the unit cost for each firm
• In particular, for each firm f ∈ F

   xf* ∈ arg max_{xf ≥ 0} { xf ∂u/∂xf − cf xf }

• First-order optimality conditions

   0 ≤ xf ⊥ cf − ∂u/∂xf − xf ∂²u/∂xf² ≥ 0

Model: oligopoly.mod

set FIRMS;                       # Firms in problem
param c {FIRMS};                 # Unit cost
param alpha > 0;                 # Constants
param eta > 0;
var x {FIRMS} default 0.1;       # Output (no bounds!)

var s = 1 + sum {f in FIRMS} x[f]^alpha;        # Summation term
var u = s^(eta/alpha);                          # Utility
var du {f in FIRMS} =                           # Marginal price
   eta * s^(eta/alpha - 1) * x[f]^(alpha - 1);
var dudu {f in FIRMS} =                         # Derivative
   eta * (eta - alpha) * s^(eta/alpha - 2) * x[f]^(2*alpha - 2)
 + eta * (alpha - 1)   * s^(eta/alpha - 1) * x[f]^(alpha - 2);

compl {f in FIRMS}:
   0 <= x[f] complements c[f] - du[f] - x[f] * dudu[f] >= 0;
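For reference, the expressions coded as du and dudu above follow from differentiating u; a short check, using s = 1 + Σ_{g∈F} xg^α:

   ∂u/∂xf  = (η/α) s^{η/α − 1} · α xf^{α−1}  =  η s^{η/α − 1} xf^{α−1}
   ∂²u/∂xf² = η (η − α) s^{η/α − 2} xf^{2α−2} + η (α − 1) s^{η/α − 1} xf^{α−2}

which match the du and dudu definitions in oligopoly.mod.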

Data: oligopoly.dat

param: FIRMS : c :=
   1  0.07
   2  0.08
   3  0.09;

param alpha := 0.999;
param eta   := 0.2;

Commands: oligopoly.cmd

# Load model and data
model oligopoly.mod;
data oligopoly.dat;

# Specify solver and options
option presolve 0;
option solver "pathampl";

# Solve complementarity problem
solve;

# Output the results
printf {f in FIRMS} "Output for firm %2d: % 5.4e\n", f, x[f] > oligcomp.out;

Results: oligopoly.out

Model Formulation

• Economy with n agents and m commodities
• e ∈ R^{n×m}_+ : endowments

Model: cge.mod

set AGENTS;                                      # Agents
set COMMODITIES;                                 # Commodities

param e {AGENTS, COMMODITIES} >= 0, default 1;   # Endowment
param alpha {AGENTS, COMMODITIES} > 0;           # Utility parameters
param beta {AGENTS, COMMODITIES} > 0;

var x {AGENTS, COMMODITIES};                     # Consumption (no bounds!)
var l {AGENTS};                                  # Multipliers (no bounds!)
var p {COMMODITIES};                             # Prices (no bounds!)

var du {i in AGENTS, k in COMMODITIES} =         # Marginal prices
   alpha[i,k] / (1 + x[i,k])^beta[i,k];

subject to
optimality {i in AGENTS, k in COMMODITIES}:
   0 <= x[i,k] complements -du[i,k] + p[k] * l[i] >= 0;
budget {i in AGENTS}:
   0 <= l[i] complements sum {k in COMMODITIES} p[k]*(e[i,k] - x[i,k]) >= 0;
market {k in COMMODITIES}:
   0 <= p[k] complements sum {i in AGENTS} (e[i,k] - x[i,k]) >= 0;

Data: cge.dat

set AGENTS := Jorge, Sven, Todd;
set COMMODITIES := Books, Cars, Food, Pens;

param alpha :    Books Cars Food Pens :=
   Jorge          1     1    1    1
   Sven           1     2    3    4
   Todd           2     1    1    5;

param beta (tr): Jorge Sven Todd :=
   Books          1.5   2    0.6
   Cars           1.6   3    0.7
   Food           1.7   2    2.0
   Pens           1.8   2    2.5;

Commands: cge.cmd

# Load model and data
model cge.mod;
data cge.dat;

# Specify solver and options
option presolve 0;
option solver "pathampl";

# Solve the instance
solve;

# Output results
printf {i in AGENTS, k in COMMODITIES} "%5s %5s: % 5.4e\n", i, k, x[i,k] > cge.out;
printf "\n" > cge.out;
printf {k in COMMODITIES} "%5s: % 5.4e\n", k, p[k] > cge.out;

Results: cge.out

Jorge Books: 8.9825e-01
Jorge Cars:  1.4651e+00
Jorge Food:  1.2021e+00
Jorge Pens:  6.8392e-01
Sven Books:  2.5392e-01
Sven Cars:   7.2054e-01
Sven Food:   1.6271e+00
Sven Pens:   1.4787e+00
Todd Books:  1.8478e+00
Todd Cars:   8.1431e-01
Todd Food:   1.7081e-01
Todd Pens:   8.3738e-01

Books: 1.0825e+01
Cars:  6.6835e+00
Food:  7.3983e+00
Pens:  1.1081e+01

Commands: cgenum.cmd

# Load model and data
model cge.mod;
data cge.dat;

# Specify solver and options
option presolve 0;
option solver "pathampl";

# Solve the instance
drop market['Books'];
fix p['Books'] := 1;
solve;

# Output results
printf {i in AGENTS, k in COMMODITIES} "%5s %5s: % 5.4e\n", i, k, x[i,k] > cgenum.out;
printf "\n" > cgenum.out;
printf {k in COMMODITIES} "%5s: % 5.4e\n", k, p[k] > cgenum.out;

(Fixing the price of Books to 1 selects a numeraire: equilibrium prices are determined only up to a positive scale factor, so one market-clearing condition can be dropped.)

Results: cgenum.out

Jorge Books: 8.9825e-01
Jorge Cars:  1.4651e+00
Jorge Food:  1.2021e+00
Jorge Pens:  6.8392e-01
Sven Books:  2.5392e-01
Sven Cars:   7.2054e-01
Sven Food:   1.6271e+00
Sven Pens:   1.4787e+00
Todd Books:  1.8478e+00
Todd Cars:   8.1431e-01
Todd Food:   1.7081e-01
Todd Pens:   8.3738e-01

Books: 1.0000e+00
Cars:  6.1742e-01
Food:  6.8345e-01
Pens:  1.0237e+00

Formulation

• Players select strategies to minimize loss
• p ∈ R^n is the probability player 1 chooses each strategy
• q ∈ R^m is the probability player 2 chooses each strategy
• A ∈ R^{n×m} is the loss matrix for player 1
• B ∈ R^{n×m} is the loss matrix for player 2
• Optimization problem for player 1

   min_{0 ≤ p ≤ 1} p^T A q   subject to  e^T p = 1

• Optimization problem for player 2

   min_{0 ≤ q ≤ 1} p^T B q   subject to  e^T q = 1

• Complementarity problem

   0 ≤ p ≤ 1 ⊥ A q − λ1
   0 ≤ q ≤ 1 ⊥ B^T p − λ2
   λ1 free ⊥ e^T p = 1
   λ2 free ⊥ e^T q = 1

Model: bimatrix1.mod

param n > 0, integer;   # Strategies for player 1
param m > 0, integer;   # Strategies for player 2
param A{1..n, 1..m};    # Loss matrix for player 1
param B{1..n, 1..m};    # Loss matrix for player 2
var p{1..n};            # Probability player 1 selects strategy i
var q{1..m};            # Probability player 2 selects strategy j
var lambda1;            # Multiplier for constraint
var lambda2;            # Multiplier for constraint

subject to
opt1 {i in 1..n}:       # Optimality conditions for player 1
   0 <= p[i] <= 1 complements sum{j in 1..m} A[i,j] * q[j] - lambda1;
opt2 {j in 1..m}:       # Optimality conditions for player 2
   0 <= q[j] <= 1 complements sum{i in 1..n} B[i,j] * p[i] - lambda2;
con1:
   lambda1 complements sum{i in 1..n} p[i] = 1;
con2:
   lambda2 complements sum{j in 1..m} q[j] = 1;

Model: bimatrix2.mod

param n > 0, integer;   # Strategies for player 1
param m > 0, integer;   # Strategies for player 2
param A{1..n, 1..m};    # Loss matrix for player 1
param B{1..n, 1..m};    # Loss matrix for player 2
var p{1..n};            # Probability player 1 selects strategy i
var q{1..m};            # Probability player 2 selects strategy j
var lambda1;            # Multiplier for constraint
var lambda2;            # Multiplier for constraint

subject to
opt1 {i in 1..n}:       # Optimality conditions for player 1
   0 <= p[i] complements sum{j in 1..m} A[i,j] * q[j] - lambda1 >= 0;
opt2 {j in 1..m}:       # Optimality conditions for player 2
   0 <= q[j] complements sum{i in 1..n} B[i,j] * p[i] - lambda2 >= 0;
con1:
   0 <= lambda1 complements sum{i in 1..n} p[i] >= 1;
con2:
   0 <= lambda2 complements sum{j in 1..m} q[j] >= 1;

Model: bimatrix3.mod

param n > 0, integer;   # Strategies for player 1
param m > 0, integer;   # Strategies for player 2
param A{1..n, 1..m};    # Loss matrix for player 1
param B{1..n, 1..m};    # Loss matrix for player 2
var p{1..n};            # Probability player 1 selects strategy i
var q{1..m};            # Probability player 2 selects strategy j

subject to
opt1 {i in 1..n}:       # Optimality conditions for player 1
   0 <= p[i] complements sum{j in 1..m} A[i,j] * q[j] >= 1;
opt2 {j in 1..m}:       # Optimality conditions for player 2
   0 <= q[j] complements sum{i in 1..n} B[i,j] * p[i] >= 1;
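A minimal sketch of data and commands to run these models; the 2×2 game, file names, and values below are illustrative, not from the course material.

# bimatrix.dat -- a small illustrative game
param n := 2;
param m := 2;

param A :   1   2 :=
   1        1   3
   2        2   1;

param B :   1   2 :=
   1        3   1
   2        1   2;

# commands
model bimatrix1.mod;
data bimatrix.dat;

option presolve 0;
option solver "pathampl";

solve;
display p, q, lambda1, lambda2;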

Pitfalls

• Nonsquare systems
  • Side variables
  • Side constraints
• Orientation of equations
  • Skew symmetry preferred
  • Proximal point perturbation
• AMPL presolve
  • option presolve 0;

Definition

• Leader-follower game
• Dominant player (leader) selects a strategy y*
• Then followers respond by playing a Nash game

   xi* ∈ arg min_{xi ≥ 0} { fi(x, y) : ci(xi) ≤ 0 }

• Leader solves an optimization problem with equilibrium constraints

   min_{y ≥ 0, x, λ}  g(x, y)
   subject to   h(y) ≤ 0
                0 ≤ xi ⊥ ∇xi fi(x, y) + λi^T ∇xi ci(xi) ≥ 0
                0 ≤ λi ⊥ −ci(xi) ≥ 0

• Many applications in economics
  • Optimal taxation
  • Tolling problems


Nonlinear Programming Formulation

   min_{x,y,λ,s,t ≥ 0}  g(x, y)
   subject to   h(y) ≤ 0
                si = ∇xi fi(x, y) + λi^T ∇xi ci(xi)
                ti = −ci(xi)
                Σ_i ( si^T xi + λi^T ti ) ≤ 0

• Constraint qualification fails
  • Multiplier set unbounded
  • Constraint gradients linearly dependent
  • Central path does not exist
• Able to prove convergence results for some methods
• Reformulation very successful and versatile in practice

Penalization Approach

   min_{x,y,λ,s,t ≥ 0}  g(x, y) + π Σ_i ( si^T xi + λi^T ti )
   subject to   h(y) ≤ 0
                si = ∇xi fi(x, y) + λi^T ∇xi ci(xi)
                ti = −ci(xi)

• Optimization problem satisfies constraint qualification
• Need to increase π
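A minimal sketch of the penalization idea on a toy problem; the problem, the parameter name pen, and its value are illustrative. The complementarity condition 0 ≤ x ⊥ y ≥ 0 is dropped as a constraint and the product x*y is penalized in the objective instead.

param pen := 10;                 # penalty parameter (the pi above); may need to be increased

var x >= 0;
var y >= 0;

# penalized NLP: satisfies a constraint qualification, unlike a
# formulation carrying the constraint x*y <= 0
minimize obj: (x - 1)^2 + (y - 1)^2 + pen * x * y;

solve;
display x, y;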


Relaxation Approach

Model Formulation

• Economy with n agents and m commodities
• e ∈ R^{n×m}_+ : endowments



Model: cgempec.mod

set LEADER;                             # Leader
set FOLLOWERS;                          # Followers
set AGENTS := LEADER union FOLLOWERS;   # All the agents
check: (card(LEADER) == 1 && card(LEADER inter FOLLOWERS) == 0);

set COMMODITIES;                        # Commodities

param e {AGENTS, COMMODITIES} >= 0, default 1;   # Endowment
param alpha {AGENTS, COMMODITIES} > 0;           # Utility parameters
param beta {AGENTS, COMMODITIES} > 0;

var x {AGENTS, COMMODITIES};            # Consumption (no bounds!)
var l {FOLLOWERS};                      # Multipliers (no bounds!)
var p {COMMODITIES};                    # Prices (no bounds!)

var u {i in AGENTS} =                   # Utility
   sum {k in COMMODITIES} alpha[i,k] * (1 + x[i,k])^(1 - beta[i,k]) / (1 - beta[i,k]);
var du {i in AGENTS, k in COMMODITIES} =   # Marginal prices
   alpha[i,k] / (1 + x[i,k])^beta[i,k];

maximize objective: sum {i in LEADER} u[i];

subject to
leader_budget {i in LEADER}:
   sum {k in COMMODITIES} p[k]*(e[i,k] - x[i,k]) >= 0;
optimality {i in FOLLOWERS, k in COMMODITIES}:
   0 <= x[i,k] complements -du[i,k] + p[k] * l[i] >= 0;
budget {i in FOLLOWERS}:
   0 <= l[i] complements sum {k in COMMODITIES} p[k]*(e[i,k] - x[i,k]) >= 0;
market {k in COMMODITIES}:
   0 <= p[k] complements sum {i in AGENTS} (e[i,k] - x[i,k]) >= 0;


Data: cgempec.dat

set LEADER := Jorge;
set FOLLOWERS := Sven, Todd;
set COMMODITIES := Books, Cars, Food, Pens;

param alpha :    Books Cars Food Pens :=
   Jorge          1     1    1    1
   Sven           1     2    3    4
   Todd           2     1    1    5;

param beta (tr): Jorge Sven Todd :=
   Books          1.5   2    0.6
   Cars           1.6   3    0.7
   Food           1.7   2    2.0
   Pens           1.8   2    2.5;

Commands: cgempec.cmd

# Load model and data
model cgempec.mod;
data cgempec.dat;

# Specify solver and options
option presolve 0;
option solver "loqo";

# Solve the instance
drop market['Books'];
fix p['Books'] := 1;
solve;

# Output results
printf {i in AGENTS, k in COMMODITIES} "%5s %5s: % 5.4e\n", i, k, x[i,k] > cgempec.out;
printf "\n" > cgempec.out;
printf {k in COMMODITIES} "%5s: % 5.4e\n", k, p[k] > cgempec.out;



Output: cgempec.out

               Stackelberg    Nash Game
Jorge Books:   9.2452e-01     8.9825e-01
Jorge Cars:    1.3666e+00     1.4651e+00
Jorge Food:    1.1508e+00     1.2021e+00
Jorge Pens:    7.7259e-01     6.8392e-01
Sven Books:    2.5499e-01     2.5392e-01
Sven Cars:     7.4173e-01     7.2054e-01
Sven Food:     1.6657e+00     1.6271e+00
Sven Pens:     1.4265e+00     1.4787e+00
Todd Books:    1.8205e+00     1.8478e+00
Todd Cars:     8.9169e-01     8.1431e-01
Todd Food:     1.8355e-01     1.7081e-01
Todd Pens:     8.0093e-01     8.3738e-01

Books:         1.0000e+00     1.0000e+00
Cars:          5.9617e-01     6.1742e-01
Food:          6.6496e-01     6.8345e-01
Pens:          1.0700e+00     1.0237e+00

Unbounded Multipliers

var z{1..2} >= 0;
var z3;

minimize objf: z[1] + z[2] - z3;

subject to
lin1: -4 * z[1] + z3 <= 0;
lin2: -4 * z[2] + z3 <= 0;
compl: z[1]*z[2] <= 0;


LOQO Output

LOQO 6.06: outlev=2
         |         Primal         |          Dual
  Iter   |  Obj Value     Infeas  |  Obj Value     Infeas
    1       1.000000e+00  0.0e+00    0.000000e+00  1.1e+00
    2       6.902180e-01  2.2e-01   -2.672676e-01  2.6e-01
    3       2.773222e-01  1.6e-01   -3.051049e-01  1.1e-01
  292      -8.213292e-05  1.7e-09   -4.106638e-05  9.1e-07
  293      -8.202525e-05  1.7e-09   -4.101255e-05  9.1e-07
  294       nan           nan        nan           nan
  500       nan           nan        nan           nan

FILTER Output

 iter  |  rho      |  ||d||        |  f / hJ          |  ||c|| / hJt
  0:0    10.0000      0.00000         1.0000000          0.0000000
  1:1    10.0000      1.00000         0.0000000          0.0000000
 24:1     0.156250    0.196695E-05   -0.98347664E-06     0.24180657E-12
 25:1     0.156250    0.983477E-06   -0.49173832E-06     0.60451644E-13

Norm of KKT residual.............. 0.471404521
max( |lam_i| * || a_i || )........ 2.06155281
Largest modulus multiplier........ 2711469.25
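One common workaround, in the spirit of the relaxation approach mentioned earlier, is to relax the troublesome complementarity constraint; a minimal sketch for the example above, where the value of tau is illustrative:

param tau := 1.0e-6;             # relaxation parameter; drive toward 0

var z{1..2} >= 0;
var z3;

minimize objf: z[1] + z[2] - z3;

subject to
lin1: -4 * z[1] + z3 <= 0;
lin2: -4 * z[2] + z3 <= 0;
compl: z[1]*z[2] <= tau;         # relaxed from <= 0 so multipliers can stay bounded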



Limitations

• Multipliers may not exist
• Solvers can have a hard time computing solutions
  • Try different algorithms
  • Compute feasible starting point
• Stationary points may have descent directions
  • Checking for descent is an exponential problem
  • Strong stationary points found in certain cases
• Many stationary points – global optimization
• Formulation of follower problem
  • Multiple solutions to Nash game
  • Nonconvex objective or constraints
  • Existence of multipliers
