
Computational Optimization for Economists
Sven Leyffer, Jorge Moré, and Todd Munson
Mathematics and Computer Science Division, Argonne National Laboratory
{leyffer,more,tmunson}@mcs.anl.gov
Institute for Computational Economics, University of Chicago, July 30 – August 9, 2007

Computational Optimization Overview
1. Introduction to Optimization [Moré]
2. Continuous Optimization in AMPL [Munson]
3. Optimization Software [Leyffer]
4. Complementarity & Games [Munson]

Part I: Introduction, Applications, and Formulations

Outline: Six Topics
• Introduction
• Unconstrained optimization: limited-memory variable metric methods
• Systems of nonlinear equations: sparsity and Newton's method
• Automatic differentiation: computing sparse Jacobians via graph coloring
• Constrained optimization: all that you need to know about KKT conditions
• Solving optimization problems: modeling languages (AMPL and GAMS); NEOS

Topic 1: The Optimization Viewpoint
• Modeling
• Algorithms
• Software
• Automatic differentiation tools
• Application-specific languages
• High-performance architectures

View of Optimization from Applications
[surface plot of an application objective omitted]

Classification of Constrained Optimization Problems
• Formulation: min { f(x) : x_l ≤ x ≤ x_u, c_l ≤ c(x) ≤ c_u }
• Number of variables n
• Number of constraints m
• Number of linear constraints
• Number of equality constraints n_e
• Number of degrees of freedom n − n_e
• Sparsity of c'(x) = (∂_i c_j(x))
• Sparsity of ∇²_x L(x, λ) = ∇²f(x) + Σ_{k=1}^m ∇²c_k(x) λ_k

Classification of Constrained Optimization Software
• Interfaces: MATLAB, AMPL, GAMS
• Second-order information options: differences, limited memory, Hessian-vector products
• Linear solvers: direct solvers, iterative solvers, preconditioners
• Partially separable problem formulation
• Documentation
• License

Life-Cycle Saving Problem
Maximize the utility
    Σ_{t=1}^T β^t u(c_t)
where S_t is saving, c_t is consumption, w_t is the wage, and
    S_{t+1} = (1 + r) S_t + w_{t+1} − c_{t+1},   0 ≤ t < T,
with interest rate r = 0.2, β = 0.9, S_0 = S_T = 0, and u(c) = −exp(−c).
Assume that w_t = 1 for t < R and w_t = 0 for t ≥ R.
Question. What are the characteristics of the life-cycle problem?

Constrained Optimization Software: IPOPT
• Formulation: min { f(x) : x_l ≤ x ≤ x_u, c(x) = 0 }
• Interfaces: AMPL
• Second-order information options: differences, limited memory, Hessian-vector products
• Direct solvers: MA27, MA57
• Partially separable problem formulation: none
• Documentation
• License

Life-Cycle Saving Problem: Results
[plots for (R, T) = (30, 50) and (R, T) = (60, 100) omitted]
Question. Problem formulation to results: how long?
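The gap between formulation and results can be made concrete with a small script. The following is a minimal sketch of the life-cycle saving problem in Python with scipy.optimize rather than the AMPL model used in the course; the SLSQP solver choice, the starting point, and the elimination of the saving variables are assumptions of the sketch.

```python
# Minimal sketch of the life-cycle saving problem (assumption: scipy's SLSQP
# solver stands in for the AMPL/IPOPT route used in the course).
import numpy as np
from scipy.optimize import minimize

T, R = 50, 30          # horizon and retirement date, the (R, T) = (30, 50) case
r, beta = 0.2, 0.9     # interest rate and discount factor
w = np.where(np.arange(1, T + 1) < R, 1.0, 0.0)   # w_t = 1 for t < R, else 0

def terminal_saving(c):
    """Propagate S_{t+1} = (1 + r) S_t + w_{t+1} - c_{t+1} from S_0 = 0; return S_T."""
    S = 0.0
    for t in range(T):
        S = (1.0 + r) * S + w[t] - c[t]
    return S

def neg_utility(c):
    """Negative of sum_{t=1}^T beta^t u(c_t) with u(c) = -exp(-c)."""
    t = np.arange(1, T + 1)
    return np.sum(beta**t * np.exp(-c))

res = minimize(neg_utility, x0=np.full(T, 0.5), method="SLSQP",
               constraints=[{"type": "eq", "fun": terminal_saving}])  # S_T = 0
print(res.x)   # consumption path c_1, ..., c_T
```

Eliminating S_t through the budget recursion leaves consumption as the only decision variable; the equivalent formulation that keeps every S_t and every budget equation explicit is what a modeling language such as AMPL expresses most naturally.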
Topic 2: Unconstrained Optimization
Augustin Louis Cauchy (August 21, 1789 – May 23, 1857)
Additional information at Mac Tutor: www-history.mcs.st-andrews.ac.uk

Unconstrained Optimization: Background
Given a continuously differentiable f : R^n → R and
    min { f(x) : x ∈ R^n },
generate a sequence of iterates {x_k} such that the gradient test
    ‖∇f(x_k)‖ ≤ τ
is eventually satisfied.
Theorem. If f : R^n → R is continuously differentiable and bounded below, then there is a sequence {x_k} such that lim_{k→∞} ‖∇f(x_k)‖ = 0.
Exercise. Prove this result.

Ginzburg-Landau Model
Minimize the Gibbs free energy for a homogeneous superconductor
    ∫_D [ −|v(x)|² + ½|v(x)|⁴ + ‖[∇ − iA(x)] v(x)‖² + κ² ‖(∇ × A)(x)‖² ] dx
where v : R² → C is the order parameter and A : R² → R² is the vector potential.
Unconstrained problem. Non-convex function. Hessian is singular. Unique minimizer, but there is a saddle point.

Unconstrained Optimization
What can I use if the gradient ∇f(x) is not available?
• Geometry-based methods: pattern search, Nelder-Mead, ...
• Model-based methods: quadratic, radial-basis models, ...
What can I use if the gradient ∇f(x) is available?
• Conjugate gradient methods
• Limited-memory variable metric methods
• Variable metric methods

Computing the Gradient
Hand-coded gradients
• Generally efficient
• Error prone
• The cost is usually less than 5 function evaluations
Difference approximations
    ∂_i f(x) ≈ [f(x + h_i e_i) − f(x)] / h_i
• Choice of h_i may be problematic in the presence of noise
• Costs n function evaluations
• Accuracy is about ε_f^{1/2}, where ε_f is the noise level of f

Cheap Gradient via Automatic Differentiation
Code generated by automatic differentiation tools
• Accurate to full precision
• For the reverse mode the cost is Ω_T T{f(x)}, where T{f(x)} is the cost of evaluating f. In theory, Ω_T ≤ 5.
• For the reverse mode the memory is proportional to the number of intermediate variables.
Exercise. Develop an order-n code for computing the gradient of
    f(x) = Π_{k=1}^n x_k

Line Search Methods
A sequence of iterates {x_k} is generated via
    x_{k+1} = x_k + α_k p_k,
where p_k is a descent direction at x_k, that is,
    ∇f(x_k)^T p_k < 0,
and α_k is determined by a line search along p_k.
Line searches
• Geometry-based: Armijo, ...
• Model-based: quadratic, cubic models, ...

Powell-Wolfe Conditions on the Line Search
Given 0 ≤ µ < η ≤ 1, require that
    f(x + αp) ≤ f(x) + µ α ∇f(x)^T p      (sufficient decrease)
    |∇f(x + αp)^T p| ≤ η |∇f(x)^T p|      (curvature condition)

Conjugate Gradient Algorithms
Given a starting vector x_0, generate iterates via
    x_{k+1} = x_k + α_k p_k
    p_{k+1} = −∇f(x_{k+1}) + β_k p_k
where α_k is determined by a line search.
Three reasonable choices of β_k are (with g_k = ∇f(x_k)):
    β_k^FR  = ‖g_{k+1}‖² / ‖g_k‖²                  (Fletcher-Reeves)
    β_k^PR  = ⟨g_{k+1}, g_{k+1} − g_k⟩ / ‖g_k‖²    (Polak-Ribière)
    β_k^PR+ = max{ β_k^PR, 0 }                     (PR-plus)

Limited-Memory Variable-Metric Algorithms
Given a starting vector x_0, generate iterates via
    x_{k+1} = x_k − α_k H_k ∇f(x_k)
where α_k is determined by a line search.
• The matrix H_k is defined in terms of information gathered during the previous m iterations.
• H_k is positive definite.
• Storage of H_k requires 2mn locations.
• Computation of H_k ∇f(x_k) costs (8m + 1)n flops.
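The 2mn storage and (8m + 1)n flop counts correspond to applying H_k through the standard two-loop recursion rather than forming it. Below is a minimal sketch; the scaled initial matrix H_0 = γI and the NumPy interface are assumptions of the sketch, and the stored pairs are assumed to satisfy y_i·s_i > 0 so that H_k stays positive definite.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Apply the limited-memory matrix H_k to the gradient g via the
    two-loop recursion, using the last m pairs s_i = x_{i+1} - x_i and
    y_i = grad f(x_{i+1}) - grad f(x_i), stored oldest first.  Storage is
    the 2mn numbers in s_list and y_list; the work is O(mn)."""
    q = g.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q = q - a * y
    # Initial approximation H_0 = gamma * I; this scaling is a common
    # choice, assumed here rather than taken from the slides.
    gamma = (np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
             if s_list else 1.0)
    z = gamma * q
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * np.dot(y, z)
        z = z + (a - b) * s
    return z  # the iteration is then x_{k+1} = x_k - alpha_k * z
```

Each loop costs roughly 4n flops per stored pair, plus n for the scaling, which is where the (8m + 1)n figure comes from.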
Recommendations
But what algorithm should I use?
• If the gradient ∇f(x) is not available, then a model-based method is a reasonable choice. Methods based on quadratic interpolation are currently the best choice.
• If the gradient ∇f(x) is available, then a limited-memory variable metric method is likely to produce an approximate minimizer in the least number of gradient evaluations.
• If the Hessian is also available, then a state-of-the-art implementation of Newton's method is likely to produce the best results if the problem is large and sparse.

Topic 3: Newton's Method
Sir Isaac Newton (January 4, 1643 – March 31, 1727)
Additional information at Mac Tutor: www-history.mcs.st-andrews.ac.uk

Motivation
Given a continuously differentiable f : R^n → R^n, solve
    f(x) = (f_1(x), ..., f_n(x))^T = 0
Linear models. The mapping defined by
    L_k(s) = f(x_k) + f'(x_k) s
is a linear model of f near x_k, and thus it is sensible to choose s_k such that L_k(s_k) = 0, provided x_k + s_k is near x_k.

Newton's Method
Given a starting point x_0, Newton's method generates iterates via
    f'(x_k) s_k = −f(x_k),   x_{k+1} = x_k + s_k.
Computational issues
• How do we solve for s_k?
• How do we handle a (nearly) singular f'(x_k)?
• How do we enforce convergence if x_0 is not near a solution?
• How do we compute/approximate f'(x_k)?
• How accurately do we solve for s_k?
• Is the algorithm scale invariant?
• Is the algorithm mesh-invariant?

Flow in a Channel Problem
Analyze the flow of a fluid during injection into a long vertical channel, assuming that the flow is modeled by the boundary value problem below, where u is the potential function and R is the Reynolds number.
    u'''' = R (u'u'' − u u''')
    u(0) = 0, u(1) = 1
    u'(0) = u'(1) = 0

Sparsity
Assume that the Jacobian matrix is sparse, and let ρ_i be the number of non-zeroes in the i-th row of f'(x).
• Sparse linear solvers can solve f'(x) s = −f(x) in order ρ_A operations, where ρ_A = avg{ρ_i²}.
• Graph coloring techniques (see Topic 4) can compute or approximate the Jacobian matrix with ρ_M function evaluations, where ρ_M = max{ρ_i}.

Topic 4: Automatic Differentiation
Gottfried Wilhelm Leibniz (July 1, 1646 – November 14, 1716)
Additional information at Mac Tutor: www-history.mcs.st-andrews.ac.uk

Computing Gradients and Sparse Jacobians
Theorem. Given f : R^n → R^m, automatic differentiation tools compute f'(x)v at a cost comparable to that of f(x).

Structurally Orthogonal Columns
Structurally orthogonal columns do not have a nonzero in the same row position.
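A minimal sketch of why structural orthogonality matters: if a grouping of the columns into structurally orthogonal sets is available (the graph-coloring techniques of Topic 4 would supply it), one forward difference per group recovers every column in that group, since each row is touched by at most one column of the group. The function name, the boolean `pattern` argument, and the fixed step h below are assumptions for illustration.

```python
import numpy as np

def jacobian_by_groups(f, x, groups, pattern, h=1e-6):
    """Approximate a sparse Jacobian by finite differences, one extra
    evaluation of f per group of structurally orthogonal columns.
    groups  : list of lists of column indices; columns in a group share no row
    pattern : boolean m-by-n array, True at the structural nonzeros of f'(x)"""
    n = len(x)
    fx = f(x)
    m = len(fx)
    J = np.zeros((m, n))
    for cols in groups:
        d = np.zeros(n)
        d[cols] = h                       # perturb the whole group at once
        df = (f(x + d) - fx) / h          # one difference for all columns in the group
        for j in cols:
            rows = pattern[:, j]          # rows where column j can be nonzero
            J[rows, j] = df[rows]         # no other column in the group touches these rows
    return J
```

In favorable cases the number of groups matches the ρ_M = max{ρ_i} evaluation count quoted in the Sparsity slide; a dense Jacobian degenerates to n single-column groups.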