The TAO Linearly-Constrained Augmented Lagrangian Method for PDE-Constrained Optimization

Todd Munson, Evan Gawlik, Jason Sarich, and Stefan Wild

Mathematics and Computer Science Division, Argonne National Laboratory

Tools for Structured Optimization Problems on High-Performance Architectures: Lagrangians and Complementarity

October 18, 2011

Outline

1 Motivation

2 Linearly-Constrained Augmented Lagrangian Method

3 Numerical Results

Electrical Resistivity

Perform a set of experiments:
Drive a direct-current signal into the ground
Measure the resulting potentials created
Solve the inverse problem to determine the resistivity

min_{u,v}   (1/2) Σ_{i=1}^{N} ‖Q_i u_i − d_i‖^2 + (α/2) ∫_Ω (v − v̄)^2 dΩ

subject to  −∇ · (e^v ∇u_i) = q_i,   x ∈ Ω,    i = 1, ..., N
            ∇u_i · n = 0,            x ∈ ∂Ω,   i = 1, ..., N
            ∫_Ω u_i dΩ = 0,                    i = 1, ..., N

Discretize the operators
Solve the finite-dimensional approximation (a sketch follows)
Problems available in the Haber-Hanson test collection
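A sketch of the resulting finite-dimensional problem (the symbols A(v) and M below are assumptions introduced here, not taken from the slides): each discretized PDE constraint becomes a linear system in the corresponding state vector,

min_{u,v}  (1/2) Σ_{i=1}^{N} ‖Q_i u_i − d_i‖^2 + (α/2) (v − v̄)^T M (v − v̄)
subject to  A(v) u_i = q_i,   i = 1, ..., N,

where A(v) discretizes −∇ · (e^v ∇·) together with the boundary and zero-mean conditions and M is a mass matrix, so the problem has exactly the generic form min f(u, v) subject to g(u, v) = 0 used on the next slide.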

Generic Formulation

Solve the optimization problem

min_{u,v}   f(u, v)
subject to  g(u, v) = 0

u are state variables, v are design variables
λ are Lagrange multipliers on the constraint
Assume ∇_u g(u, v) is invertible
Develop efficient methods with modest requirements:
Evaluate the objective and constraint residual
Evaluate gradient and Jacobian-vector products
Linear solves with ∇_u g(u, v) and ∇_u g(u, v)^T
Reuse iterative methods and preconditioners
Use a small number of linear solves per major iteration
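These are precisely the ingredients needed for the standard adjoint (reduced-gradient) computation; a brief sketch of why, using only the quantities above: with u(v) defined implicitly by g(u(v), v) = 0, the reduced objective f̃(v) = f(u(v), v) has

∇f̃(v) = ∇_v f(u, v) − ∇_v g(u, v)^T ∇_u g(u, v)^{−T} ∇_u f(u, v),

so each reduced-gradient evaluation costs one adjoint solve with ∇_u g(u, v)^T, and recovering u(v) costs forward solves with ∇_u g(u, v).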

Linearly-Constrained Augmented Lagrangian Method

1 Given u_k, v_k, and λ_k
2 Choose ρ_k and approximately solve

min_{u,v}   m_k(u, v) ≡ f(u, v) − g(u, v)^T λ_k + (ρ_k/2) ‖g(u, v)‖^2
subject to  A_k (u − u_k) + B_k (v − v_k) + g_k = 0

where g_k ≡ g(u_k, v_k), A_k ≡ ∇_u g(u_k, v_k), and B_k ≡ ∇_v g(u_k, v_k)

3 Calculate λ_{k+1}

Newton Step Toward Feasibility

Solve the linear system

A_k du = −g_k

If there is no solution or g_k^T A_k du ≥ 0, then enter feasibility restoration
Choose ρ_k so that du is a descent direction for the merit function:

ρ_k ≥ max{ ρ_{k−1},  du^T (A_k^T λ_k − ∇_u f(u_k, v_k)) / (g_k^T A_k du) }

Determine the step length

α_k ∈ arg min_{α ≥ 0}  m_k(u_k + α du, v_k)

Accept the step:  u_k^0 = u_k + α_k du,   v_k^0 = v_k
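A minimal dense-algebra sketch of this step (Python/NumPy; the callable names f, g, grad_u_f and the dense solve are illustrative assumptions — in practice A_k du = −g_k is a single iterative forward solve):

    import numpy as np

    def newton_feasibility_step(u_k, v_k, lam_k, rho_prev, f, g, grad_u_f, A_k):
        """One Newton step toward feasibility in the state variables (sketch).
        A_k = grad_u g(u_k, v_k) is a dense matrix here only for illustration."""
        g_k = g(u_k, v_k)
        du = np.linalg.solve(A_k, -g_k)            # forward solve: A_k du = -g_k

        slope = g_k @ (A_k @ du)                   # g_k^T A_k du
        if slope >= 0:
            raise RuntimeError("no useful step: enter feasibility restoration")

        # Choose the penalty so that du is a descent direction for the merit function
        rho_k = max(rho_prev,
                    du @ (A_k.T @ lam_k - grad_u_f(u_k, v_k)) / slope)

        def merit(u):                              # m_k(u, v_k) with lam_k, rho_k fixed
            r = g(u, v_k)
            return f(u, v_k) - r @ lam_k + 0.5 * rho_k * (r @ r)

        alpha, m0 = 1.0, merit(u_k)                # backtracking stand-in for the
        while merit(u_k + alpha * du) > m0 and alpha > 1e-8:   # exact minimization
            alpha *= 0.5

        return u_k + alpha * du, v_k, rho_k        # (u_k^0, v_k^0, rho_k)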

Modified Reduced-Space Step Toward Optimality

Start with constrained optimization problem

min_{u,v}   m_k(u, v)
subject to  A_k (u − u_k) + B_k (v − v_k) + α_k g_k = 0

Form the reduced-space problem

min_v  m_k(u_k − A_k^{-1} (B_k (v − v_k) + α_k g_k), v)

Perform a change of variables and simplify
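Spelling out the change of variables with the quantities already defined: since u_k^0 = u_k + α_k du = u_k − α_k A_k^{-1} g_k, v_k^0 = v_k, and dv = v − v_k^0,

u_k − A_k^{-1} (B_k (v − v_k) + α_k g_k) = u_k^0 − A_k^{-1} B_k dv,

which yields the simplified problem below.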

min_{dv}  m_k(u_k^0 − A_k^{-1} B_k dv, v_k^0 + dv)

Apply a limited-memory quasi-Newton method

Limited-Memory Quasi-Newton Method

1 Solve the quadratic approximation

min_{dv}  (1/2) dv^T H̃_{k,i} dv + g̃_{k,i}^T dv

g̃_{k,i} is the reduced gradient

g̃_{k,i} = ∇_v m_k(u_k^i, v_k^i) − B_k^T A_k^{-T} ∇_u m_k(u_k^i, v_k^i)

H̃_{k,i} is a positive-definite reduced Hessian approximation
Based on the properties of H̃_{k,i} we can easily compute

dv = −H̃_{k,i}^{-1} g̃_{k,i}

2 Perform a line search to determine the step length β_i
3 Update the variables and Hessian approximation and repeat (a sketch of the recursion used to compute dv follows)
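The direction dv = −H̃_{k,i}^{-1} g̃_{k,i} is typically formed without ever building H̃_{k,i}, using the standard L-BFGS two-loop recursion; a generic sketch (not the TAO source; function and variable names are assumptions):

    import numpy as np

    def lbfgs_direction(g_tilde, s_list, y_list, gamma=1.0):
        """Two-loop recursion: returns dv = -H_inv @ g_tilde, where H_inv is the
        implicit L-BFGS inverse Hessian built from stored pairs s_j = v_{j+1} - v_j
        and y_j = g_{j+1} - g_j (most recent pair last); gamma scales H_0^{-1}."""
        q = g_tilde.copy()
        alphas = []
        for s, y in zip(reversed(s_list), reversed(y_list)):   # newest to oldest
            rho = 1.0 / (y @ s)
            a = rho * (s @ q)
            q -= a * y
            alphas.append(a)
        r = gamma * q                                          # apply H_0^{-1} = gamma * I
        for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):  # oldest to newest
            rho = 1.0 / (y @ s)
            b = rho * (y @ r)
            r += (a - b) * s
        return -r                                              # quasi-Newton direction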

Full-Space Line Search

Avoid computing the reduced gradient in the line search
Recover the full-space direction

du = −A_k^{-1} B_k dv

The linear constraint is satisfied along the full-space direction
Calculate the step length

min_{β ≥ 0}  m_k(u_k^i + β du, v_k^i + β dv)

Update:  u_k^{i+1} = u_k^i + β_i du,   v_k^{i+1} = v_k^i + β_i dv

Algorithm Overview

1 Determine u_k^0 and v_k^0 (one forward solve)
2 For i = 1, ...
  1 Compute the reduced gradient (one adjoint solve)

    g̃_{k,i} = ∇_v m_k(u_k^i, v_k^i) − B_k^T A_k^{-T} ∇_u m_k(u_k^i, v_k^i)

  2 Update the Hessian approximation using the BFGS formula
  3 Calculate the direction  dv = −H̃_{k,i}^{-1} g̃_{k,i}
  4 Recover the full-space direction (one forward solve)  du = −A_k^{-1} B_k dv
  5 Find the step length β_i with a line search
  6 Update u_k^{i+1} = u_k^i + β_i du and v_k^{i+1} = v_k^i + β_i dv
3 Accept the new iterate
  Compute the reduced gradient at the final point (one adjoint solve)
  Update the Hessian approximation
  Set λ_{k+1} = A_k^{-T} ∇_u m_k(u_k^i, v_k^i)
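Putting the steps above together, the inner loop of one major iteration can be sketched as follows (Python; A_solve, AT_solve, B_mult, BT_mult, and the gradient callbacks are placeholders for the user's PDE solver, not TAO routines; lbfgs_direction is the sketch from the quasi-Newton slide):

    import numpy as np

    def minor_loop(u0, v0, A_solve, AT_solve, B_mult, BT_mult,
                   grad_u_m, grad_v_m, m_k, max_inner=20, tol=1e-4):
        """Reduced-space quasi-Newton loop for one LCL subproblem (sketch).
        A_solve(b): forward solve A_k x = b     AT_solve(b): adjoint solve A_k^T x = b
        B_mult(dv): product B_k dv              BT_mult(w):  product B_k^T w"""
        u, v = u0.copy(), v0.copy()
        s_list, y_list, s_prev, g_prev = [], [], None, None
        for i in range(max_inner):
            adj = AT_solve(grad_u_m(u, v))               # one adjoint solve
            g_red = grad_v_m(u, v) - BT_mult(adj)        # reduced gradient
            if np.linalg.norm(g_red) < tol:
                break
            if g_prev is not None:                       # update quasi-Newton memory
                s_list.append(s_prev)
                y_list.append(g_red - g_prev)
                if len(s_list) > 5:                      # keep a few pairs (cf. rank-5 below)
                    s_list.pop(0)
                    y_list.pop(0)
            dv = lbfgs_direction(g_red, s_list, y_list)  # quasi-Newton direction
            du = -A_solve(B_mult(dv))                    # one forward solve
            beta, m0 = 1.0, m_k(u, v)                    # backtracking stand-in
            while m_k(u + beta * du, v + beta * dv) > m0 and beta > 1e-8:
                beta *= 0.5
            u, v = u + beta * du, v + beta * dv
            s_prev, g_prev = beta * dv, g_red
        lam_next = AT_solve(grad_u_m(u, v))              # multiplier update (one adjoint solve)
        return u, v, lam_next

Each pass through the loop performs exactly one forward and one adjoint solve, which is the cost accounting summarized on the next slide.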

Computational Cost

One forward solve per major iteration for the Newton direction
One adjoint solve per minor iteration for the reduced gradient
One forward solve per minor iteration for the full-space step
One adjoint solve per major iteration for the Lagrange multipliers

Hessian approximation carried over between major iterations

TAO Implementation

Method implemented in the Toolkit for Advanced Optimization (TAO)
Employs PETSc for linear solvers and preconditioners
User provides code to evaluate the needed functions
Available in the upcoming TAO 2.0 release

Some computational details:
Overall convergence tolerance of 10^{-4}
Linear solves performed to a relative tolerance of 10^{-4}
Limited-memory quasi-Newton approximation: BFGS approximation with a rank-5 matrix; scaling matrix updated using Broyden updates
Moré-Thuente line search: based on cubic interpolation; guarantees that the iterates satisfy the strong Wolfe conditions
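The functions the user supplies mirror the requirements listed on the generic-formulation slide; a purely hypothetical summary of those callbacks (this is not the TAO 2.0 C interface, whose registration routines are not reproduced here):

    class PDEConstrainedProblem:
        """Hypothetical sketch of the user-supplied callbacks (illustration only)."""
        def objective(self, u, v): ...                      # f(u, v)
        def gradient(self, u, v): ...                       # grad_u f and grad_v f
        def constraints(self, u, v): ...                    # residual g(u, v)
        def jacobian_state(self, u, v, x, transpose=False): ...   # grad_u g (or transpose) times x
        def jacobian_design(self, u, v, x, transpose=False): ...  # grad_v g (or transpose) times x
        def solve_state(self, u, v, b, transpose=False): ...      # solve with grad_u g or its transpose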

Test Problems

Haber-Hanson test problems:
Elliptic – electrical resistivity
Hyperbolic – optical tomography
Parabolic – mass transfer

Discretizations and iterative methods given:
Conjugate gradient for symmetric matrices
Generalized minimum residual for non-symmetric matrices
Jacobi and successive over-relaxation preconditioners

Problem      Dimension   Time-dependent   # State       # Design      Total (n)
Elliptic     3           No               m_x^3 m_e     m_x^3         m_x^3 (m_e + 1)
Parabolic    3           Yes              m_x^3 m_t     m_x^3         m_x^3 (m_t + 1)
Hyperbolic   2           Yes              m_x^2 m_t     2 m_x^2 m_t   3 m_x^2 m_t

Sensitivity to Linear Solver Accuracy

[Figure: relative solution time (roughly 0.7–1.3) versus linear-solver relative residual tolerance (10^{-10} to 10^0) for the elliptic, parabolic, and hyperbolic problems.]

Algorithm Scaling

[Figure: number of outer iterations (10^1 to 10^3) versus problem size n (10^2 to 10^8) for the elliptic, parabolic, and hyperbolic problems, on log-log axes.]

Strong Scaling of Elliptic Problem (m_x = 96)

[Figure: wall time (s) versus number of cores (10^1 to 10^3) for the elliptic problem, on log-log axes.]

Conclusion

Method is effective on the test problems
Requires minimal information from the user:
No need for the Hessian of the Lagrangian
Same iterative methods and preconditioners reused for the forward and adjoint solves; the preconditioner swaps sides in the adjoint (transpose) solves
Cost dominated by the (parallel) forward and adjoint solves
Preliminary results indicate good strong scaling
Available in the upcoming TAO 2.0 release: http://www.mcs.anl.gov/tao
