Nonlinear Programming: Concepts, Algorithms and Applications

L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA Nonlinear Programming and Process Optimization

Introduction

Unconstrained Optimization • Algorithms • Newton Methods • Quasi-Newton Methods

Constrained Optimization • Karush Kuhn-Tucker Conditions • Special Classes of Optimization Problems • Reduced Methods (GRG2, CONOPT, MINOS) • Successive (SQP) • Interior Point Methods

Process Optimization • Black Box Optimization • Modular Flowsheet Optimization – Infeasible Path • The Role of Exact Derivatives

Large-Scale Nonlinear Programming • Data Reconciliation • Real-time Process Optimization

Further Applications • Sensitivity Analysis for NLP Solutions • Multiperiod Optimization Problems

Summary and Conclusions 2 Introduction

Optimization: given a system or process, find the best solution to this process within constraints. Objective : indicator of "goodness" of solution, e.g., cost, yield, profit, etc. Decision Variables: variables that influence process behavior and can be adjusted for optimization.

In many cases, this task is done by trial and error (through case study). Here, we are interested in a systematic approach to this task - and to make this task as efficient as possible.

Some related areas: - Math programming - Operations Research Currently - Over 30 journals devoted to optimization with roughly 200 papers/month - a fast moving field!

3 Optimization Viewpoints

Mathematician - characterization of theoretical properties of optimization, convergence, existence, local convergence rates. Numerical Analyst - implementation of optimization method for efficient and "practical" use. Concerned with ease of computations, numerical stability, performance. Engineer - applies optimization method to real problems. Concerned with reliability, robustness, efficiency, diagnosis, and recovery from failure.

4 Optimization Literature

Engineering 1. Edgar, T.F., D.M. Himmelblau, and L. S. Lasdon, Optimization of Chemical Processes, McGraw-Hill, 2001. 2. Papalambros, P. and D. Wilde, Principles of Optimal Design. Cambridge Press, 1988. 3. Reklaitis, G., A. Ravindran, and K. Ragsdell, Engineering Optimization, Wiley, 1983. 4. Biegler, L. T., I. E. Grossmann and A. Westerberg, Systematic Methods of Chemical Process Design, Prentice Hall, 1997.

Numerical Analysis 1. Dennis, J.E. and R. Schnabel, Numerical Methods of Unconstrained Optimization, Prentice-Hall, (1983), SIAM (1995) 2. Fletcher, R. Practical Methods of Optimization, Wiley, 1987. 3. Gill, P.E, W. Murray and M. Wright, Practical Optimization, Academic Press, 1981. 4. Nocedal, J. and S. Wright, Numerical Optimization, Springer, 1998

5 Motivation

Scope of optimization Provide systematic framework for searching among a specified space of alternatives to identify an “optimal” design, i.e., as a decision-making tool

Premise Conceptual formulation of optimal product and process design corresponds to a mathematical programming problem

min f (x, y) st h(x, y) = 0 g(x, y)≤ 0 x∈ R n y ∈ ,0{ }1 ny

6 Optimization in Design, Operations and Control

MILP MINLP Global LP,QP NLP SA/GA

HENS x x x x x x MENS x x x x x x Separations x x Reactors x x x x Equipment x x x Design Flowsheeting x x Scheduling x x x x Supply Chain x x x Real-time x x optimization Linear MPC x Nonlinear x x MPC Hybrid x x

7 Unconstrained Multivariable Optimization

Problem: Min f(x) (n variables)

Equivalent to: Max -f(x), x ∈ Rn

Nonsmooth Functions - Direct Search Methods - Statistical/Random Methods

Smooth Functions - 1st Order Methods - Newton Type Methods - Conjugate

8 Example: Optimal Vessel Dimensions What is the optimal L/D ratio for a cylindrical vessel? Constrained Problem  π D2  (1)  π  Min CT + CS DL = cost  2  πDL2 s.t. V - = 0 4 Convert to Unconstrained (Eliminate L)  π D2 4V  Min  CT + CS = cost  2 D 

d(cost) 4VCs = CTπ D - = 0 (2) dD D2

1/ 3   1/3   2/3  4V   4V    CS    C T D = L =         π CT  π   CS 

==> L/D = CT/CS Note: - What if L cannot be eliminated in (1) explicitly? (strange shape) - What if D cannot be extracted from (2)? (cost correlation implicit) 9 Two Dimensional Contours of F(x)

Convex Function Nonconvex Function Multimodal, Nonconvex

Discontinuous Nondifferentiable (convex)

10 Local vs. Global Solutions

•Find a local minimum point x* for f(x) for defined by functions: f(x*) ” f(x) for all x satisfying the constraints in some neighborhood around x* (not for all x ∈ X) •Finding and verifying global solutions will not be considered here. •Requires a more expensive search (e.g. spatial ). •A local solution to the NLP is also a global solution under the following sufficient conditions based on convexity. • f(x) is convex in domain X, if and only if it satisfies: f(α y + (1-α) z) ” α f(y) + (1-α)f(z) for any α, 0 ” α ” 1, at all points y and z in X.

11 Linear Algebra - Background

Some Definitions • Scalars - Greek letters, α, β, γ • Vectors - Roman Letters, lower case • Matrices - Roman Letters, upper case • Matrix Multiplication: ∈ ℜn x m ∈ ℜm x p ∈ ℜn x p Σ C = A B if A , B and C , Cij = k Aik Bkj • Transpose - if A ∈ ℜn x m, interchange rows and columns --> AT∈ ℜm x n • Symmetric Matrix - A ∈ ℜn x n (square matrix) and A = AT • Identity Matrix - I, square matrix with ones on diagonal and zeroes elsewhere. • Determinant: "Inverse Volume" measure of a square matrix Σ det(A) = i (-1)i+j Aij Aij for any j, or Σ det(A) = j (-1)i+j Aij Aij for any i, where Aij is the determinant of an order n-1 matrix with row i and column j removed. det(I) = 1

• Singular Matrix: det (A) = 0 12 Linear Algebra - Background

Gradient Vector -(∇f(x))  ∂  f / ∂x1  ∂  f / ∂x 2 ∇f =   ......   ∂  f / ∂x n

Hessian Matrix (∇2f(x) - Symmetric)

 ∂ 2 f ∂ 2 f ∂ 2 f  ⋅ ⋅ ⋅ ⋅  2  ∂x1 ∂x1∂x2 ∂x1 ∂xn ∇2 f(x) =  ......    ∂ 2 f ∂ 2 f ∂ 2 f  ⋅ ⋅ ⋅ ⋅  2  ∂xn ∂x1 ∂xn ∂x2 ∂xn 

∂ 2 f ∂2f Note: = ∂ ∂ ∂ ∂ xi x j xj xi

13 Linear Algebra - Background • Some Identities for Determinant det(A B) = det(A) det(B); det (A) = det(AT) α αn Π λ det( A) = det(A); det(A) = i i(A)

• Eigenvalues: det(A- λ I) = 0, Eigenvector: Av = λ v Characteristic values and directions of a matrix. For nonsymmetric matrices eigenvalues can be complex, so we often use singular values, σ = λ(ATΑ)1/2 ≥ 0 • Vector Norms Σ p 1/p || x ||p = { i |xi| } ∞ (most common are p = 1, p = 2 (Euclidean) and p = (max norm = maxi|xi|)) • Matrix Norms ||A|| = max ||A x||/||x|| over x (for p-norms) Σ ||A||1 - max column sum of A, maxj ( i |Aij|)

∞ Σ ||A|| - maximum row sum of A, maxi ( j |Aij|) ||A||2 = [σmax(Α)](spectral radius)

Σ Σ 2]1/2 ||A||F = [ i j (Aij) (Frobenius norm) -1 κ(Α) = ||A|| ||A || (condition number) = σmax/σmin (using 2-norm) 14 Linear Algebra - Eigenvalues λ λ Find v and where Avi = i vi, i = i,n Note: Av - λv = (A - λI) v = 0 or det (A - λI) = 0 For this relation λ is an eigenvalue and v is an eigenvector of A.

λ If A is symmetric, all i are real λ i > 0, i = 1, n; A is positive definite λ i < 0, i = 1, n; A is negative definite λ i = 0, some i: A is singular Quadratic Form can be expressed in Canonical Form (Eigenvalue/Eigenvector) xTAx ⇒ A V = V Λ V - eigenvector matrix (n x n) Λ λ - eigenvalue (diagonal) matrix = diag( i)

λ -1 T If A is symmetric, all i are real and V can be chosen orthonormal (V = V ). Thus, A = V Λ V-1 = V Λ VT For Quadratic Function: Q(x) = aTx + ½ xTAx Define: z = VTx and Q(Vz) = (aTV) z + ½ zT (VTAV)z = (aTV) z + ½ zT Λ z

λ -1 Λ-1 T Minimum occurs at (if i > 0) x = -A a or x = Vz = -V( V a) 15 Positive (or Negative) Curvature Positive (or Negative) Definite Hessian

Both eigenvalues are strictly positive or negative • A is positive definite or negative definite • Stationary points are minima or maxima

16 Zero Curvature Singular Hessian

One eigenvalue is zero, the other is strictly positive or negative • A is positive semidefinite or negative semidefinite • There is a ridge of stationary points (minima or maxima)

17 Indefinite Curvature Indefinite Hessian

One eigenvalue is positive, the other is negative • Stationary point is a saddle point • A is indefinite

Note: these can also be viewed as two dimensional projections for higher dimensional problems

18 Eigenvalue Example

T 1  1 2 1 Min Q(x) =   x + xT   x 1  2 1 2 2 1 AV = VΛ with A =   1 2   1 0 1/ 2 1/ 2 V T AV = Λ =   with V =   0 3 -1/ 2 1/ 2

• All eigenvalues are positive • Minimum occurs at z* = -/-1VTa

    (x − x )/ 2 (x + x ) / 2 z = V T x =  1 2  x = Vz =  1 2  + − + ( x1 x2)/ 2 ( x1 x2 )/ 2  0  1/3  z* =   x* =   2/(3 2) 1/3 

19 Comparison of Optimization Methods

1. Convergence Theory • Global Convergence - will it converge to a local optimum (or stationary point) from a poor starting point? • Local Convergence Rate - how fast will it converge close to the solution?

2. Benchmarks on Large Class of Test Problems Representative Problem (Hughes, 1981)

α β Min f(x1, x2) = exp(- ) u = x1 - 0.8 2 1/2 v = x2 - (a1 + a2 u (1- u) - a3 u) α 2 1/2 = -b1 + b2 u (1+u) + b3 u β = 2 2 c1 v (1 - c2 v)/(1+ c3 u )

a = [ 0.3, 0.6, 0.2] b = [5, 26, 3] c = [40, 1, 10] x* = [0.7395, 0.3144] f(x*) = 5.0893

20 Three Dimensional Surface and Curvature for Representative Test Problem

Regions where minimum eigenvalue is less than: [0, -10, -50, -100, -150, -200]

21 What conditions characterize an optimal solution?

x2 Unconstrained Local Minimum Necessary Conditions ∇f (x*) = 0 pT∇2f (x*) p • 0 for p∈ℜn (positive semi-definite)

x* Unconstrained Local Minimum Sufficient Conditions ∇f (x*) = 0 T∇2 ∈ℜn Contours of f(x) p f (x*) p > 0 for p (positive definite)

x1

For smooth functions, why are contours around optimum elliptical? Taylor Series in n dimensions about x*: 1 f (x) = f (x*) + ∇f (x*)T (x − x*) + (x − x*)T ∇2 f (x*)(x − x*) 2 Since ∇f(x*) = 0, f(x) is purely quadratic for x close to x* 22 Newton’s Method

Taylor Series for f(x) about xk

Take derivative wrt x, set LHS § 0

0 §∇f(x) = ∇f(xk) + ∇2f(xk) (x - xk) ⇒ (x - xk) ≡ d = - (∇2f(xk))-1 ∇f(xk)

• f(x) is convex (concave) if for all x ∈ℜn, ∇2f(x) is positive (negative) semidefinite λ ≥ λ i.e. minj j 0 (maxj j ” 0) • Method can fail if: - x0 far from optimum - ∇2f is singular at any point - f(x) is not smooth • Search direction, d, requires solution of linear . • Near solution: 2 xk+1 - x * = K xk - x *

23 Basic Newton Algorithm -

0. Guess x0, Evaluate f(x0).

1. At xk, evaluate ∇f(xk).

2. Evaluate Bk = ∇2f(xk) or an approximation.

3. Solve: Bk d = -’f(xk) If convergence error is less than tolerance: e.g., ||∇f(xk) || ≤ ε and ||d|| ≤ ε STOP, else go to 4.

4. Find α so that 0 < α ≤ 1 and f(xk + α d) < f(xk) sufficiently (Each trial requires evaluation of f(x))

5. xk+1 = xk + α d. Set k = k + 1 Go to 1.

24 Newton’s Method - Convergence Path

Starting Points [0.8, 0.2] needs steepest descent steps w/ line search up to ’O’, takes 7 iterations to ||∇f(x*)|| ” 10-6

[0.35, 0.65] converges in four iterations with full steps to ||∇f(x*)|| ” 10-6

25 Newton’s Method - Notes

• Choice of Bk determines method. - Steepest Descent: Bk = γ I - Newton: Bk = ∇2f(x) • With suitable Bk, performance may be good enough if f(xk + αd) is sufficiently decreased (instead of minimized along line search direction). • extensions to Newton's method provide very strong global convergence properties and very reliable algorithms. • Local rate of convergence depends on choice of Bk. x k +1 − x * − = Newton Quadratic Rate : limk →∞ 2 K x k − x * x k +1 − x * Steepest descent − Linear Rate : lim →∞ <1 k x k − x * x k+1 − x * Desired?− Superlinear Rate : lim →∞ = 0 k x k − x *

26 Quasi-Newton Methods

Motivation: • Need Bk to be positive definite. • Avoid calculation of ∇2f. • Avoid solution of linear system for d = - (Bk)-1 ∇f(xk)

Strategy: Define matrix updating formulas that give (Bk) symmetric, positive definite and satisfy: (Bk+1)(xk+1 - xk) = (∇fk+1 - ∇fk) (Secant relation)

DFP Formula: (Davidon, Fletcher, Powell, 1958, 1964) T T (y - Bks)yT + y (y - Bks) (y - Bks) s y yT Bk+1 = Bk + - yT s (yT s)(yT s)

T k T k -1 k+1 ss H y y H ()Bk+1 = H = Hk + - sT y y Hk y

where: s = xk+1- xk y = ∇f (xk+1) - ∇f (xk)

27 Quasi-Newton Methods

BFGS Formula (Broyden, Fletcher, Goldfarb, Shanno, 1970-71)

T k T k + yy B s s B Bk 1 = Bk + - sT y s Bk s

k T k T k T T −1 (s - H y)s + s (s - H y) (y - H s) y s s ()Bk+1 =Hk+1 = Hk + - yT s ()yT s ()yT s

Notes: 1) Both formulas are derived under similar assumptions and have symmetry 2) Both have superlinear convergence and terminate in n steps on quadratic functions. They are identical if α is minimized. 3) BFGS is more stable and performs better than DFP, in general. 4) For n ≤ 100, these are the best methods for general purpose problems if second derivatives are not available.

28 Quasi-Newton Method - BFGS Convergence Path

Starting Point [0.8, 0.2] starting from B0 = I, converges in 9 iterations to ||∇f(x*)|| ” 10-6

29 Sources For Unconstrained Software

Harwell (HSL) IMSL NAg - Unconstrained Optimization Codes Netlib (www.netlib.org) •MINPACK •TOMS Algorithms, etc. These sources contain various methods •Quasi-Newton •Gauss-Newton •Sparse Newton •Conjugate Gradient

30 (Nonlinear Programming)

Problem: Minx f(x) s.t. g(x) ≤ 0 h(x) = 0 where: f(x) - scalar objective function x - n vector of variables g(x) - inequality constraints, m vector h(x) - meq equality constraints.

Sufficient Condition for Unique Optimum - f(x) must be convex, and - feasible region must be convex, i.e. g(x) are all convex h(x) are all linear Except in special cases, ther is no guarantee that a local optimum is global if sufficient conditions are violated.

31 Example: Minimize Packing Dimensions

y What is the smallest box for three round objects?

Variables: A, B, (x1, y1), (x2, y2), (x3, y3)

Fixed Parameters: R1, R2, R3 Objective: Minimize Perimeter = 2(A+B) Constraints: Circles remain in box, can’t overlap 2 Decisions: Sides of box, centers of circles. A 3 1

B x  ≥ ≤ ≤  2 2 2 x1, y R1 x1 B- R1, y A - R1 ()x1 - x2 + (y - y ) ≥ ()R1 + R2  1 1  1 2  ≥ ≤ ≤ 2 2 2 y2 R 2 x2 B- R 2, y2 A - R 2  () () ≥ () x 2, x1 - x3 + y1 - y3 R1 + R3  ≥ ≤ ≤  2 x3, y3 R 3 x3 B- R 3, y3 A - R 3 ()2 ≥ ()2  x2 - x3 + ()y2 - y3 R2 + R3 in box ≥ no overlaps x1, x2, x3, y1, y2, y3, A, B 0

32 Characterization of Constrained Optima

Min

Min

Linear Program Linear Program (Alternate Optima)

Min

Min Min

Convex Objective Functions Linear Constraints Min

Min Min Min

Nonconvex Region Min Nonconvex Objective Multiple Optima 33 Multiple Optima What conditions characterize an optimal solution?

Unconstrained Local Minimum Unconstrained Local Minimum Necessary Conditions Sufficient Conditions ∇f (x*) = 0 ∇f (x*) = 0 pT∇2f (x*) p • 0 for p∈ℜn pT∇2f (x*) p > 0 for p∈ℜn (positive semi-definite) (positive definite)

34 Optimal solution for inequality constrained problem

Min f(x) s.t. g(x) ” 0 Analogy: Ball rolling down valley pinned by fence Note: Balance of forces (∇f, ∇g1)

35 Optimal solution for general constrained problem

Problem: Min f(x) s.t. g(x) ” 0 h(x) = 0 Analogy: Ball rolling on rail pinned by fences Balance of forces: ∇f, ∇g1, ∇h

36 Optimality conditions for local optimum

Necessary First Order Karush Kuhn - Tucker Conditions

∇ L (x*, u, v) = ∇f(x*) + ∇g(x*) u + ∇h(x*) v = 0 (Balance of Forces) u • 0 (Inequalities act in only one direction) g (x*) ” 0, h (x*) = 0 (Feasibility)

uj gj(x*) = 0 (Complementarity: either gj(x*) = 0 or uj = 0) u, v are "weights" for "forces," known as KKT multipliers, shadow prices, dual variables

“To guarantee that a local NLP solution satisfies KKT conditions, a constraint qualification is required. E.g., the Linear Independence Constraint Qualification ∇ ∇ (LICQ) requires active constraint gradients, [ gA(x*) h(x*)], to be linearly independent. Also, under LICQ, KKT multipliers are uniquely determined.”

Necessary (Sufficient) Second Order Conditions - Positive curvature in "constraint" directions. - pT∇ 2L (x*) p ≥ 0 (pT∇ 2L (x*) p > 0) ∇ T ∇ T where p are the constrained directions: gA(x*) p = 0, h(x*) p = 0 37 Single Variable Example of KKT Conditions

Min (x)2 s.t. -a ” x ” a, a > 0 x* = 0 is seen by inspection f(x)

Lagrange function : 2 L(x, u) = x + u1(x-a) + u2(-a-x)

First Order KKT conditions: ∇ L(x, u) = 2 x + u1 - u2 = 0 u1 (x-a) = 0 u2 (-a-x) = 0 -a ” x ” a u , u • 0 x 1 2 -a a Consider three cases:

• u1 > 0, u2 = 0 Upper bound is active, x = a, u1 = -2a, u2 = 0

• u1 = 0, u2 > 0 Lower bound is active, x = -a, u2 = -2a, u1 = 0

• u1 = u2 = 0 Neither bound is active, u1 = 0, u2 = 0, x = 0

Second order conditions (x*, u1, u2 =0) ∇ xxL (x*, u*) = 2 T∇ ∆ 2 p xxL (x*, u*) p = 2 ( x) > 0 38 Single Variable Example of KKT Conditions - Revisited Min -(x)2 s.t. -a ” x ” a, a > 0 -a x a x* = ±a is seen by inspection

Lagrange function : 2 L(x, u) = -x + u1(x-a) + u2(-a-x)

First Order KKT conditions: ∇ L(x, u) = -2x + u1 - u2 = 0 u1 (x-a) = 0 u2 (-a-x) = 0 f(x) -a ” x ” a u1, u2 • 0 Consider three cases:

• u1 > 0, u2 = 0 Upper bound is active, x = a, u1 = 2a, u2 = 0

• u1 = 0, u2 > 0 Lower bound is active, x = -a, u2 = 2a, u1 = 0

• u1 = u2 = 0 Neither bound is active, u1 = 0, u2 = 0, x = 0

Second order conditions (x*, u1, u2 =0) ∇ xxL (x*, u*) = -2 T∇ ∆ 2 p xxL (x*, u*) p = -2( x) < 0 39 Interpretation of Second Order Conditions

For x = a or x = -a, we require the allowable direction to satisfy the active constraints exactly. Here, any point along the allowable direction, x* must remain at its bound.

For this problem, however, there are no nonzero allowable directions that satisfy this condition. Consequently the solution x* is defined entirely by the active constraint. The condition: T ∇ p xxL (x*, u*, v*) p > 0 for all allowable directions, is vacuously satisfied - because there are ∇ T no allowable directions that satisfy gA(x*) p = 0. Hence, sufficient second order conditions are satisfied.

As we will see, sufficient second order conditions are satisfied by linear programs as well.

40 Role of KKT Multipliers

-a x a a + ∆a

Also known as: • Shadow Prices • Dual Variables f(x) • Lagrange Multipliers Suppose a in the constraint is increased to a + ∆a f(x*) = (a + ∆a)2 and [f(x*, a + ∆a) - f(x*, a)]/∆a = 2a + ∆a

df(x*)/da = 2a = u1 41 Special Cases of Nonlinear Programming

Linear Programming: Min cTx x s.t. Ax ” b 2 Cx = d, x • 0 Functions are all convex ⇒ global min. Because of Linearity, can prove solution will always lie at vertex of feasible region.

x1 Simplex Method - Start at vertex - Move to adjacent vertex that offers most improvement - Continue until no further improvement Notes: 1) LP has wide uses in planning, blending and scheduling 2) Canned programs widely available.

42 Example

Simplex Method

Min -2x1 - 3x2 Min -2x1 - 3x2 ⇒ s.t. 2x1 + x2 ” 5 s.t. 2x1 + x2 + x3 = 5 x1, x2 • 0 x1, x2, x3 • 0 (add slack variable) ⇒ Now, define f = -2x1 - 3x2 f + 2x1 + 3x2 = 0 Set x1, x2 = 0, x3 = 5 and form tableau x1 x2 x3 f b x1, x2 nonbasic 2 1 1 0 5 x3 basic 2 3 0 1 0

To decrease f, increase x2. How much? so x3 • 0 x1 x2 x3 f b 2 1 1 0 5 -4 0 -3 1 -15 f can no longer be decreased! Optimal

Underlined terms are -(reduced gradients); nonbasic variables (x1, x3), basic variable x2

43 Quadratic Programming

Problem: Min aTx + 1/2 xT B x A x ” b C x = d 1) Can be solved using LP-like techniques: (Wolfe, 1959) ⇒ Min Σj (zj+ + zj-) s.t. a + Bx + ATu + CTv = z+ - z- Ax - b + s = 0 Cx - d = 0 s, z+, z- • 0

{uj sj = 0} with complicating conditions.

2) If B is positive definite, QP solution is unique. If B is pos. semidefinite, optimum value is unique.

3) Other methods for solving QP’s (faster) - Complementary Pivoting (Lemke)

- Range, Null Space methods (Gill, Murray). 44 Portfolio Planning Problem

Definitions:

xi - fraction or amount invested in security i ri (t) - (1 + rate of return) for investment i in year t. µ i - average r(t) over T years, i.e.

T µ 1 i = ∑r i (t) T t=1 µ Max ∑ i xi i = s.t. ∑ xi 1 i ≥ xi ,0 etc. Note: maximize average return, no accounting for risk.

45 Portfolio Planning Problem Definition of Risk - fluctuation of ri(t) over investment (or past) time period. To minimize risk, minimize variance about portfolio mean (risk averse).

Variance/Covariance Matrix, S 1 T {} σ 2 ()µ ( µ ) S ij = ij = ∑ r i (t) - i r j (t) - j T t =1

Max x T Sx = s .t. ∑ x i 1 i µ ≥ ∑ i x i R i ≥ x i 0 , etc .

Example: 3 investments µ j  3 1 - 0.5  1. IBM 1.3 S =  1 2 0.4 2. GM 1.2 -0.5 0.4 1  3. Gold 1.08

46 Portfolio Planning Problem - GAMS

SIMPLE PORTFOLIO INVESTMENT PROBLEM (MARKOWITZ) 4 5 OPTION LIMROW=0; 6 OPTION LIMXOL=0; 7 8 VARIABLES IBM, GM, GOLD, OBJQP, OBJLP; 9 10 EQUATIONS E1,E2,QP,LP; 11 12 LP.. OBJLP =E= 1.3*IBM + 1.2*GM + 1.08*GOLD; 13 14 QP.. OBJQP =E= 3*IBM**2 + 2*IBM*GM - IBM*GOLD 15 + 2*GM**2 - 0.8*GM*GOLD + GOLD**2; 16 17 E1..1.3*IBM + 1.2*GM + 1.08*GOLD =G= 1.15; 18 19 E2.. IBM + GM + GOLD =E= 1; 20 21 IBM.LO = 0.; 22 IBM.UP = 0.75; 23 GM.LO = 0.; 24 GM.UP = 0.75; 25 GOLD.LO = 0.; 26 GOLD.UP = 0.75; 27 28 MODEL PORTQP/QP,E1,E2/; 29 30 MODEL PORTLP/LP,E2/; 31 32 SOLVE PORTLP USING LP MAXIMIZING OBJLP; 33 34 SOLVE PORTQP USING NLP MINIMIZING OBJQP; 47 Portfolio Planning Problem - GAMS

S O L VE S U M M A R Y **** MODEL STATUS 1 OPTIMAL **** OBJECTIVE VALUE 1.2750 RESOURCE USAGE, LIMIT 1.270 1000.000 ITERATION COUNT, LIMIT 1 1000 BDM - LP VERSION 1.01 A. Brooke, A. Drud, and A. Meeraus, Analytic Support Unit, Development Research Department, World Bank, Washington D.C. 20433, U.S.A.

Estimate work space needed -- 33 Kb Work space allocated -- 231 Kb EXIT - - OPTIMAL SOLUTION FOUND. LOWER LEVEL UPPER MARGINAL ----EQU LP . . . 1.000 ----EQU E2 1.000 1.000 1.000 1.200

LOWER LEVEL UPPER MARGINAL ----VAR IBM 0.750 0.750 0.100 ----VAR GM . 0.250 0.750 . ----VAR GOLD. .. 0.750 -0.120 ----VAR OBJLP -INF 1.275 +INF . **** REPORT SUMMARY : 0 NONOPT 0 INFEASIBLE 0 UNBOUNDED SIMPLE PORTFOLIO INVESTMENT PROBLEM (MARKOWITZ) Model Statistics SOLVE PORTQP USING NLP FROM LINE 34 MODEL STATISTICS BLOCKS OF EQUATIONS 3 SINGLE EQUATIONS 3 BLOCKS OF VARIABLES 4 SINGLE VARIABLES 4 NON ZERO ELEMENTS 10 NON LINEAR N-Z 3 DERIVITIVE POOL 8 CONSTANT POOL 3 CODE LENGTH 95 GENERATION TIME = 2.360 SECONDS EXECUTION TIME = 3.510 SECONDS

48 Portfolio Planning Problem - GAMS

S O L VE S U M M A R Y MODEL PORTLP OBJECTIVE OBJLP TYPE LP DIRECTION MAXIMIZE SOLVER MINOS5 FROM LINE 34 **** SOLVER STATUS 1 NORMAL COMPLETION **** MODEL STATUS 2 LOCALLY OPTIMAL **** OBJECTIVE VALUE 0.4210 RESOURCE USAGE, LIMIT 3.129 1000.000 ITERATION COUNT, LIMIT 3 1000 EVALUATION ERRORS 0 0 M I N O S 5.3 (Nov. 1990) Ver: 225-DOS-02 B.A. Murtagh, University of New South Wales and P.E. Gill, W. Murray, M.A. Saunders and M.H. Wright Systems Optimization Laboratory, Stanford University.

EXIT - - OPTIMAL SOLUTION FOUND MAJOR ITNS, LIMIT 1 FUNOBJ, FUNCON CALLS 8 SUPERBASICS 1 INTERPRETER USAGE .21 NORM RG / NORM PI 3.732E-17 LOWER LEVEL UPPER MARGINAL ----EQU QP . . . 1.000 ----EQU E1 1.150 1.150 +INF 1.216 ----EQU E2 1.000 1.000 1.000 -0.556 LOWER LEVEL UPPER MARGINAL ----VAR IBM . 0.183 0.750 . ----VAR GM . 0.248 0.750 EPS ----VAR GOLD . 0.569 0.750 . ----VAR OBJLP -INF 1.421 +INF . **** REPORT SUMMARY : 0 NONOPT 0 INFEASIBLE 0 UNBOUNDED 0 ERRORS SIMPLE PORTFOLIO INVESTMENT PROBLEM (MARKOWITZ) Model Statistics SOLVE PORTQP USING NLP FROM LINE 34 EXECUTION TIME = 1.090 SECONDS 49 Algorithms for Constrained Problems

Motivation: Build on unconstrained methods wherever possible.

Classification of Methods:

•Reduced Gradient Methods - (with Restoration) GRG2, CONOPT •Reduced Gradient Methods - (without Restoration) MINOS •Successive Quadratic Programming - generic implementations •Penalty Functions - popular in 1970s, but fell into disfavor. Barrier Methods have been developed recently and are again popular. •Successive Linear Programming - only useful for "mostly linear" problems

We will concentrate on algorithms for first four classes.

Evaluation: Compare performance on "typical problem," cite experience on process problems.

50 Representative Constrained Problem (Hughes, 1981)

g1 Š 0

g2 Š 0

α β Min f(x1, x2) = exp(- ) 2 2 g1 = (x2+0.1) [x1 +2(1-x2)(1-2x2)] - 0.16 ” 0 2 2 g2 = (x1 - 0.3) + (x2 - 0.3) - 0.16 ” 0 x* = [0.6335, 0.3465] f(x*) = -4.8380 51 Reduced with Restoration (GRG2/CONOPT)

Min f(x) Min f(z) s.t. g(x) + s = 0 (add slack variable) ‘⇒ s.t. h(z) = 0 c(x) = 0 a ” z ” b a ” x ” b, s • 0

• Partition variables into:

zB - dependent or basic variables

zN - nonbasic variables, fixed at a bound

zS - independent or superbasic variables Analogy to linear programming. Superbasics required only if nonlinear problem • Solve unconstrained problem in space of superbasic variables. T T T T Let z = [zS zB zN ] optimize wrt zS with h(zS, zB , zN) = 0

Calculate constrained derivative or reduced gradient wrt zS. ∈ m •Dependent variables are zB R

•Nonbasic variables zN (temporarily) fixed 52 Definition of Reduced Gradient

∂ ∂ df = f + dzB f ∂ ∂ dzS zS dzS zB Because h(z) = ,0 we have: T T  ∂h   ∂h  dh =   dz +   dz = 0 ∂ S ∂ B  zS   zB  −1 dz  ∂h  ∂h  − B = −   = −∇ h[]∇ h 1 ∂ ∂ zS zB dzS  zS  zB  This leads to :

df ∂f − ∂f = − ∇ h[]∇ h 1 ∂ zS zB ∂ dzS zS zB •By remaining feasible always, h(z) = 0, a ” z ” b, one can apply an

unconstrained algorithm (quasi-Newton) using (df/dzS)

•Solve problem in reduced space of zS variables 53 Example of Reduced Gradient

2 − Min x1 2 x2 + = ts .. 3x1 4 x2 24 ∇ T = ∇ T = h 3[ 4], f 2[ x1 - 2]

= = Let z S x1, z B x2 df ∂f − ∂f = − ∇ h[]∇ h 1 ∂ z S z B ∂ dz S z S z B df = − []−1 () = + 2 x1 3 4 - 2 2 x1 /3 2 dx1

∇ T ∇ T ∇ T If h is (m x n); zSh is m x (n-m); zBh is (m x m)

(df/dzS) is the change in f along constraint direction per unit change in zS

54 Sketch of GRG Algorithm 1.1. InitializeInitialize problemproblem andand obtainobtain aa feasiblefeasible pointpoint atat zz0 k 2.2. AtAt feasiblefeasible pointpoint zz ,, partitionpartition variablesvariables zz intointo zzN,, zzB,, zzS

3.3. CalculateCalculate reducedreduced gradient,gradient, ((df/dzdf/dzS)) -1 4.4. EvaluateEvaluate searchsearch directiondirection forfor zzS,, dd == BB (df/dz(df/dzS)) 5.5. PerformPerform aa lineline search.search. α∈α k αα •• FindFind (0,1] with zzS :=:= zzS ++ dd k αα •• SolveSolve forfor h(zh(zS ++ d,d, zzB,, zzN)) == 00 k αα k •• IfIf f(zf(zS ++ dd,, zzB,, zzN)) << f(zf(zS ,, zzB,, zzN),), k+1 k αα setset zzS ==zzS ++ d,d, k:=k:= k+1k+1 ε,ε, 6.6. IfIf ||||((df/dzdf/dzS)||<)||< Stop.Stop. Else,Else, gogo toto 2.2.

55 GRG Algorithm Properties

1. GRG2 has been implemented on PC’s as GINO and is very reliable and robust. It is also the optimization solver in MS EXCEL. 2. CONOPT is implemented in GAMS, AIMMS and AMPL 3. GRG2 uses Q-N for small problems but can switch to conjugate gradients if problem gets large. CONOPT uses exact second derivatives. ∇ 4. Convergence of h(zS, zB , zN) = 0 can get very expensive because h is required 5. Safeguards can be added so that restoration (step 5.) can be dropped and efficiency increases.

Representative Constrained Problem Starting Point [0.8, 0.2] • GINO Results - 14 iterations to ||∇f(x*)|| ” 10-6 • CONOPT Results - 7 iterations to ||∇f(x*)|| ” 10-6 from feasible point.

56 Reduced Gradient Method without Restoration (MINOS/Augmented)

Motivation: Efficient algorithms Strategy: (Robinson, Murtagh & Saunders) are available that solve linearly 1. Partition variables into basic, nonbasic constrained optimization variables and superbasic variables.. problems (MINOS): 2. Linearize active constraints at zk Dkz = ck Min f(x) 3. Let ψ = f (z) + vTh (z) + β (hTh) s.t. Ax ” b (Augmented Lagrange), Cx = d 4. Solve linearly constrained problem: Min ψ (z) Extend to nonlinear problems, s.t. Dz = c through successive linearization a ” z ” b using reduced gradients to get zk+1 Develop major iterations 5. Set k=k+1, go to 3. (linearizations) and minor 6. Algorithm terminates when no iterations (GRG solutions) . movement between steps 3) and 4).

57 MINOS/Augmented Notes

1. MINOS has been implemented very efficiently to take care of linearity. It becomes LP Simplex method if problem is totally linear. Also, very efficient matrix routines. 2. No restoration takes place, nonlinear constraints are reflected in ψ(z) during step 3). MINOS is more efficient than GRG. 3. Major iterations (steps 3) - 4)) converge at a quadratic rate. 4. Reduced gradient methods are complicated, monolithic codes: hard to integrate efficiently into modeling software.

Representative Constrained Problem – Starting Point [0.8, 0.2] MINOS Results: 4 major iterations, 11 function calls to ||∇f(x*)|| ” 10-6

58 Successive Quadratic Programming (SQP)

Motivation: • Take KKT conditions, expand in Taylor series about current point. • Take Newton step (QP) to determine next point.

Derivation – KKT Conditions ∇ ∇ ∇ ∇ xL (x*, u*, v*) = f(x*) + gA(x*) u* + h(x*) v* = 0 h(x*) = 0

gA(x*) = 0, where gA are the active constraints.

Newton - Step ∇ ∇ ∇   ∆  ∇ ( k k k) xx L g h x  x L x , u , v   A    ∇ T   ∆   ()k  g 0 0 u = -  gA x   A     T    ∇ h 0 0   ∆v  h()x k  Requirements: ∇ • xxL must be calculated and should be ‘regular’ •correct active set gA •good estimates of uk, vk 59 SQP Chronology

1. Wilson (1963) - active set can be determined by solving QP: ∇ T T ∇ Min f(xk) d + 1/2 d xx L(xk, uk, vk) d d ∇ T s.t. g(xk) + g(xk) d ” 0 ∇ T h(xk) + h(xk) d = 0

2. Han (1976), (1977), Powell (1977), (1978) ∇ - approximate xxL using a positive definite quasi-Newton update (BFGS) - use a line search to converge from poor starting points.

Notes: - Similar methods were derived using penalty (not Lagrange) functions. - Method converges quickly; very few function evaluations. - Not well suited to large problems (full space update used). For n > 100, say, use reduced space methods (e.g. MINOS).

60 Elements of SQP – Hessian Approximation

What about ∇xxL? • need to get second derivatives for f(x), g(x), h(x). k k ∇ • need to estimate multipliers, u , v ; xxL may not be positive semidefinite ⇒ ∇ k k k k Approximate xxL (x , u , v ) by B , a symmetric positive definite matrix. T k T k + yy B s s B Bk 1 = Bk + - sT y s Bk s BFGS Formula s = xk+1 - xk y = ∇L(xk+1, uk+1, vk+1) - ∇L(xk, uk+1, vk+1) • second derivatives approximated by change in gradients • positive definite Bk ensures unique QP solution

61 Elements of SQP – Search Directions How do we obtain search directions? • Form QP and let QP determine constraint activity • At each iteration, k, solve: Min ∇f(xk) Td + 1/2 dT Bkd d s.t. g(xk) + ∇g(xk) T d ” 0 h(xk) + ∇h(xk) T d = 0

Convergence from poor starting points • As with Newton's method, choose α (stepsize) to ensure progress toward optimum: xk+1 = xk + α d. • α is chosen by making sure a merit function is decreased at each iteration. Exact Penalty Function ψ µ Σ Σ (x) = f(x) + [ max (0, gj(x)) + |hj (x)|] µ > maxj {| uj |, | vj |} Augmented Lagrange Function ψ(x) = f(x) + uTg(x) + vTh(x) η Σ 2 Σ 2 + /2 { (hj (x)) + max (0, gj (x)) } 62 Newton-Like Properties for SQP

Fast Local Convergence B = ∇xxL Quadratic ∇xxL is p.d and B is Q-N 1 step Superlinear B is Q-N update, ∇xxL not p.d 2 step Superlinear

Enforce Global Convergence Ensure decrease of merit function by taking α ” 1 Trust region adaptations provide a stronger guarantee of global convergence - but harder to implement.

63 Basic SQP Algorithm

0. Guess x0, Set B0 = I (Identity). Evaluate f(x0), g(x0) and h(x0). 1. At xk, evaluate ∇f(xk), ∇g(xk), ∇h(xk). 2. If k > 0, update Bk using the BFGS Formula. ∇ k T T k 3. Solve: Mind f(x ) d + 1/2 d B d s.t. g(xk) + ∇g(xk)Td ≤ 0 h(xk) + ∇h(xk)Td = 0 If KKT error less than tolerance: ||∇L(x*)|| ≤ ε, ||h(x*)|| ≤ ε, || ≤ ε g(x*)+|| . STOP, else go to 4. 4. Find α so that 0 < α ≤ 1 and ψ(xk + αd) < ψ(xk) sufficiently (Each trial requires evaluation of f(x), g(x) and h(x)). 5. xk+1 = xk + α d. Set k = k + 1 Go to 2.

64 Problems with SQP

Nonsmooth Functions - Reformulate Ill-conditioning - Proper scaling Poor Starting Points – Trust Regions can help Inconsistent Constraint Linearizations - Can lead to infeasible QP's x 2

Min x2 2 s.t. 1 + x1 - (x2) ” 0 2 x 1 1 - x1 - (x2) ” 0 x2 • -1/2

65 SQP Test Problem

1.2

1.0

0.8 x2 0.6

x* 0.4

0.2

0.0 0.0 0.2 0.4 x1 0.6 0.8 1.0 1.2

Min x2 2 3 s.t. -x2 + 2 x1 - x1 ” 0 2 3 -x2 + 2 (1-x1) - (1-x1) ” 0 x* = [0.5, 0.375].

66 SQP Test Problem – First Iteration 1.2

1.0

0.8 x2 0.6

0.4

0.2

0.0 0.0 0.2 0.4 x1 0.6 0.8 1.0 1.2 T 0 Start from the origin (x0 = [0, 0] ) with B = I, form:

2 2 Min d2 + 1/2 (d1 + d2 ) s.t. d2 • 0 d1 + d2 • 1 T µ µ d = [1, 0] . with 1 = 0 and 2 = 1. 67 SQP Test Problem – Second Iteration 1.2

1.0

0.8 x2 0.6

0.4 x*

0.2

0.0 0.0 0.2 0.4 x1 0.6 0.8 1.0 1.2 T 1 From x1 = [0.5, 0] with B = I (no update from BFGS possible), form:

2 2 Min d2 + 1/2 (d1 + d2 ) s.t. -1.25 d1 - d2 + 0.375 ” 0 1.25 d1 - d2 + 0.375 ” 0 T µ µ d = [0, 0.375] with 1 = 0.5 and 2 = 0.5 T x* = [0.5, 0.375] is optimal 68 Representative Constrained Problem SQP Convergence Path

g1 Š 0

g2 Š 0

Starting Point[0.8, 0.2] - starting from B0 = I and staying in bounds and linearized constraints; converges in 8 iterations to ||∇f(x*)|| ” 10-6

69 Barrier Methods for Large-Scale Nonlinear Programming min f (x) x∈ℜn Can generalize for Original Formulation s.t c(x) = 0 a ≤ x ≤ b x ≥ 0

n ϕ = − µ Barrier Approach min µ (x) f (x) ∑ln xi x∈ℜn i=1 s.t c(x) = 0

⇒As µ Î 0, x*(µ) Î x* Fiacco and McCormick (1968)

70 Solution of the Barrier Problem

⇒Newton Directions (KKT System) ∇ f ( x ) + A( x )λ − v = 0 XVe − µ e = 0 c ( x ) = 0

⇒ ⇒ SolveReducing the System =µ −1 − − −1 dv X e v X Vdx W A − I d  ∇f + Aλ −v   +Σ  x   ∇ϕ   W A d=x − µ −  A 0 0  dλ  = − c  Σ= 1  T   +   X V    λ  −1  V A0 X 0dν   v− µc S e 

IPOPT Code – www.coin-or.org 71 Global Convergence of Newton-based Barrier Solvers Merit Function Exact Penalty: P(x, η) = f(x) + η ||c(x)|| Aug’d Lagrangian: L*(x, λ, η) = f(x) + λTc(x) + η ||c(x)||2 Assess Search Direction (e.g., from IPOPT) Line Search – choose stepsize α to give sufficient decrease of merit function using a ‘step to the boundary’ rule with τ ~0.99. α ∈ α = +α for ,0( ], xk +1 xk d x +α ≥ −τ > xk d x 1( )xk 0 = +α ≥ −τ > vk +1 vk dv 1( )vk 0 λ = λ +α λ − λ k+1 k ( + k )

• How do we balance φ (x) and c(x) with η? • Is this approach globally convergent? Will it still be fast?

72 Global Convergence Failure (Wächter and B., 2000) Min f (x) − − 1 = ts .. x1 x3 0 x2 2 2 − − = (x1) x2 1 0 ≥ x2 , x3 0 Newton-type line search ‘stalls’ even though descent directions exist x 1 k T + k = A(x ) d x c(x ) 0 k +α > x d x 0 Remedies: •Composite Step Trust Region (Byrd et al.)

•Filter Line Search Methods 73 Line Search Filter Method

Store (φk, θk) at allowed iterates Allow progress if trial point is θ acceptable to filter with margin φ(x) If switching condition α −∇φ T a ≥ δ θ b > > [ k d] [ k ,] a 2b 2 is satisfied, only an Armijo line search is required on φk If insufficient progress on stepsize, evoke restoration phase to reduce θ. Global convergence and superlinear local convergence proved (with second order correction)

θ(x) = ||c(x)||

74 Implementation Details

Modify KKT (full space) matrix if nonsingular W + Σ +δ A   k k 1 k  T −δ  Ak 2 I  δ 1 - Correct inertia to guarantee descent direction δ 2 - Deal with rank deficient Ak KKT matrix factored by MA27 Feasibility restoration phase + − 2 Min || c(x ||) 1 || x xk ||Q ≤ ≤ xl xk xu

Apply Exact Penalty Formulation Exploit same structure/algorithm to reduce infeasibility

75 Details of IPOPT Algorithm

Options Comparison

Line Search Strategies l 34 COPS Problems - 2 exact penalty merit function (600 - 160 000 variables) - augmented Lagrangian function 486 CUTE Problems - Filter method (adapted from (2 – 50 000 variables) Fletcher and Leyffer) 56 MIT T Problems (12097 – 99998 variables) Hessian Calculation - BFGS (reduced space) Performance Measure - SR1 (reduced space) - rp, l = (#iterp,l)/ (#iterp, min) - Exact full Hessian (direct) - P(τ) = fraction of problems τ - Exact reduced Hessian (direct) with log2(rp, l) < - Preconditioned CG

76 IPOPT Comparison with KNITRO and LOQO

CPU time (COPS) CPU time (CUTE) IPOPT IPOPT 1 1 KNITRO KNITRO LOQO LOQO 0.9 0.9

0.8 0.8

0.7 0.7

0.6 0.6 rho rho 0.5 0.5

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1

0 0 0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 tau tau CPU time (MITT) •IPOPT has better performance, 1 robustness on CUTE, MITT and 0.9 COPS problem sets 0.8

0.7 IPOPT KNITRO •Similar results appear with iteration 0.6 LOQO counts rho 0.5

0.4 •Can be downloaded from

0.3 http://www.coin-or.org 0.2 •See links for additional information 0.1

0 0 1 2 3 4 5 6 7 8 77 tau Recommendations for Constrained Optimization

1. Best current algorithms • GRG 2/CONOPT • MINOS • SQP • IPOPT 2. GRG 2 (or CONOPT) is generally slower, but is robust. Use with highly nonlinear functions. Solver in Excel! 3. For small problems (n ” 100) with nonlinear constraints, use SQP. 4. For large problems (n • 100) with mostly linear constraints, use MINOS. ==> Difficulty with many nonlinearities

Fewer Function SQP MINOS Tailored Linear Evaluations CONOPT Algebra

Small, Nonlinear Problems - SQP solves QP's, not LCNLP's, fewer function calls. Large, Mostly Linear Problems - MINOS performs sparse constraint decomposition. Works efficiently in reduced space if function calls are cheap! Exploit Both Features – IPOPT takes advantages of few function evaluations and large- scale linear algebra, but requires exact second derivatives

78 Available Software for Constrained Optimization

SQP Routines HSL, NaG and IMSL (NLPQL) Routines NPSOL – Stanford Systems Optimization Lab SNOPT – Stanford Systems Optimization Lab (rSQP discussed later) IPOPT – http://www.coin-or.org

GAMS Programs CONOPT - Generalized Reduced Gradient method with restoration MINOS - Generalized Reduced Gradient method without restoration A student version of GAMS is now available from the CACHE office. The cost for this package including Process Design Case Students, GAMS: A User’s Guide, and GAMS - The Solver Manuals, and a CD-ROM is $65 per CACHE supporting departments, and $100 per non-CACHE supporting departments and individuals. To order please complete standard order form and fax or mail to CACHE Corporation. More information can be found on http://www.che.utexas.edu/cache/gams.html

MS Excel Solver uses Generalized Reduced Gradient method with restoration

79 Rules for Formulating Nonlinear Programs

1) Avoid overflows and undefined terms, (do not divide, take logs, etc.) e.g. x + y - ln z = 0 Î x + y - u = 0 exp u - z = 0 2) If constraints must always be enforced, make sure they are linear or bounds. e.g. v(xy - z2)1/2 = 3 Î vu = 3 u- (xy - z2) = 0, u • 0 3) Exploit linear constraints as much as possible, e.g. mass balance Î xi L + yi V = F zi li + vi = fi ∑ L – li = 0 4) Use bounds and constraints to enforce characteristic solutions. e.g. a ” x ” b, g (x) ” 0 to isolate correct root of h (x) = 0. 5) Exploit global properties when possibility exists. Convex (linear equations?) Linear Program? Quadratic Program? Geometric Program? 6) Exploit problem structure when possible. e.g. Min [Tx-3Ty] s.t. xT + y - T2 y = 5 4x - 5Ty + Tx = 7 0 ” T ” 1 (If T is fixed ⇒ solve LP) ⇒ put T in outer optimization loop.

80 Process Definition and Formulation

State of Nature and Problem Premises

Restrictions: Physical, Legal Desired Objective: Yield, Economic, Political, etc. Economic, Capacity, etc.

Decisions

Mathematical Modeling and Algorithmic Solution

Process Model Equations

Constraints Objective Function

Additional Variables

81 Large Scale NLP Algorithms Motivation: Improvement of Successive Quadratic Programming as Cornerstone Algorithm Î process optimization for design, control and operations Evolution of NLP Solvers: SQP rSQP IPOPT

rSQP++

1999-1988-98:1981-87: : Simultaneous Static Flowsheet Real-time dynamic optimization optimization optimization overoverover 1 000100 100 000000 variables variablesvariables and andand constraints constraintsconstraints

Current: Tailor structure, architecture and problems 82 Flowsheet Optimization Problems - Introduction

Modular Simulation Mode Physical Relation to Process - Intuitive to Process Engineer - Unit equations solved internally Out In - tailor-made procedures.

•Convergence Procedures - for simple flowsheets, often identified from flowsheet structure •Convergence "mimics" startup. •Debugging flowsheets on "physical" grounds

83 Flowsheet Optimization Problems - Features

Design Specifications Specify # trays reflux ratio, but would like to specify C overhead comp. ==> Control loop -Solve Iteratively

1 3

Nested Recycles Hard to Handle 2 4 Best Convergence Procedure?

•Frequent block evaluation can be expensive •Slow algorithms applied to flowsheet loops. •NLP methods are good at breaking looks

84 Chronology in Process Optimization

Sim. Time Equiv. 1. Early Work - Black Box Approaches Friedman and Pinder (1972) 75-150 Gaddy and co-workers (1977) 300 2. Transition - more accurate gradients Parker and Hughes (1981) 64 Biegler and Hughes (1981) 13 3. Infeasible Path Strategy for Modular Simulators Biegler and Hughes (1982) <10 Chen and Stadtherr (1985) Kaijaluoto et al. (1985) and many more 4. Based Process Optimization Westerberg et al. (1983) <5 Shewchuk (1985) 2 DMO, NOVA, RTOPT, etc. (1990s) 1-2

Process optimization should be as cheap and easy as process simulation

85 Process Simulators with Optimization Capabilities (using SQP)

Aspen Custom Modeler (ACM) Aspen/Plus gProms Hysim/Hysys Massbal Optisim Pro/II ProSim ROMeo VTPLAN

86 Simulation and Optimization of Flowsheets

3 2

h(y) = 0 4 6 1 w(y) y

5 Min f(x), s.t. g(x) ” 0 For single degree of freedom: • work in space defined by curve below. • requires repeated (expensive) recycle convergence

f(x, y(x))

x 87 Expanded Region with Feasible Path

88 "Black Box" Optimization Approach • Vertical steps are expensive (flowsheet convergence) • Generally no connection between x and y. • Can have "noisy" derivatives for gradient optimization.

89 SQP - Infeasible Path Approach • solve and optimize simultaneously in x and y • extended Newton method

90 Optimization Capability for Modular Simulators (FLOWTRAN, Aspen/Plus, Pro/II, HySys)

Architecture - Replace convergence with optimization block - Problem definition needed (in-line FORTRAN) - Executive, preprocessor, modules intact.

Examples 1. Single Unit and Acyclic Optimization - Distillation columns & sequences

2. "Conventional" Process Optimization - Monochlorobenzene process - NH3 synthesis

3. Complicated Recycles & Control Loops - Cavett problem - Variations of above

91 Optimization of Monochlorobenzene Process

S06 U = 100 Cooling HC1 Water 80o F

A-1 ABSORBER P PHYSICAL PROPERTY OPTIONS 15 Trays S11 (3 Theoretical Stages) Cavett Vapor Pressure 32 psia Redlich-Kwong Vapor Fugacity Benzene, S05 25 0.1 Lb Mole/Hr Corrected Liquid Fugacity S04 psia of MCB Feed o T Ideal Solution Activity Coefficient 80 F S10 D-1 o S07 37 psia 270 F DISTILLATION OPT (SCOPT) OPTIMIZER 30 Trays HC1 Optimal Solution Found After 4 Iterations F-1 (20 Theoretical Stages) S01 S02 FLASH Kuhn-Tucker Error 0.29616E-05 S09 Steam o Steam Allowable Kuhn-Tucker Error 0.19826E-04 360 F S03 S08 360oF Objective Function -0.98259 H-1 12,000 2 U = 100 T-1 Btu/hr- ft S12 TREATER o Feed Flow Rates 90 F Maximize H-2 Optimization Variables LB Moles/Hr Profit U = 100 HC1 10 32.006 0.38578 200.00 120.00 Benzene 40 S13 Water o S14 Tear Variables MCB 50 C S15 80 F 0.10601E-19 13.064 79.229 120.00 50.000 P-1 1200 F MCB Tear Variable Errors (Calculated Minus Assumed) T -0.10601E-19 0.72209E-06 -0.36563E-04 0.00000E+00 0.00000E+00 -Results of infeasible path optimization -Simultaneous optimization and convergence of tear streams.

92 Ammonia Process Optimization

Pr H2 Tr N2 Hydrogen Feed Nitrogen Feed N2 5.2% 99.8% H2 94.0% --- To CH4 0.79 % 0.02% TTf f Ar --- 0.01% ν

Product Hydrogen and Nitrogen feed are mixed, compressed, and combined with a recycle stream and heated to reactor temperature. Reaction occurs in a multibed reactor (modeled here as an equilibrium reactor) to partially convert the stream to ammonia. The reactor effluent is cooled and product is separated using two flash tanks with intercooling. Liquid from the second stage is flashed at low pressure to yield high purity

NH3 product. Vapor from the two stage flash forms the recycle and is recompressed. 93 Ammonia Process Optimization

Optimization Problem Performance Characterstics

Max {Total Profit @ 15% over five years} • 5 SQP iterations. • 2.2 base point simulations. s.t.• 105 tons NH3/yr. • objective function improves by • Pressure Balance 6 6 • No Liquid in Compressors $20.66 x 10 to $24.93 x 10 . • 1.8 ” H2/N2 ” 3.5 • difficult to converge flowsheet • Treact ” 1000o F at starting point • NH3 purged ” 4.5 lb mol/hr • NH3 Product Purity • 99.9 % • Tear Equations Optimum Starting point

Objective Function($106) 24.9286 20.659 1. Inlet temp. reactor (oF) 400 400 2. Inlet temp. 1st flash (oF) 65 65 3. Inlet temp. 2nd flash (oF) 35 35 4. Inlet temp. rec. comp. (oF) 80.52 107 5. Purge fraction (%) 0.0085 0.01 6. Reactor Press. (psia) 2163.5 2000 7. Feed 1 (lb mol/hr) 2629.7 2632.0 8. Feed 2 (lb mol/hr) 691.78 691.4 94 How accurate should gradients be for optimization?

Recognizing True Solution • KKT conditions and Reduced Gradients determine true solution • Derivative Errors will lead to wrong solutions!

Performance of Algorithms Constrained NLP algorithms are gradient based (SQP, Conopt, GRG2, MINOS, etc.) Global and Superlinear convergence theory assumes accurate gradients

Worst Case Example (Carter, 1991) Newton’s Method generates an ascent direction and fails for any ε ! Min f (x) = xT Ax ε + /1 ε ε − /1 ε  A =   ε − /1 ε ε + /1 ε  T = ∇ = ε − x0 1[ ]1 f (x0 ) x0 = ∇ + ε g(x0 ) f (x0 ) O( ) = − −1 d A g(x0 ) 95 Implementation of Analytic Derivatives

parameters, p exit variables, s

x Module Equations y

c(v, x, s, p, y) = 0

dy/dx ds/dx Sensitivity dy/dp ds/dp Equations

Automatic Differentiation Tools

JAKE-F, limited to a subset of FORTRAN (Hillstrom, 1982) DAPRE, which has been developed for use with the NAG library (Pryce, Davis, 1987) ADOL-C, implemented using operator overloading features of C++ (Griewank, 1990) ADIFOR, (Bischof et al, 1992) uses source transformation approach FORTRAN code . TAPENADE, web-based source transformation for FORTRAN code

Relative effort needed to calculate gradients is not n+1 but about 3 to 5 (Wolfe, Griewank) 96 Flash Recycle Optimization Ammonia Process Optimization (2 decisions + 7 tear variables) (9 decisions and 6 tear variables) Reactor S3

S1 S2 Flash Mixer P

S6 Hi P S4 S5 Flash S7 Ratio Lo P 2 2 3 1/2 Max S3(A) *S3(B) - S3(A) - S3(C) + S3(D) - (S3(E)) Flash

8000 200

GRG SQP GRG 6000 rSQP SQP rSQP

100 4000 CPU SecondsCPU (VS 3200) CPU SecondsCPU 3200) (VS 2000

0 0 1 2 Numerical Exact Numerical1 Exact 2 97 Large-Scale SQP Min f(z) Min ∇f(zk)T d + 1/2 d T Wk d s.t. c(z)=0 s.t. c(zk) + (Αk)T d = 0 k zL ” z ” zU zL ” z + d ” zU

Characteristics • Many equations and variables (• 100 000) • Many bounds and inequalities (• 100 000)

Few degrees of freedom (10 - 100) Steady state flowsheet optimization Real-time optimization Parameter estimation Many degrees of freedom (• 1000) Dynamic optimization (optimal control, MPC) State estimation and data reconciliation

98 Few degrees of freedom => reduced space SQP (rSQP)

• Take advantage of sparsity of A=∇c(x) • project W into space of active (or equality constraints) • curvature (second derivative) information only needed in space of degrees of freedom • second derivatives can be applied or approximated with positive curvature (e.g., BFGS) • use dual space QP solvers

+ easy to implement with existing sparse solvers, QP methods and line search techniques + exploits 'natural assignment' of dependent and decision variables (some decomposition steps are 'free') + does not require second derivatives

- reduced space matrices are dense - may be dependent on variable partitioning - can be very expensive for many degrees of freedom - can be expensive if many QP bounds

99 Reduced space SQP (rSQP) Range and Null Space Decomposition

Assume no active bounds, QP problem with n variables and m constraints becomes:  k k  d  ∇ k  W A = − f (x )  kT    k  A 0 λ+   c(x ) 

• Define reduced space basis, Zk∈ ℜ n x (n-m) with (Ak)TZk = 0 • Define basis for remaining space Yk∈ ℜ n x m, [Yk Zk]∈ℜn x n is nonsingular. k k • Let d = Y dY + Z dZ to rewrite:    k k T  k k k k dY  k k T  k []Y Z 0 W A []Y Z 0  []Y Z 0 ∇f (x )     = −   kT dZ  k  0 IA 0  0 I  0 I c(x )  λ+ 

100 Reduced space SQP (rSQP) Range and Null Space Decomposition 0 0  kT k k kT k k kT k  Y W Y Y W Y Y A    kT ∇ k    dY Y f (x ) kT k k kT k k     Z W Y Z W Z 0  = − kT ∇ k dZ  Z f (x )  kT k  k λ+     A Y 0 0   c(x ) 

T k • (A Y) dY =-c(x ) is square, dY determined from bottom row. T T • Cancel Y WY and Y WZ; (unimportant as dZ, dY --> 0) • (YTA) O = -YT∇f(xk), λ can be determined by first order estimate T T T∇ k • Calculate or approximate w= Z WY dY, solve Z WZ dZ =-Z f(x )-w • Compute total step: d = Y dY + Z dZ

101 Reduced space SQP (rSQP) Interpretation

dZ

dY

Range and Null Space Decomposition • SQP step (d) operates in a higher dimension

• Satisfy constraints using range space to get dY

• Solve small QP in null space to get dZ • In general, same convergence properties as SQP.

102 Choice of Decomposition Bases

1. Apply QR factorization to A. Leads to dense but well-conditioned Y and Z.

R R A = Q  = []Y Z   0 0 2. Partition variables into decisions u and dependents v. Create orthogonal Y and Z with embedded identity matrices (ATZ = 0, YTZ=0). T = [∇ T ∇ T ]= [ ] A uc vc N C  I   T −T  = = N C Z  −  Y   − C 1N  I 

3. Coordinate Basis - same Z as above, YT = [ 0 I ] • Bases use gradient information already calculated. • Adapt decomposition to QP step • Theoretically same rate of convergence as original SQP. • Coordinate basis can be sensitive to choice of u and v. Orthogonal is not. • Need consistent initial point and nonsingular C; automatic generation

103 rSQP Algorithm

1. Choose starting point x0. 2. At iteration k, evaluate functions f(xk), c(xk) and their gradients. 3. Calculate bases Y and Z.

4. Solve for step dY in Range space from T k (A Y) dY =-c(x ) k T 5. Update projected Hessian B ~ Z WZ (e.g. with BFGS), wk (e.g., zero)

6. Solve small QP for step dZ in Null space. T ∇ k + k T + T k Min (Z f (x ) w ) dZ 2/1 dZ B dZ ≤ k + + ≤ ts .. xL x YdY ZdZ xU 7. If error is less than tolerance stop. Else 8. Solve for multipliers using (YTA) λ = -YT∇f(xk) 9. Calculate total step d = Y dY + Z dZ. α 10. Find step size and calculate new point, xk+1 = xk + 11. Continue from step 2 with k = k+1.

104 rSQP Results: Computational Results for General Nonlinear Problems Vasantharajan et al (1990)

105 rSQP Results: Computational Results for Process Problems Vasantharajan et al (1990)

106 Comparison of SQP and rSQP Coupled Distillation Example - 5000 Equations Decision Variables - boilup rate, reflux ratio

Method CPU Time Annual Savings Comments 1. SQP* 2 hr negligible Base Case 2. rSQP 15 min. $ 42,000 Base Case 3. rSQP 15 min. $ 84,000 Higher Feed Tray Location 4. rSQP 15 min. $ 84,000 Column 2 Overhead to Storage 5. rSQP 15 min $107,000 Cases 3 and 4 together

18

10

1

QVK QVK

107 Real-time Optimization with rSQP Sunoco Hydrocracker Fractionation Plant (Bailey et al, 1993) Existing process, optimization on-line at regular intervals: 17 hydrocarbon components, 8 heat exchangers, absorber/stripper (30 trays), debutanizer (20 trays), C3/C4 splitter (20 trays) and deisobutanizer (33 trays).

FUEL GAS REACTOR EFFLUENT FROM LOW PRESSURE SEPARATOR

STRIPPER C3 ABSORBER/ LIGHT NAPHTHA MIXED LPG

PREFLASH iC4 C3/C4 C3/C4 SPLITTER DIB

REFORMER MAIN FRAC. MAIN

NAPHTHA DEBUTANIZER

RECYCLE nC4 OIL

• square parameter case to fit the model to operating data.

• optimization to determine best operating conditions 108 Optimization Case Study Characteristics

Model consists of 2836 equality constraints and only ten independent variables. It is also reasonably sparse and contains 24123 nonzero Jacobian elements.

NP G E P = + + m − P ∑ziCi ∑ziCi ∑ziCi U i∈G i∈E m=1

Cases Considered: 1. Normal Base Case Operation 2. Simulate fouling by reducing the heat exchange coefficients for the debutanizer 3. Simulate fouling by reducing the heat exchange coefficients for splitter feed/bottoms exchangers 4. Increase price for propane 5. Increase base price for gasoline together with an increase in the octane credit

109 110 Many degrees of freedom=> full space IPOPT

 k + Σ k  d  ∇ϕ k  W A = − (x )  kT    k   A 0 λ+   c(x ) 

• work in full space of all variables • second derivatives useful for objective and constraints • use specialized large-scale Newton solver

∇ λ ∇ + W= xxL(x, ) and A= c(x) sparse, often structured + fast if many degrees of freedom present + no variable partitioning required

- second derivatives strongly desired - W is indefinite, requires complex stabilization - requires specialized large-scale linear algebra

111 *DVROLQH%OHQGLQJ*DVROLQH%OHQGLQJ V OLQH 3LSH 2,/7$1.6

*DVROLQH%OHQGLQJ+HUH *$667$7,216

6XSSO\WDQNV

,QWHUPHGLDWHWDQNV ),1$/352'8&7758&.6

)LQDO3URGXFWWDQNV

112 Blending Problem & Model Formulation

q ,it .. qt, j .. qt,k

f f ..f max ∑(∑c f − ∑ c f ) t,ij t, jk t,k k t, k i ,it qi vk t k i ⇒ s.t. ∑ f − ∑ f + v = v t, jk t,ij t + ,1 j t, j k i v f − ∑ f = 0 t,j t,k t, jk j ⇒ ∑q f − ∑q f + q v = q v t, j t, jk ,it t,ij t + ,1 j t + ,1 j t, j t, j k i q f − ∑q f = 0 t, k t, k t, j t, jk Supply tanks (i) Intermediate tanks (j)Final Product tanks (k) j q ≤ q ≤ q k t,k kmax min v ≤ v ≤ v j t, j jmax f & v ------flowrates and tank volumes min q ------tank qualities

Model Formulation in AMPL

113 SmallSmall MultiMulti--dayday BlendingBlending ModelsModels SingleSingle QualitiesQualities

Haverly, C. 1978 (HM) Audet & Hansen 1998 (AHM)

F 1 B 1 F 1 B 1 P 1 P 1 F 2 F 2 B 2

P 2 F 3 B 2 F 3 B 3

114 HoneywellHoneywell BlendingBlending ModelModel –– MultipleMultiple DaysDays 4848 QualitiesQualities

115 Summary of Results – Dolan-Moré plot

Performance profile (iteration count)

1

0.9

0.8

0.7

0.6

φ 0.5 IPOPT LOQO 0.4 KNITRO SNOPT 0.3 MINOS LANCELOT 0.2

0.1

0 1 10 100 1000 10000 100000 1000000 10000000 τ

116 Comparison of NLP Solvers: Data Reconciliation

1000 800 LANCELOT MINOS 600 SNOPT 400 KNITRO Iterations LOQO 200 IPOPT 0 0 200 400 600 Degrees of Freedom

100

10 LANCELOT MINOS SNOPT 1 KNITRO 0 200 400 600 LOQO 0.1 IPOPT CPU TimeCPU (s, norm.)

0.01 Degrees of Freedom

117 Sensitivity Analysis for Nonlinear Programming

At nominal conditions, p0

Min f(x, p0) s.t. c(x, p0) = 0 a(p0) ” x ” b(p0)

How is the optimum affected at other conditions, p  p0? • Model parameters, prices, costs • Variability in external conditions • Model structure

• How sensitive is the optimum to parameteric uncertainties? • Can this be analyzed easily?

118 Calculation of NLP Sensitivity

Take KKT Conditions ∇L(x*, p, λ, v) = 0

c(x*, p0) = 0 T E x* - bnd( p0)=0 and differentiate and expand about p0. ∇ λ T ∇ λ T ∇ T ∇ λ T ∇ λT Ε ∇ T pxL(x*, p, , v) + xxL(x*, p, , v) px* + xh(x*, p, , v) p + pv = 0 ∇ T ∇ T ∇ T pc(x*, p0) + xc(x*, p0) px* = 0 T ∇ T ∇ T E ( px* - pbnd )=0 Notes: • A key assumption is that under strict complementarity, the active set will not change for small perturbations of p. • ∇ T ∇ T If an element of x* is at a bound then pxi* = pbnd • ∇ T Second derivatives are required to calculate sensitivities, px* • ∇ T Is there a cheaper way to calculate px* ?

119 DecompositionDecomposition forfor NLPNLP SensitivitySensitivity

Let L(x*, p,λ,v) = f (x*) + λT c(x*) + (ET (x*−bnd( p)))T v W A E∇ x*T  ∇ L(x*, p,λ,v)T    p   xp  T ∇ λT = − ∇ T A 0 0 p   pc(x*, p)   T  ∇ T   − T ∇ T  E 0 0 pv   E pbnd 

•Partition variables into basic, nonbasic and superbasic ∇ T ∇ T ∇ T ∇ T px = Z pxS + Y pxB + T pxN ∇ T ∇ T •Set pxN = pbndN , nonbasic variables to rhs, •Substitute for remaining variables •Perform range and null space decomposition ∇ T ∇ T •Solve only for pxS and pxB

120 DecompositionDecomposition forfor NLPNLP SensitivitySensitivity

Y TWY Y TWY Y T A∇ x T  Y T (∇ L(x*, p,λ,v)T +W T∇ x T )   p B   xp p N  T T ∇ T = − T ∇ λ T + ∇ T Z WY Z WZ 0  p xS  Z ( xp L(x*, p, ,v) W T p xN )  T  ∇ λT   ∇ T + T ∇ T   A Y 0 0  p   pc(x*, p) A T p xN 

∇ T ∇ T • Solve only for pxB from bottom row and pxS from middle row T T ∇ T • If second derivatives are not available, Z WZ, Z WY pxB and T ∇ T Z WT pxN can be constructed by directional finite differencing • If assumption of strict complementarity is violated, sensitivity can be calculated using a QP subproblem.

121 Second Order Tests Reduced Hessian needs to be positive definite

T∇ * At solution x*: Evaluate eigenvalues of Z xxL Z Strict local minimum if all positive.

• Nonstrict local minimum: If nonnegative, find eigenvectors for zero eigenvalues, Î regions of nonunique solutions • Saddle point: If any are negative, move along directions of corresponding eigenvectors and restart optimization.

x2

z 1

Saddle Point x* x1

z 2

122 Sensitivity for Flash Recycle Optimization (2 decisions, 7 tear variables)

S3

S1 S2 Mixer Flash P

S6 S4 S5 S7 Ratio

2 2 3 1/2 Max S3(A) *S3(B) - S3(A) - S3(C) + S3(D) - (S3(E))

•Second order sufficiency test: •Dimension of reduced Hessian = 1 •Positive eigenvalue •Sensitivity to simultaneous change in feed rate and upper bound on purge ratio •Only 2-3 flowsheet perturbations required for second order information

123 Ammonia Process Optimization (9 decisions, 8 tear variables)

20 Sensitivities vs. Re-optimized Pts

Reactor 19.5 Sensitivity

19 QP1

QP2 18.5 Actual Hi P Flash 18

Lo P Objective Function 17.5 Flash

17 0.1 0.01 •Second order sufficiency test: 0.001 Relative perturbation change •Dimension of reduced Hessian = 4 •Eigenvalues = [2.8E-4, 8.3E-10, 1.8E-4, 7.7E-5] •Sensitivity to simultaneous change in feed rate and upper bound on reactor conversion •Only 5-6 extra perturbations for second derivatives

124 Multiperiod Optimization

Coordination

Case 1 Case 2 Case 3 Case 4 Case N

1. Design plant to deal with different operating scenarios (over time or with uncertainty) 2. Can solve overall problem simultaneously • large and expensive • polynomial increase with number of cases • must be made efficient through specialized decomposition

3. Solve also each case independently as an optimization problem (inner problem with fixed design) • overall coordination step (outer optimization problem for design) • require sensitivity from each inner optimization case with design variables as external parameters

125 Multiperiod Flowsheet Example

i i C T1 A1

i CTA 0 0 V i i F i T F 2 2 1

A

W i i Tw Tw 2 1

Parameters Period 1 Period 2 Period 3 Period 4 E (kJ/mol) 555.6 583.3 611.1 527.8

k0 (1/h) 10 11 12 9 F (kmol/h) 45.4 40.8 24.1 32.6 Time (h) 1000 4000 2000 1000

126 Multiperiod Design Model Σ Min f0(d) + i fi(d, xi) s.t. hi(xi, d) = 0, i = 1,… N gi(xi, d) ” 0, i = 1,… N r(d) ” 0 Variables: x: state (z) and control (u) variables in each operating period d: design variables (e. g. equipment parameters) used δ δ i: substitute for d in each period and add i = d d

Σ hi, gi Min f0(d) + i fi(d, xi) δ s.t. hi(xi, i) = 0, i = 1,… N δ gi(xi, i) +si = 0, i = 1,… N δ 0 ” si, d – i=0, i = 1,… N r(d) ” 0

127 Multiperiod Decomposition Strategy SQP Subproblem

•Block diagonal bordered KKT matrix (arrowhead structure) •Solve each block sequentially (range/null dec.) to form small QP in space of d variables •Reassemble all other steps from QP solution

128 Multiperiod Decomposition Strategy From decomposition of KKT block in each period, obtain the following directions that are parametric in ∆d:

Substituting back into the original QP subproblem leads to a QP only in terms of ∆d.  N  minimize φ = ∇ f T + ∑ { ∇ f T (Z + Y ) + (Z + Y )T ∇2 L (Z + Y ) } ∆d d 0 p i Bi Bi Ai Ai p i Bi B i  i =1  1  N  + ∆d T ∇2 L + ∑ { (Z + Y )T ∇2 L (Z + Y ) } ∆d d 0 Bi Bi p i Bi B i 2  i =1  ∇ ∆ ≤ subject to r + dr d 0

Once ∆d is obtained, directions are obtained from the above equations.

129 Starting Point

Original QP Decoupling Multiperiod Decomposition Strategy Active Set Update

ƒ p steps are parametric in ∆d and their Null - Range i Decomposition components are created independently ƒ Calculate Decomposition linear in number of Reduced periods and trivially parallelizable Hessian ƒ Choosing the active inequality constraints can be done through: QP at d -Active set strategy (e.g., bound Calculate addition) Step -Interior point strategy using barrier terms in objective MINOR Bound Addition • Easy to implement in decomposition ITERATION LOOP Line search

Optimality NO Conditions YES Optimal Solution 130 Multiperiod Flowsheet 1 (13+2) variables and (31+4) constraints (1 period) 262 variables and 624 constraints (20 periods)

i i T CA 1 1

400 i CTA 0 0 V i i 300 F i T F 2 2 1 CPU time (s) SQP (T) 200 MINOS(T) MPD/SQP(T)

100 A

i W 0 0 10 20 30 i Periods Tw Tw 2 1

131 Multiperiod Example 2 – Heat Exchanger Network (12+3) variables and (31+6) constraints (1 period) 243 variables and 626 constraints (20 periods)

H H 1 2

i i 50 T T 1 5

i i T T 40 3 4 C A A 1 1 2 563 K

30 i i T T 2 6 CPU time (s) SQP (T) MINOS (T) 300 K MPD/SQP (T) i 20 T i 8 Q A C A 393 K c 4 2 3 10

i 320 K T 7 0 0 10 20 30 350 K Periods

132 Summary and Conclusions -Unconstrained Newton and Quasi Newton Methods -KKT Conditions and Specialized Methods -Reduced Gradient Methods (GRG2, MINOS) -Successive Quadratic Programming (SQP) -Reduced Hessian SQP -Interior Point NLP (IPOPT)

Process Optimization Applications -Modular Flowsheet Optimization -Equation Oriented Models and Optimization -Realtime Process Optimization -Blending with many degrees of freedom

Further Applications -Sensitivity Analysis for NLP Solutions -Multiperiod Optimization Problems 133 Optimization of Differential- Algebraic Equation Systems

L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA DAE Optimization Outline

I Introduction Process Examples II Parametric Optimization - Gradient Methods • Perturbation • Direct - Sensitivity Equations • Adjoint Equations III Optimal Control Problems - Optimality Conditions - Model Algorithms • Sequential Methods • Multiple Shooting • Indirect Methods IV Simultaneous Solution Strategies - Formulation and Properties - Process Case Studies - Software Demonstration

2 DynamicDynamic OptimizationOptimization ProblemProblem

Φ( ) min z(t), y(t), u(t), p, t f

dz t)( s.t. = f ( (tz ), (ty ),u(t),t, p) dt g(z(t), y(t),u(t),t, p)= 0

z o z ( 0 ) z l d z ( t ) d z u y l d y ( t ) d y u u l d u ( t ) d u u p l d p d p u

t, time tf, final time z, differential variables u, control variables y, algebraic variables p, time independent parameters 3 DAE Models in Process Engineering

Differential Equations •Conservation Laws (Mass, Energy, Momentum)

Algebraic Equations •Constitutive Equations, Equilibrium (physical properties, hydraulics, rate laws) •Semi-explicit form •Assume to be index one (i.e., algebraic variables can be solved uniquely by algebraic equations) •If not, DAE can be reformulated to index one (see Ascher and Petzold)

Characteristics •Large-scale models – not easily scaled •Sparse but no regular structure •Direct linear solvers widely used •Coarse-grained decomposition of linear algebra

4 Parameter Estimation Catalytic Cracking of Gasoil (Tjoa, 1991)

A →p1 Q, Q →p2 S, A →p3 S  = − + 2 1.0 a ( p1 p3 )a  = − 2 − 0.8 q p1a p2q = = 0.6 YA_data a )0( ,1 q )0( 0 YQ_data Yi 0.4 YA_estimate number of states and ODEs: 2 YQ_estimate number of parameters:3 0.2 no control profiles 0.0 0.0 0.2 0.4 0.6 0.8 1.0 constraints: pL ” p ” pU t

Objective Function: Ordinary

0 (p1, p2, p3) = (6, 4, 1) (p1, p2, p3)* = (11.95, 7.99, 2.02) (p1, p2, p3)true = (12, 8, 2)

5 Batch Distillation Multi-product Operating Policies

5XQEHWZHHQGLVWLOODWLRQEDWFKHV 7UHDWDVERXQGDU\YDOXHRSWLPL]DWLRQSUREOHP :KHQWRVZLWFKIURP$WRRIIFXW WR%" +RZPXFKRIIFXW WRUHF\FOH" 5HIOX[" %RLOXS 5DWH" 2SHUDWLQJ7LPH"

$ %

6 Nonlinear Model Predictive Control (NMPC)

NMPC Estimation and Control

z : differential states Why NMPC? d : disturbances y : algebraic states Process  Track a profile  Severe nonlinear dynamics (e.g, sign changes in gains) u : manipulated variables NMPC Controller  Operate process over wide range z′ = F(z, y,u, p,d ) (e.g., startup and shutdown) 0 = G()z, y,u, p,d

ysp : set points Model Updater ′ ( ) z = F z, y,u, p,d NMPC Subproblem 0 = G()z, y,u, p,d − sp + k − k −1 min ∑|| y(t) y || y ∑|| u(t ) u(t ||) u u Q Q z′(t) = F (z(t), y(t), u(t),t) ts .. 0 = G(z(t), y(t), u(t),t) z(t) = z init Bound Constraint s Other Constraint s 7 Batch Process Optimization Optimization of dynamic batch process operation resulting from reactor and distillation column DAE models: A+B→C z’ = f(z, y, u, p) C+B→P+E g(z, y, u, p) = 0 P+C→G number of states and DAEs: n + n z y z0 z0 0 z0 B parameters for equipment design i,I i,II zi,III i,IV i zf (reactor, column) f f f i,IV zi,I zi,II zi,III nu control profiles for optimal operation

Constraints: uL ” u(t) ” uU zL ” z(t) ” zU

yL ” y(t) ” yU pL ” p ” pU Objective Function: amortized economic function at end of cycle time tf 640 25 C o nsta nt C o nsta nt 630 Dyn a m ic Dyn a m ic 20 620

610 15

600 10 590

580 5 0 0.25 0.5 0.75 1 1.25 0 0.5 1 1.5 2 2.5 Tim e (h r.) optimal reactor temperature policy optimal column reflux ratio 8 Reactor Design Example Plug Flow Reactor Optimization The cracking furnace is an important example in the olefin production industry, where various hydrocarbon feedstocks react. Consider a simplified model for ethane cracking (Chen et al., 1996). The objective is to find an optimal profile for the heat flux along the reactor in order to maximize the production of ethylene. CH2 4 Max F exit s.t. DAE ≤ Texit 1180K The reaction system includes six molecules, three free radicals, and seven reactions. The model also includes the heat balance and the pressure drop equation. This gives a total of eleven differential equations. C H→2 CH • CH•+ C H → CH + C H • 2 6 3 3 2 6 4 2 5 HCHHCH•+ → + • C HCHH•→ + • 2 6 2 2 5 2 5 2 4 C H•+ CH → CH + CH • 2CHCH•→ 2 5 2 4 3 6 3 2 5 4 10 HCHCH•+ → • 2 4 2 5

Concentration and Heat Addition Profile

6 2500 5 2000 4 1500 3 1000 2 1 500 Flow rate mol/s Heat flux kJ/m2s 0 0 0 2 4 6 8 10 Length m

C2H4 C2H6 log(H2)+12 q 9 Dynamic Optimization Approaches

Pontryagin(1962)

DAE Optimization Problem Variational Approach

Inefficient for constrained problems Discretize Vassiliadis(1994) controls Apply a NLP solver Sequential Approach

Efficient for constrained problems

10 Sequential Approaches - Parameter Optimization

Consider a simpler problem without control profiles: e.g., equipment design with DAE models - reactors, absorbers, heat exchangers Φ Min (z(tf))

z' = f(z, p), z (0) = z0

g(z(tf)) ” 0, h(z(tf)) = 0 By treating the ODE model as a "black-box" a sequential algorithm can be constructed that can be treated as a nonlinear program. NLP Solver

P

φ,g,h ODE Gradient Model z (t) Calculation

Task: How are gradients calculated for optimizer?

11 Gradient Calculation

Perturbation Sensitivity Equations Adjoint Equations

Perturbation Calculate approximate gradient by solving ODE model (np + 1) times ψ = Φ Let , g and h (at t = tf) ψ ψ ψ d /dpi = { (pi + ¨pi) - (pi)}/ ¨pi Very simple to set up Leads to poor performance of optimizer and poor detection of optimum unless roundoff error (O(1/¨pi) and truncation error (O(¨pi)) are small. Work is proportional to np (expensive)

12 Direct Sensitivity

∂ From ODE model: {}z′ = f (z, p,t), z )0( = z ( p) ∂p 0 ∂z(t) define s (t) = i = 1, ...np i ∂ pi d ∂f ∂f T ∂z )0( s′ = (s ) = + s , s )0( = i i ∂ ∂ i i ∂ dt pi z pi (nz x np sensitivity equations)

• z and si , i = 1,…np, an be integrated forward simultaneously.

• for implicit ODE solvers, si(t) can be carried forward in time after converging on z • linear sensitivity equations exploited in ODESSA, DASSAC, DASPK, DSL48s and a number of other DAE solvers Sensitivity equations are efficient for problems with many more constraints than parameters (1 + ng + nh > np)

13 Example: Sensitivity Equations

′ = 2 + 2 z1 z1 z2 ′ = + z2 z1 z2 z1 pb = = z1 ,5 z2 )0( pa = ∂ ∂ = ∂ ∂ = s(t)a, j z(t) j / pa , s(t)b, j z(t) j / pb , j 2,1 ′ = + sa 1, 2z1 sa 1, 2z2 sa 2, ′ = + + sa 2, z1 sa 2, z2 sa 1, sa 1, pb = = sa 1, ,0 sa 2, )0( 1

′ = + sb 1, 2z1 sb 1, 2z2 sb 2, ′ = + + + sb 2, z1 z1 sb 2, z2 sb 1, sb 1, pb = = sb 1, ,0 sb 2, )0( 0

14 Adjoint Sensitivity

Adjoint or Dual approach to sensitivity Adjoin model to objective function or constraint t ψ Φ f ( = ,g or h) ψ =ψ − λT ′ − (t f ) ∫ (z f (z, p,t))dt 0 t f ψ =ψ + λ T − λ T + T λ′ + λT (t f ) )0( z0 ( p) (t f ) z(t f ) ∫ z F(z, p,t))dt 0 (λ(t)) serve as multipliers on ODE’s) Now, integrate by parts 0 0 T T ∂ψ (z(t ))  ∂  t f  ∂ T ∂  ψ = f − λ δ + z0 ( p) λ + λ′ + f λ δ + f λ d  (t f ) z(t f )  )0(  dp ∫   z(t)   dp dt ∂ ∂  ∂  ∂  z(t f )   p  0 z  p 

and find dψ/dp subject to feasibility of ODE’s Now, set all terms not in dp to zero.

15 Adjoint System ∂f ∂ψ (z(t )) λ′ = − λ(t), λ(t ) = f ∂ f ∂ z z(t f ) t dψ ∂z ( p) f ∂f  = 0 λ )0( + λ(t) dt ∂ ∫ ∂  dp p 0  p  Integrate model equations forward Integrate adjoint equations backward and evaluate integral and sensitivities. Notes: nz (ng + nh + 1) adjoint equations must be solved backward (one for each objective and constraint function) for implicit ODE solvers, profiles (and even matrices) can be stored and carried backward after solving forward for z as in DASPK/Adjoint (Li and Petzold) more efficient on problems where: np > 1 + ng + nh

16 Example: Adjoint Equations

′ = 2 + 2 z1 z1 z2 ′ = + z1 z1 z2 z1 pb = = z1 ,5 z2 )0( pa

λT = λ 2 + 2 + λ + Form f (z, p,t) 1(z1 z2 ) 2 (z1 z2 z1 pb ) ∂f ∂ψ (z(t )) λ′ = − λ(t), λ(t ) = f ∂ f ∂ z z(t f ) t dψ ∂z ( p) f ∂f  = 0 λ )0( + λ(t) dt ∂ ∫ ∂  dp p 0  p  then becomes: ∂ψ (t ) λ′ = −2λ z − λ (z + p ), λ (t ) = f 1 1 1 2 2 b 1 f ∂ z1(t f ) ∂ψ (t ) λ′ = −2λ z − λ z , λ (t ) = f 2 1 2 2 1 2 f ∂ z2 (t f )

dψ (t ) f = λ 1 )0( dpa dψ (t ) t f f = λ ∫ 2 (t)z1(t)dt dpb 0 17 Example: Hot Spot Reactor L Φ = − − Min L ∫ (T (t) TS /TR )dt T ,T ,L,T P R S 0 dq ts .. = 1(3.0 − q(t))exp[20 − 20 /T (t)], q )0( = 0 dt dT dq = − (5.1 T (t) −T /T ) + 3/2 , T )0( =1 dt S R dt o ∆ = + feed(TR , 110 C) - H product(TP , T(L)) 0 = o = + o TP 120 C, T(L) 1 10 C/TR

TP = specified product temperature TR = reactor inlet, reference temperature T R 3:1 B/A A + 3B --> C + 3D L = reactor length 383 K L Ts = steam sink temperature q(t) = reactor conversion profile T P T s T(t) = normalized reactor temperature profile

Cases considered: • Hot Spot - no state variable constraints • Hot Spot with T(t) ” 1.45

18 Hot Spot Reactor: Unconstrained Case

Method: SQP (perturbation derivatives)

L(norm) TR(K) TS(K) TP(K) Initial: 1.0 462.23 425.26 250 Optimal: 1.25 500 470.1 188.4 13 SQP iterations / 2.67 CPU min. (µVax II)

1.6 1.2

1.5 1.0

1.4 0.8

1.3 0.6

1.2 0.4 Conversion, q

1.1 0.2 Normalized Temperature

1.0 0.0 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 Normalized Length Normalized Length Constrained Temperature Case: could not be solved with sequential method 19 Tricks to generalize classes of problems

Variable Final Time (Miele, 1980) τ 0 ≤ τ ≤ Define t = pn+1 , 1, pn+1 = tf

Let dz/dt = (1/ pn+1) dz/dτ = f(z, p) ⇒ dz/dτ = (pn+1) f(z, p)

Converting Path Constraints to Final Time

Define measure of infeasibility as a new variable, znz+1(t) (Sargent & Sullivan, 1977):

t f = 2 znz+1(t f ) ∑ ∫ max( ,0 g j (z(t),u(t)) dt j 0  = 2 = or znz+1(t) ∑ max( ,0 g j (z(t),u(t)) , znz+1 )0( 0 j ≤ ε Enforce znz+1(t f ) (however, constraint is degenerate)

20 Profile Optimization - (Optimal Control)

Optimal Feed Strategy (Schedule) in Batch Reactor Optimal Startup and Shutdown Policy Optimal Control of Transients and Upsets

Sequential Approach: Approximate control profile as through parameters (piecewise constant, linear, polynomial, etc.) Apply NLP to discretization as with parametric optimization Obtain gradients through adjoints (Hasdorff; Sargent and Sullivan; Goh and Teo) or sensitivity equations (Vassiliadis, Pantelides and Sargent; Gill, Petzold et al.)

Variational (Indirect) approach: Apply optimality conditions and solve as boundary value problem

21 Derivation of Variational Conditions Indirect Approach

Optimality Conditions (Bound constraints on u(t)) Min φ(z(tf))

s.t. dz/dt = f(z, u), z (0) = z0 g (z(tf)) ” 0 h (z(tf)) = 0 a ” u(t) ” b Form Lagrange function - adjoin objective function and constraints:

φ = φ + T µ + T (t f ) g ( z (t f )) h( z (t f )) v t − f λT  − + α T − +α T − ∫ ( z f ( z, u )) a (a u (t )) b (u (t ) b) dt 0 Integrate by parts : φ = φ + T µ + T + λT − λT (t f ) g ( z (t f )) h( z (t f )) v )0( z )0( (t f ) z (t f ) t + f λT + λT + α T − +α T − ∫ z f ( z, u ) a (a u (t )) b (u (t ) b) dt 0 22 Derivation of Variational Conditions  ∂φ ∂ ∂  T δφ = + g µ + h − λ δ + λT δ  v  z (t f ) )0( z )0(  ∂z ∂z ∂z  T T t  ∂   ∂  + f λ + f ( z, u ) λ δ + f ( z, u ) λ + α − α δ ≥ ∫   z (t )  b a  u (t ) dt 0 0  ∂z   ∂u  At optimum, δφ ≥ 0. Since u is the control variable, let all other terms vanish. ⇒ δz(tf):  ∂φ ∂g ∂h  λ()t =  + µ + γ  f  ∂z ∂z ∂z  t=t f δz(0): λ(0) = 0 (if z(0) is not specified) δ ∂ ∂ z(t): λ = − H = − f λ ∂z ∂z Define Hamiltonian, H = λTf(z,u) α T (a − u(t)) For u not at bound: ∂f ∂H a λ = = 0 α T (u(t) − b) ∂u ∂u b u ≤ u(t) ≤ u ∂ a b H α ≥ α ≥ For u at bounds: = α − α a 0, b 0 ∂u a b ∂ ∂H H = α ≥ = −α ≤ Lower bound, u(t) = a, a 0 Upper bound, u(t) = b, b 0 ∂u ∂u 23 Car Problem Travel a fixed distance (rest-to-rest) in minimum time. Min x (t ) Min t 3 f f ts .. x ’= x ts .. x"= u 1 2 x ’= u a ≤ u(t) ≤ b 2 x ’=1 x )0( = ,0 x(t ) = L 3 f a ≤ u(t) ≤ b = = x )0(’ ,0 x (’ t f ) 0 = = x1 )0( ,0 x1(t f ) L = = x2 )0( ,0 x2 (t f ) 0 = λ + λ + λ Hamiltonia :n H 1x2 2u 3 λ = ==> λ = Adjoints : 1 0 1(t) c1 λ = −λ ==> λ = + − 2 1 2 (t) c2 c1(t f t) λ = ==> λ = λ = 3 0 3 (t f ) ,1 3 (t) 1  = + > = ∂H t ,0 1tc f c2 ,0 u b = λ = c + c (t − t) ∂ 2 2 1 f = > = u  t t f ,c2 ,0 u a λ = = Crossover ( 2 )0 occurs at t ts 24 Car Problem Analytic Variational Solution b

Optimal Profile u(t) From state equations:  2 1 / 2 bt ,t < ts x1(t) =   2 2 ≥ 1 / 2() bts - a() ts - tf , t ts  a  x (t) = bt, t < ts 2  t s ≥ t f  bts + a() t - ts , t ts Apply boundary conditions at t = tf: 2 2 x1(tf) = 1/2 (b ts - a (ts - tf) ) = L •Problem is linear in u(t). Frequently x2(tf) = bts + a (tf - ts) = 0 1/2 these problems have "bang-bang" ⇒ ts =  2L   () character.  b 1- b / a  1/ 2 •For nonlinear and larger problems, the  2L  tf = (1 − b / a) variational conditions can be solved  b() 1 - b / a  numerically as boundary value problems.

25 Example: Batch reactor - temperature profile

Maximize yield of B after one hour’s operation by manipulating a transformed temperature, u(t). ⇒ Minimize -zB(1.0) s.t. z’A = -(u+u2/2) zA z’B = u zA u AB zA(0) = 1 2 zB(0) = 0 u /2 C 0 ” u(t) ” 5 u(T(t)) Adjoint Equations: λ 2 λ H = - A(u+u /2) zA + B u zA ∂ ∂ λ λ H/ u = A (1+u) zA + B zA λ λ λ λ ’A = A(u+u2/2) - B u, A(1.0) = 0 λ λ ’B = 0, B(1.0) = -1 Cases Considered 1. NLP Approach - piecewise constant and linear profiles. 2. Control Vector Iteration

26 Batch Reactor Optimal Temperature Program Piecewise Constant

6

4

2 Optimal Optimal Profile, u(t)

0. 0.2 0.4 0.6 0.8 1.0 Time, h Results Piecewise Constant Approximation with Variable Time Elements Optimum B/A: 0.57105

27 Batch Reactor Optimal Temperature Program Piecewise Linear

6

4

2 Optimal Profile, u(t)

0. 0.2 0.4 0.6 0.8 1.0 Time, h Results: Piecewise Linear Approximation with Variable Time Elements Optimum B/A: 0.5726 Equivalent # of ODE solutions: 32

28 Batch Reactor Optimal Temperature Program Indirect Approach

6

4

2 Optimal Profile, u(t) Profile, Optimal

0. 0.2 0.4 0.6 0.8 1.0 Time, h Results: Control Vector Iteration with Conjugate Gradients Optimum (B/A): 0.5732 Equivalent # of ODE solutions: 58

29 Dynamic Optimization - Sequential Strategies

Small NLP problem, O(np+nu) (large-scale NLP solver not required) • Use NPSOL, NLPQL, etc. • Second derivatives difficult to get Repeated solution of DAE model and sensitivity/adjoint equations, scales with nz and np • Dominant computational cost • May fail at intermediate points Sequential optimization is not recommended for unstable systems. State variables blow up at intermediate iterations for control variables and parameters. Discretize control profiles to parameters (at what level?) Path constraints are difficult to handle exactly for NLP approach

30 Instabilities in DAE Models

This example cannot be solved with sequential methods (Bock, 1983):

dy1/dt = y2 τ2 π2 − τ2 π dy2/dt = y1 + ( ) sin ( t) The characteristic solution to these equations is given by: π τ τ y1(t) = sin ( t) + c1 exp(- t) + c2 exp( t) π π τ τ τ τ y2 (t) = cos ( t) - c1 exp(- t) + c2 exp( t)

Both c1 and c2 can be set to zero by either of the following equivalent conditions: π IVP y1(0) = 0, y2 (0) =

BVP y1(0) = 0, y1(1) = 0

31 IVP Solution

If we now add roundoff errors e1 and e2 to the IVP and BVP conditions, we see significant differences in the sensitivities of the solutions. For the IVP case, the sensitivity to the analytic solution profile is seen by large changes in the profiles y1(t) and y2(t) given by:

π τ τ y1(t) = sin ( t) + (e1 - e2/ ) exp(- t)/2 τ τ +(e1 + e2/ ) exp( t)/2

π π τ τ y2 (t) = cos ( t) - ( e1 - e2) exp(- t)/2 τ τ + ( e1 + e2) exp( t)/2

-13 Therefore, even if e1 and e2 are at the level of machine precision (< 10 ), a large value of τ and t will lead to unbounded solution profiles.

32 BVP Solution

On the other hand, for the boundary value problem, the errors affect the analytic solution profiles in the following way: π τ τ τ τ y1(t) = sin ( t) + [e1 exp( )- e2] exp(- t)/[exp( ) - exp(- )] τ τ τ τ + [e1 exp(- ) - e2] exp( t)/[exp( ) - exp(- )] π π τ τ τ τ τ y2(t) = cos ( t) – [e1 exp( )- e2] exp(- t)/[exp( ) - exp(- )] τ τ τ τ τ + [e1 exp(- ) - e2] exp( t)/[exp( ) - exp(- )]

Errors in these profiles never exceed t (e1 + e2), and as a result a solution to the BVP is readily obtained.

33 BVP and IVP Profiles

-9 e1, e2 = 10 Linear BVP solves easily IVP blows up before midpoint

34 DynamicDynamic OptimizationOptimization ApproachesApproaches

Pontryagin(1962)

DAE Optimization Problem Variational Approach

Inefficient for constrained problems Discretize Vassiliadis(1994) controls Apply a NLP solver Sequential Approach

Efficient for constrained problems Can not handle instabilities properly Small NLP Discretize some state variables Multiple Shooting

Handles instabilities Larger NLP

35 Multiple Shooting for Dynamic Optimization Divide time domain into separate regions

Integrate DAEs state equations over each region

Evaluate sensitivities in each region as in sequential approach wrt uij, p and zj Impose matching constraints in NLP for state variables over each region Variables in NLP are due to control profiles as well as initial conditions in each region

36 Multiple Shooting Nonlinear Programming Problem

ψ ( ) min z(t f ), y(t f ) ui , j , p s.t. − = z(z j ,u i, j , p, t j +1 ) z j +1 0 l ≤ ≤ u z k z(z j , u i, j , p, t k ) z k y l ≤ y(z , u , p,t ) ≤ y u min f (x) k j i, j k k x∈ℜn l ≤ ≤ u u i u i, j u i p l ≤ p ≤ p u s.t c(x) = 0   dz = () = f z, y,u , ji , p , z(tj ) zj  dt xL ≤ x ≤ xu ( )= g z,y,u ji, , p 0 o z = z (0) 0 Solved Implicitly

37 BVP Problem Decomposition

IC

B1 A1

B2 A2

B3 A3

B4 A4

BN AN

FC

Consider: Jacobian of Constraint Matrix for NLP • bound unstable modes with boundary conditions (dichotomy) • can be done implicitly by determining stable pivot sequences in multiple shooting constraints approach • well-conditioned problem implies dichotomy in BVP problem (deHoog and Mattheij) Bock Problem (with t = 50) • Sequential approach blows up (starting within 10-9 of optimum) • Multiple Shooting optimization requires 4 SQP iterations 38 Dynamic Optimization – Multiple Shooting Strategies

Larger NLP problem O(np+nu+NE nz) • Use SNOPT, MINOS, etc. • Second derivatives difficult to get Repeated solution of DAE model and sensitivity/adjoint equations, scales with nz and np • Dominant computational cost • May fail at intermediate points Multiple shooting can deal with unstable systems with sufficient time elements. Discretize control profiles to parameters (at what level?) Path constraints are difficult to handle exactly for NLP approach Block elements for each element are dense! Extensive developments and applications by Bock and coworkers using MUSCOD code

39 DynamicDynamic OptimizationOptimization ApproachesApproaches

Pontryagin(1962)

DAE Optimization Problem Variational Approach

Inefficient for constrained problems Discretize Vassiliadis(1994) controls Apply a NLP solver Sequential Approach

Efficient for constrained problems Can not handle instabilities properly Small NLP Discretize all state variables Simultaneous Approach

Handles instabilities Large NLP

40 Nonlinear Programming Formulation

Nonlinear Dynamic Continuous variables Optimization Problem Continuous variables

Collocation on finite Elements

Nonlinear Programming Discretized variables Problem (NLP)

41 Discretization of Differential Equations Orthogonal Collocation

Given: dz/dt = f(z, u, p), z(0)=given

Approximate z and u by Lagrange interpolation polynomials (order

K+1 and K, respectively) with interpolation points, tk K K (t − t ) = " " = ∏ j ==> = zK +1(t) ∑ zk k (t ,) k (t) zN +1(tk ) zk = − k =0 j 0 (t t ) j≠k k j K K (t − t ) = " " = ∏ j ==> = uK (t) ∑uk k (t ,) k (t) uN (tk ) uk = − k=1 j 1 (t t ) j≠k k j

Substitute zN+1 and uN into ODE and apply equations at tk. K = " − = = r(tk ) ∑ z j j (tk ) f (zk ,uk ) ,0 k 1,...K j=0

42 Collocation Example

K K (t − t ) = " " = ∏ j ==> = zK +1(t) ∑ zk k (t ,) k (t) zN +1(tk ) zk = − k =0 j 0 (t t ) j≠k k j = = = t 0 0 , t 1 0.21132 , t 2 0.78868

" = 2 + " = 0(t) (t - t 1 )/ ,6 0(t) t/ 3 - 1/ 6 " = + " = 1(t) -8.195 t 2 6 .4483 t , 1(t) 6 .4483 - 16 .39 t " = " = 2(t) 2.19625 t 2 - 0 .4641 t , 2(t) 4 .392 t - 0 .46412

Solve z' = z 2 - 3 z + 2 , z( 0 ) = 0 ==> = z 0 0 " + " + " = 2 + z 0 0(t 1 ) z 1 1(t 1 ) z 2 2(t 1 ) z 1 - 3 z 1 2 + = 2 + ( 2 .9857 z 1 0 .46412 z 2 z 1 - 3 z 1 2 ) " + " + " = 2 + z 0 0(t 2 ) z 1 1(t 2 ) z 2 2(t 2 ) z 2 - 3 z 2 2 + = 2 + (- 6 .478 z 1 3 z 2 z 2 - 3 z 2 2 ) = = = z 0 0 , z 1 0 .291 ( 0 .319 ), z 2 0 .7384 ( 0 .706 ) z(t) = 1.5337 t - 0.76303 t 2 43 Converted Optimal Control Problem Using Collocation

Min φ(z(tf))

s.t. z' = f(z, u, p), z(0)=z0 g(z(t), u(t), p) ” 0 z N+1(t)

h(z(t), u(t), p) = 0 z(t) to Nonlinear Program StateProfile φ Min (z f ) K  " − = = ∑ z j j (tk ) f (zk ,uk ) ,0 z0 z(0) = j 0  g(z ,u ) ≤ 0 k =1,...K k k t f t 1 t 2 t 3 =  h(zk ,uk ) 0   r(t) K " − = ∑ z j j )1( z f 0 t 3 j=0 t 1 t 2

How accurate is approximation

44 Results of Optimal Temperature Program Batch Reactor (Revisited)

Results - NLP with Orthogonal Collocation Optimum B/A - 0.5728 # of ODE Solutions - 0.7 (Equivalent)

45 CollocationCollocation onon FiniteFinite ElementsElements Polynomials i−1 = + τ τ ∈ tij ∑hi’ hi j , ]1,0[ = • • • • i’ 1 • • • • • • dz = 1 dz • • τ • dt hi d uuu u u u u u u u u u dz = hi f (z,u) t t tf dτ o i Collocation points hi Mesh points Discontinuous Algebraic and Finite element, i Control variables Continuous Differential variables u u u u u u element i q = 2 u u u q = 1 u u u K K = " = " y(t) ∑ q(t) yiq u(t) ∑ q(t) uiq K = = = " q 1 q 1 z(t) ∑ q(t) ziq q=0 K = " τ − = = = r(tik ) ∑(zij j ( k )) hi f (zik ,uik , p) ,0 k 1,..K, i 1,.. NE j=0

46 NonlinearNonlinear ProgrammingProgramming ProblemProblem

min ψ (z ) f min f (x) ∈ℜn K x s.t. " τ − = ∑(zij j ( k )) hi f (zik ,uik , p) 0 j=0 s.t c(x) = 0 ( )= g zi,k , yi,k ,ui,k , p 0

K " − = = L u ∑(zi− ,1 j j 1( )) zi0 ,0 i 2,..NE x ≤ x ≤ x j=0 K " − = = ∑(zNE, j j 1( )) z f ,0 z10 z )0( j=0

l u Finite elements, h , can also be variable to z ≤ z ≤ z i ji, i, j ji, determine break points for u(t). y l ≤ y ≤ y u i, j i, j i, j • Σ Add hu • hi 0, hi=tf u l ≤ u ≤ u u i, j i, j i, j Can add constraints g(h, z, u) ” ε for p l ≤ p ≤ p u approximation error 47 Hot Spot Reactor Revisited L Φ = − − Min L ∫ (T (t) TS /TR )dt T ,T ,L,T P R S 0 dq ts .. = 1(3.0 − q(t))exp[20 − 20 /T (t)], q )0( = 0 dt dT dq = − (5.1 T (t) −T /T ) + 3/2 , T )0( =1 dt S R dt o ∆ = + feed(TR , 110 C) - H product(TP , T(L)) 0 = o = + o TP 120 C, T(L) 1 10 C/TR

TP = specified product temperature TR = reactor inlet, reference temperature T R 3:1 B/A A + 3B --> C + 3D L = reactor length 383 K L Ts = steam sink temperature q(t) = reactor conversion profile T P T s T(t) = normalized reactor temperature profile

Cases considered: • Hot Spot - no state variable constraints • Hot Spot with T(t) ” 1.45

48 Base Case Simulation

Method: OCFE at initial point with 6 equally spaced elements L(norm) TR(K) TS(K) TP(K) Base Case: 1.0 462.23 425.26 250

2 1.8

integrated profile integrated profile collocation collocation 1.6

1 1.4 Conversion Temperature 1.2

0 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2

Normalized Length Normalized Length

49 Unconstrained Case Method: OCFE combined formulation with rSQP identical to integrated profiles at optimum L(norm) TR(K) TS(K) TP(K) Initial: 1.0 462.23 425.26 250 Optimal: 1.25 500 470.1 188.4

123 CPU s. (µVax II) φ∗ = -171.5

1.2 1.6

1.0 1.5

0.8 1.4

0.6 1.3

0.4 1.2 Conversion, q Conversion,

0.2 1.1 Normalized Temperature 0.0 1.0 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 Normalized Length Normalized Length

50 Temperature Constrained Case T(t) ” 1.45

Method: OCFE combined formulation with rSQP, identical to integrated profiles at optimum

L(norm) TR(K) TS(K) TP(K) Initial: 1.0 462.23 425.26 250 Optimal: 1.25 500 450.5 232.1

57 CPU s. (µVax II), φ∗ = -148.5

1.2 1.5

1.0 1.4 0.8 1.3 0.6 1.2

Conversion 0.4 Temperature

0.2 1.1

0.0 1.0 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 Normalized Length Normalized Length

51 Theoretical Properties of Simultaneous Method

A. Stability and Accuracy of Orthogonal Collocation

• Equivalent to performing a fully implicit Runge-Kutta integration of the DAE models at Gaussian (Radau) points • 2K order (2K-1) method which uses K collocation points • Algebraically stable (i.e., possesses A, B, AN and BN stability)

B. Analysis of the Optimality Conditions

• An equivalence has been established between the Kuhn-Tucker conditions of NLP and the variational necessary conditions • Rates of convergence have been established for the NLP method

52 Simultaneous DAE Optimization

Case Studies • Reactor - Based Flowsheets • Fed-Batch Penicillin Fermenter • Temperature Profiles for Batch Reactors • Parameter Estimation of Batch Data • Synthesis of Reactor Networks • Batch Crystallization Temperature Profiles • Grade Transition for LDPE Process • Ramping for Continuous Columns • Reflux Profiles for Batch Distillation and Column Design • Source Detection for Municipal Water Networks • Air Traffic Conflict Resolution • Satellite Trajectories in Astronautics • Batch Process Integration • Optimization of Simulated Moving Beds

53 Production of High Impact Polystyrene (HIPS) Startup and Transition Policies (Flores et al., 2005a)

Monomer, Transfer/Term. agents

Coolant

Catalyst Polymer

54 Phase Diagram of Steady States

Transitions considered among all steady state pairs

600

650 1: Qi = 0.0015 3 A5 2: Qi = 0.0025 3: Qi = 0.0040Upper Steady−State

550 N3 600 System State

2

550 500 1 Medium Steady−State

500 450

1: Qcw = 10 450 2: Qcw = 1.0 Temperature (K) Temperature Temperature (K) 3: Qcw = 0.1 1 400 N2 A1 Lower Steady−State 2 400 A2 N2

3 350 A3 350 C1 A4 N1 Bifurcation3 ParameterB1 C2 2 B2 B3 1

300 300

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 0 0.02 0.04 0.06 0.08 0.1 0.12 Cooling water flowrate (L/s) Initiator flowrate (L/s) 55 Startup to Unstable Steady State

−3 x 10 1 10

8 0.5 6

0 4 Initiator Conc. [mol/l] 0 0.5 1 1.5 2 0 0.5 1 1.5 2 Monomer Conc. [mol/l] 400 Time [h] 320 Time [h]

310 350 300

300 Jacket Temp. [K] 290 Reactor Temp. [K] 0 −3 0.5 1 1.5 2 0 0.5 1 1.5 2 x 10 1.5 Time [h] 1 Time [h]

1 0.5 0.5

0 0 Initiator Flow. [l/sec] 0 20 40 60 80 100 0 0.5 1 1.5 2

3 Time [h] Cooling water Flow. [l/sec] Time [h]

2 • 926 variables

1 • 476 constraints 0 • 36 iters. / 0.95 CPU s (P4) Feedrate Flow. [l/sec] 0 0.5 1 1.5 2 Time [h] 56 HIPS Process Plant (Flores et al., 2005b)

•Many grade transitions considered with stable/unstable pairs •1-6 CPU min (P4) with IPOPT •Study shows benefit for sequence of grade changes to achieve wide range of grade transitions.

57 Batch Distillation – Optimization Case Study - 1

R(t) D(t), x d dxi,d V   [ ] =  yi,N - xi,d  dt Hcond

dxi,0 V   = [] x - x  dt S  i,d i,0  dS - V = V(t) dt R +1 x b

*DXJHHIIHFWRIFROXPQKROGXSV 2YHUDOOSURILWPD[LPL]DWLRQ 0DNH7UD\&RXQW&RQWLQXRXV

58 2SWLPL]DWLRQ&DVH6WXG\ 

0RGHOLQJ$VVXPSWLRQV

,GHDO7KHUPRG\QDPLFV 1R/LTXLG7UD\+ROGXS 1R9DSRU+ROGXS )RXUFRPSRQHQWPL[WXUH α  6KRUWFXWVWHDG\VWDWHWUD\PRGHO )HQVNH8QGHUZRRG*LOOLODQG &DQEHVXEVWLWXWHGE\PRUHGHWDLOHGVWHDG\VWDWHPRGHOV )UHGHQVOXQG DQG*DOLQGH]$OND\D

2SWLPL]DWLRQ3UREOHPV&RQVLGHUHG

(IIHFWRI&ROXPQ+ROGXS +FRQG 7RWDO3URILW0D[LPL]DWLRQ 59 0D[LPXP'LVWLOODWH3UREOHP

30

20

With Holdup No Holdup 10 Reflux Ratio Reflux &RPSDULVRQRIRSWLPDOUHIOX[SURILOHV ZLWKDQGZLWKRXWKROGXS +FRQG  0 0.0 0.2 0.4 0.6 0.8 1.0 Time ( hours )

0.98

0.97 &RPSDULVRQRIGLVWLOODWH 0.96 0.95 SURILOHVZLWKDQGZLWKRXW With Holdup 0.94 No Holdup KROGXS +FRQG DW 0.93

RYHUDOOSXULW\ 0.92 Distillate Composition Distillate

0.91 0.0 0.2 0.4 0.6 0.8 1.0 Time ( hours ) 60 %DWFK'LVWLOODWLRQ3URILW0D[LPL]DWLRQ

Max {Net Sales(D, S0)/(tf +Tset) – TAC(N, V)}

N = 20 trays, Tsetup = 1 hour xd = 0.98, xfeed = 0.50, α = 2 Cprod/Cfeed = 4.1

V = 120 moles/hr, S0 = 100 moles.

30

20

N = 20 N = 33.7 10 Reflux Ratio

0 0 1 2 3 4

Time ( hours ) 61 %DWFK'LVWLOODWLRQ 2SWLPL]DWLRQ&DVH6WXG\ 

dx1,N +1 V R(t) = []y1,N - x1,N +1 D(t), x d dt H N+1

dx1,p V R   = []y - y , +  x - x  , p = 1,...,N 1, p-1 1 p R+1  1,p+1 1,p  dt Hp

dx V   1,0 = []x - y + R  x - x  dt S 1,0 1,0 R+1  1,1 1,0  dD V = V(t) dt R +1

x b  N+1  N+1 0 0  0  S xi,0 = S -H∑ p xi,0 +∑ Hp xi,p  p=1  p=1 C C ∑ xi,p = 1.0 ∑ yi,p =1.0 1 1

Ideal Equilibrium Equations yi,p = Ki,p xi,p Binary Column (55/45, Cyclohexane, Toluene) S0 = 200, V = 120, Hp = 1, N = 10, ~8000 variables, < 2 CPU hrs. (Vaxstation 3200)

62 2SWLPL]DWLRQ&DVH6WXG\ 

0RGHOLQJ$VVXPSWLRQV

,GHDO7KHUPRG\QDPLFV &RQVWDQW7UD\+ROGXS 1R9DSRU+ROGXS %LQDU\0L[WXUH WROXHQHF\FORKH[DQH KRXURSHUDWLRQ 7RWDO5HIOX[,QLWLDO&RQGLWLRQ

&DVHV&RQVLGHUHG

&RQVWDQW&RPSRVLWLRQRYHU7LPH 6SHFLILHG&RPSRVLWLRQDW)LQDO7LPH %HVW&RQVWDQW5HIOX[3ROLF\ 3LHFHZLVH&RQVWDQW5HIOX[3ROLF\ 63 5HIOX[2SWLPL]DWLRQ&DVHV

6 0.006

5 0.005 &RQVWDQW3XULW\RYHU7LPH 4 0.004 -x)

3 0.003

 • 2 0.002 [ W   Reflux Reflux Policy Reflux Purity I 1 0.001 Top Plate Purity, (1 ' W  

0 0.000 0.0 0.2 0.4 0.6 0.8 1.0 Time (hrs.)

8 0.010

Reflux

Purity 0.008 2YHUDOO'LVWLOODWH3XULW\ 6 ∫ [G W 9 5 GW ' WI •  0.006 4 I ' W   0.004 Reflux Reflux Policy 2 0.002 Distillate Purity, (1-x) 6KRUWFXW&RPSDULVRQ 0 0.000 I 0.0 0.2 0.4 0.6 0.8 1.0 ' W   Time (hrs.) 64 5HIOX[2SWLPL]DWLRQ&DVHV

0.30 0.004

Reflux

0.28 Purity 0.003 &RQVWDQW5HIOX[RYHU7LPH 0.26

0.002 0.24 [G W 9 5 GW ' WI •  Reflux Reflux Policy 0.001 I 0.22 Distillate Purity, (1-x) ' W  

0.20 0.000 0.0 0.2 0.4 0.6 0.8 1.0 Time (hrs.)

0.010 6

5 0.008 Purity 3LHFHZLVH&RQVWDQW Reflux 4 5HIOX[RYHU7LPH 0.006 3

0.004 2 ∫ • RefluxPolicy

G I (1-x)Purity, Distillate [ W 9 5 GW ' W   0.002 1 ' WI   0.000 0 0.0 0.2 0.4 0.6 0.8 1.0 Time (hrs.) 65 Batch Reactive Distillation – Case Study 3

Reversible reaction between acetic acid and ethanol

l CH3COOH + CH3CH2OH CH3COOCH2CH3 + H2O

t = 0, x = 0.25 for all components

R(t) t ==1 D(t), x d tff 1 maxmax∫∫ Ddt Ddt 00

s.t. DAE

x EsterEster ≥≥0. 4600 xDD 0. 4600

V(t)

x b

Wajde & Reklaitis (1995)66 2SWLPL]DWLRQ&DVH6WXG\ 

0RGHOLQJ$VVXPSWLRQV

,GHDO7KHUPRG\QDPLFV &RQVWDQW7UD\+ROGXS 1R9DSRU+ROGXS 7HUWLDU\0L[WXUH (W2++2$F(7$F+2 &ROG6WDUW,QLWLDO&RQGLWLRQ

&DVHV&RQVLGHUHG

6SHFLILHG&RPSRVLWLRQDW)LQDO7LPH 2SWLPXP5HIOX[3ROLF\ 9DULRXV7UD\V&RQVLGHUHG  KRXURSHUDWLRQ

67 Batch Reactive Distillation

Optimal Reflux Profiles Condenser Composition (8 trays) 100 0.50

80 0.40 60 0.30

40 0.20 M.fraction 0.10 20 Reflux Ratio 0.00 0.0 0.2 0 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 1.0 time(hr) 1.0 time (hr) Ester Eth. Wat. A. A. 8 trays 15 trays 25 trays

Distillate Composition

YDULDEOHV 0.45 '$(V

GHJUHHVRIIUHHGRP 0.35 ILQLWHHOHPHQWV Ethyl Acetate ,3237LWHUDWLRQV 0.25 &38PLQXWHV 0.0 0.2 0.4 0.6 0.8 1.0 time (hr) 8 trays 15 trays 25 trays 68 Batch Reactive Distillation

Discretized CPU (s) Trays DAEs Iterations Variables Global Elemental

8 98 1788 14 56.4 37.2

15 168 3048 32 245.7 207.5

25 258 4678 45 1083.2 659.3

CPU Decomposition Time

21

16

11

CPU(s)/iter 6

1 90 130 170 210 250 DAEs 69 Nonlinear Model Predictive Control (NMPC)

NMPC Estimation and Control

z : differential states Why NMPC? d : disturbances y : algebraic states Process  Track a profile  Severe nonlinear dynamics (e.g, sign changes in gains) u : manipulated variables NMPC Controller  Operate process over wide range z′ = F(z, y,u, p,d ) (e.g., startup and shutdown) 0 = G()z, y,u, p,d

ysp : set points Model Updater ′ ( ) z = F z, y,u, p,d NMPC Subproblem 0 = G()z, y,u, p,d − sp + k − k −1 min ∑|| y(t) y || y ∑|| u(t ) u(t ||) u u Q Q z′(t) = F (z(t), y(t), u(t),t) ts .. 0 = G(z(t), y(t), u(t),t) z(t) = z init Bound Constraint s Other Constraint s 70 Dynamic optimization in a MATLAB Framework

Dynamic Optimization Dynamic Process NLP Optimization Problem Discretization Problem Method No. of Time Process Model Process Model JOBKENNUNG : C:\Krawal-modular\IGCC\IGCC_Puertollano_komplett.gek Elements f (x′,x,y,u,p,t)= 0 Collocation ˆ ( )= f xˆ,yˆ,uˆ,p,t f ,x0 0 Order g(x, y,u, p,t)= 0 ( )= gˆ xˆ,yˆ,uˆ,p,t f 0 Inequality Constraints ′ ≤ Inequality Constraints bar kJ/kg kg/s °C (X) Saturator-System h(x ,x,y,u,p t), 0 Alle Drücke sind absolut P..Druck..bar T..Temperatur..°C copy ofDesign H..Enthalpie..kJ/kg Wärmeschaltplan M..Massenstrom..kg/s SIEMENS AG In Bearbeitung Nr. -F Ref T - PHI..Luft-Feuchte..% Erlangen, 13.Oct.1999 F Ref T hˆ (xˆ,yˆ,uˆ,p,t ) ≤ 0 Full f Initial Conditions Discretization Constraints at Final Time x (t ) = x 0 0 of State and ϕ (x , y ,u ,p,t )= 0 Control Nt Nt Nt f Constraints at Final Time Variables ϕ ( ′ )= Objective Function x (t f ,) x(t f ,) y(t f ,) u(t f ,) x 0 ,p,t f 0 min P(x , y ,u ,p,t ) Nt Nt Nt f Objective Function P(x(t ,) y(t ,) u(t ),p,x ,t ) min f f f 0 f u(t),p,x0 ,t f

71 Tennessee Eastman Process

Unstable Reactor 11 Controls; Product, Purge streams Model extended with energy balances 72 Tennessee Eastman Challenge Process

DAE Model NLP Optimization problem

Number of differential equations 30 Number of variables 10920 of which are fixed 0 Number of algebraic variables 152 Number of constraints 10260 Number of lower bounds 780 Number of algebraic equations 141 Number of upper bounds 540 Difference (control variables) 11 Number of nonzeros in Jacobian 49230 Number of nonzeros in Hessian 14700

Method of Full Discretization of State and Control Variables Large-scale Sparse block-diagonal NLP

73 Setpoint change studies

Process variable Type Magnitude

-15% Make a step change to the variable(s) used to set Production rate change Step the process production rate so that the product flow leaving the stripper column base changes from 14,228 to 12,094 kg h-1 -60 kPa Reactor operating pressure Step Make a step change so that the reactor operating change pressure changes from 2805 to 2745 kPa +2% Purge gas composition of Make a step change so that the composition of Step component B change component B in the gas purge changes from 13.82 to 15.82%

Setpoint changes for the base case [Downs & Vogel]

74 Case Study: Change Reactor pressure by 60 kPa

Control profiles

All profiles return to their base case values Same production rate Same product quality Same control profile Lower pressure – leads to larger gas phase (reactor) volume Less compressor load

75 TE Case Study – Results I

Shift in TE process

Same production rate More volume for reaction Same reactor temperature Initially less cooling water flow (more evaporation)

76 Case Study- Results II

Shift in TE process

Shift in reactor effluent to more condensables Increase cooling water flow Increase stripper steam to ensure same purity Less compressor work

77 Case Study: Change Reactor Pressure by 60 kPa

Optimization with IPOPT 1000 Optimization Cycles 5-7 CPU seconds 11-14 Iterations Optimization with SNOPT Often failed due to poor conditioning Could not be solved within sampling times > 100 Iterations

78 OptimizationOptimization asas aa FrameworkFramework forfor IntegrationIntegration

+ Directly handles interactions, multiple conditions + Trade-offs unambiguous, quantitative

- Larger problems to solve - Need to consider a diverse process models

Research Questions

How should diverse models be integrated? Is further algorithmic development needed?

79 Batch Integration Case Study

Plant Design

Dynamic Processing

R T

Time Time

Production Planning Stage 1 A B C

Stage 2 A B C

:KDWDUHWKH,QWHUDFWLRQVEHWZHHQ'HVLJQ DQG'\QDPLFVDQG3ODQQLQJ" :KDWDUHWKHGLIIHUHQFHVEHWZHHQ6HTXHQWLDODQG 6LPXOWDQHRXV6WUDWHJLHV" (VSHFLDOO\,PSRUWDQWLQ%DWFK6\VWHPV 80 6LPXOWDQHRXV'\QDPLF2SWLPL]DWLRQ

Dynamic Processing

Shorter Same conversion T in reduced time

Conv. processing times

Time Time

Fewer Higher conversion T in same time

Conv. product batches

Time Time Best transient Best constant

Production Planning Stage 1 A B C Shorter Planning Horizon Stage 2 A B C

GLVFUHWL]H '$(V VWDWHDQGFRQWUROSURILOHV ODUJHVFDOHRSWLPL]DWLRQSUREOHP KDQGOHVSURILOHFRQVWUDLQWVGLUHFWO\ LQFRUSRUDWHVHTXLSPHQWYDULDEOHVGLUHFWO\ '$(PRGHOVROYHGRQO\RQFH FRQYHUJHVIRUXQVWDEOHV\VWHPV 81 Scheduling Formulation

VHTXHQFLQJRIWDVNVSURGXFWVHTXLSPHQW H[SHQVLYHGLVFUHWHFRPELQDWRULDORSWLPL]DWLRQ FRQVLGHULGHDOWUDQVIHUSROLFLHV 8,6DQG=: FORVHGIRUPUHODWLRQV %LUHZDU DQG*URVVPDQQ

AB N stage I 8QOLPLWHG,QW6WRUDJH 8,6 stage 2 6KRUWSURGXFWLRQF\FOH &\FOHWLPHLQGHSHQGHQWRIVHTXHQFH

=HUR:DLW =:  ,PPHGLDWHWUDQVIHUUHTXLUHG

A B N 6ODFNWLPHVGHSHQGHQWRQSDLU stage I /RQJHUSURGXFWLRQF\FOHUHTXLUHG stage 2

82 CaseCase StudyStudy ExampleExample

4 stages, 3 products of different purity Dynamic reactor - temperature profile Dynamic column - reflux profile A + B → C C + B → P + E P + C → G z 0 z 0 0 z 0 B i,I i,II zi ,III i,IV i z f f f f i ,IV zi,I zi ,II zi ,III

Process Optimization Cases

SQ - Sequential Design - Scheduling - Dynamics SM - Simultaneous Design and Scheduling Dynamics with endpoints fixed . SM* - Simultaneous Design, Scheduling and Dynamics

83 6FHQDULRVLQ&DVH6WXG\

&RPSDULVRQRI'\QDPLFYV%HVW&RQVWDQW3URILOHV

5 EHVWFRQVWDQWWHPSHUDWXUHSURILOH 5 RSWLPDOWHPSHUDWXUHSROLF\

& EHVWFRQVWDQWUHIOX[UDWLR & RSWLPDOUHIOX[UDWLR

640 25 C o nsta nt C o nsta nt 630 Dyn a m ic Dyn a m ic 20 620

610 15

600 10 590

580 5 0 0.25 0.5 0.75 1 1.25 0 0.5 1 1.5 2 2.5 Tim e (h r.) 84 Results for Simultaneous Cases

2 0 0 0 0 R0/R1

1 5 0 0 0 best constant / optimal temperature

it it ($) 1 0 0 0 0

Pro f C0/C1

5 0 0 0 best constant / optimal reflux ratio

0 R0C0 R1C0 R0C1 R1C1

C a s e s

Sim ulta neo us SIm ulta ne ous S e q u e n t ia l w ith FIXED sta te s w ith FREE sta te s

I 64 II III IV I II 60 III IV I II 60 III IV

- ZW schedule becomes tighter

- less dependent on product sequences 85 Summary Sequential Approaches - Parameter Optimization • Gradients by: Direct and Adjoint Sensitivity Equations - Optimal Control (Profile Optimization) • Variational Methods • NLP-Based Methods - Require Repeated Solution of Model - State Constraints are Difficult to Handle

Simultaneous Approach - Discretize ODE's using orthogonal collocation on finite elements (solve larger optimization problem) - Accurate approximation of states, location of control discontinuities through element placement. - Straightforward addition of state constraints. - Deals with unstable systems Simultaneous Strategies are Effective - Directly enforce constraints - Solve model only once - Avoid difficulties at intermediate points

Large-Scale Extensions - Exploit structure of DAE discretization through decomposition

- Large problems solved efficiently with IPOPT 86 DAE Optimization Resources References Bryson, A.E. and Y.C. Ho, Applied Optimal Control, Ginn/Blaisdell, (1968). Himmelblau, D.M., T.F. Edgar and L. Lasdon, Optimization of Chemical Processes, McGraw-Hill, (2001). Ray. W.H., Advanced Process Control, McGraw-Hill, (1981).

Software • Dynamic Optimization Codes ACM – Aspen Custom Modeler DynoPC - simultaneous optimization code (CMU) COOPT - sequential optimization code (Petzold) gOPT - sequential code integrated into gProms (PSE) MUSCOD - multiple shooting optimization (Bock) NOVA - SQP and collocation code (DOT Products) • Sensitivity Codes for DAEs DASOLV - staggered direct method (PSE) DASPK 3.0 - various methods (Petzold) SDASAC - staggered direct method (sparse) DDASAC - staggered direct method (dense)

87 DynoPC – Windows Implementation

Model in FORTRAN Data/Config. Outputs/Graphic & Texts

Interface of DynoPC F2c/C++

ADOL_C Redued SQP Optimal Solution Preprocessor

F/DF Calc. of independent Interior Point variable move For QP DDASSL F/DF Decomposition COLDAE Line Search

Simulator/Discretizer Starting Point Linearization FILTER

88 Example: Batch Reactor Temperature

1 2 ABC

Max b(t f ) .ts . T da E = −k exp( −1 ) ⋅ a at 1 RT db E E =k exp( −1 ) ⋅a − k exp( −2 ) ⋅ b at 1 RT 2 RT

a + b + c = 1

89 Example: Car Problem

Min tf s.t. z1’ = z2 z2’ = u

z2 ” zmax -2 ” u ” 1

2 12

10 1

8 0 subroutine model(nz,ny,nu,np,t,z,dmz,y,u,p,f) double precision t, z(nz),dmz(nz), y(ny),u(nu),p(np) Velocity 6

Acceleration -1 Velocity double precision f(nz+ny) 4

Acceleration, u(t) Acceleration, f(1) = p(1)*z(2) - dmz(1) -2 2 f(2) = p(1)*u(1) - dmz(2)

-3 0 return 0 10 20 30 40 end Time

90 Example: Crystallizer Temperature

4.5E-03 60 4.0E-03 SUBROUTINE model(nz,ny,nu,np,x,z,dmz,y,u,p,f) 50 3.5E-03 implicit double precision (a-h,o-z) double precision f(nz+ny),z(nz),dmz(nz),Y(ny),yp(4),u(1) 3.0E-03 40 double precision kgr, ln0, ls0, kc, ke, kex, lau, deltT, alpha 2.5E-03 dimension a(0:3), b(0:3) 30 2.0E-03 data alpha/1.d-4/,a/-66.4309d0, 2.8604d0, -.022579d0, 6.7117d-5/, + b/16.08852d0, -2.708263d0, .0670694d0, -3.5685d-4/, kgr/ 4.18d-3/, 1.5E-03 20 + en / 1.1d0/, ln0/ 5.d-5/, Bn / 3.85d2/, em / 5.72/, ws0/ 2.d0/, T jacket (C) Crystal size 1.0E-03 + Ls0/ 5.d-4 /, Kc / 35.d0 /, Kex/ 65.d0/, are/ 5.8d0 /, 10 + amt/ 60.d0 /, V0 / 1500.d0/, cw0/ 80.d0/,cw1/ 45.d0/,v1 /200.d0/, 5.0E-04 + tm1/ 55.d0/,x6r/0.d0/, tem/ 0.15d0/,clau/ 1580.d0/,lau/1.35d0/, 0.0E+00 0 + cp/ 0.4d0 /,cbata/ 1.2d0/, calfa/ .2d0 /, cwt/ 10.d0/ 0 5 10 15 20 ke = kex*area Time (hr) x7i = cw0*lau/(100.d0-cw0) v = (1.d0 - cw0/100.d0)*v0 w = lau*v0 Crystal Size T jacket yp(1) = (deltT + dsqrt(deltT**2 + alpha**2))*0.5d0 yp(2) = (a(0) + a(1)*yp(4) + a(2)*yp(4)**2 + a(3)*yp(4)**3) TT TC yp(3) = (b(0) + b(1)*yp(4) + b(2)*yp(4)**2 + b(3)*yp(4)**3) deltT = yp(2) - z(8) yp(4) = 100.d0*z(7)/(lau+z(7)) Maximize crystal size f(1) = Kgr*z(1)**0.5*yp(1)**en - dmz(1) at final time f(2) = Bn*yp(1)**em*1.d-6 - dmz(2) f(3) = ((z(2)*dmz(1) + dmz(2) * Ln0)*1.d+6*1.d-4) - dmz(3) f(4) = (2.d0*cbata*z(3)*1.d+4*dmz(1)+dmz(2)*Ln0**2*1.d+6)-dmz(4) Steam f(5) = (3.d0*calfa*z(4)*dmz(1)+dmz(2)*Ln0**3*1.d+6) - dmz(5) f(6) = (3.d0*Ws0/(Ls0**3)*z(1)**2*dmz(1)+clau*V*dmz(5))-dmz(6) f(7) = -dmz(6)/V - dmz(7) Cooling f(8) = (Kc*dmz(6) - Ke*(z(8) - u(1)))/(w*cp) - dmz(8) water f(9) = y(1)+YP(3)- u(1) return end Control variable = Tjacket = f(t)?

91