Nonlinear Programming: Concepts, Algorithms and Applications
L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA Nonlinear Programming and Process Optimization
Introduction
Unconstrained Optimization • Algorithms • Newton Methods • Quasi-Newton Methods
Constrained Optimization • Karush Kuhn-Tucker Conditions • Special Classes of Optimization Problems • Reduced Gradient Methods (GRG2, CONOPT, MINOS) • Successive Quadratic Programming (SQP) • Interior Point Methods
Process Optimization • Black Box Optimization • Modular Flowsheet Optimization – Infeasible Path • The Role of Exact Derivatives
Large-Scale Nonlinear Programming • Data Reconciliation • Real-time Process Optimization
Further Applications • Sensitivity Analysis for NLP Solutions • Multiperiod Optimization Problems
Summary and Conclusions 2 Introduction
Optimization: given a system or process, find the best solution to this process within constraints. Objective Function: indicator of "goodness" of solution, e.g., cost, yield, profit, etc. Decision Variables: variables that influence process behavior and can be adjusted for optimization.
In many cases, this task is done by trial and error (through case study). Here, we are interested in a systematic approach to this task - and to make this task as efficient as possible.
Some related areas: - Math programming - Operations Research Currently - Over 30 journals devoted to optimization with roughly 200 papers/month - a fast moving field!
3 Optimization Viewpoints
Mathematician - characterization of theoretical properties of optimization, convergence, existence, local convergence rates. Numerical Analyst - implementation of optimization method for efficient and "practical" use. Concerned with ease of computations, numerical stability, performance. Engineer - applies optimization method to real problems. Concerned with reliability, robustness, efficiency, diagnosis, and recovery from failure.
4 Optimization Literature
Engineering 1. Edgar, T.F., D.M. Himmelblau, and L. S. Lasdon, Optimization of Chemical Processes, McGraw-Hill, 2001. 2. Papalambros, P. and D. Wilde, Principles of Optimal Design. Cambridge Press, 1988. 3. Reklaitis, G., A. Ravindran, and K. Ragsdell, Engineering Optimization, Wiley, 1983. 4. Biegler, L. T., I. E. Grossmann and A. Westerberg, Systematic Methods of Chemical Process Design, Prentice Hall, 1997.
Numerical Analysis 1. Dennis, J.E. and R. Schnabel, Numerical Methods of Unconstrained Optimization, Prentice-Hall, (1983), SIAM (1995) 2. Fletcher, R. Practical Methods of Optimization, Wiley, 1987. 3. Gill, P.E, W. Murray and M. Wright, Practical Optimization, Academic Press, 1981. 4. Nocedal, J. and S. Wright, Numerical Optimization, Springer, 1998
5 Motivation
Scope of optimization Provide systematic framework for searching among a specified space of alternatives to identify an “optimal” design, i.e., as a decision-making tool
Premise Conceptual formulation of optimal product and process design corresponds to a mathematical programming problem
min f (x, y) st h(x, y) = 0 g(x, y)≤ 0 x∈ R n y ∈ ,0{ }1 ny
6 Optimization in Design, Operations and Control
MILP MINLP Global LP,QP NLP SA/GA
HENS x x x x x x MENS x x x x x x Separations x x Reactors x x x x Equipment x x x Design Flowsheeting x x Scheduling x x x x Supply Chain x x x Real-time x x optimization Linear MPC x Nonlinear x x MPC Hybrid x x
7 Unconstrained Multivariable Optimization
Problem: Min f(x) (n variables)
Equivalent to: Max -f(x), x ∈ Rn
Nonsmooth Functions - Direct Search Methods - Statistical/Random Methods
Smooth Functions - 1st Order Methods - Newton Type Methods - Conjugate Gradients
8 Example: Optimal Vessel Dimensions What is the optimal L/D ratio for a cylindrical vessel? Constrained Problem π D2 (1) π Min CT + CS DL = cost 2 πDL2 s.t. V - = 0 4 Convert to Unconstrained (Eliminate L) π D2 4V Min CT + CS = cost 2 D
d(cost) 4VCs = CTπ D - = 0 (2) dD D2
1/ 3 1/3 2/3 4V 4V CS C T D = L = π CT π CS
==> L/D = CT/CS Note: - What if L cannot be eliminated in (1) explicitly? (strange shape) - What if D cannot be extracted from (2)? (cost correlation implicit) 9 Two Dimensional Contours of F(x)
Convex Function Nonconvex Function Multimodal, Nonconvex
Discontinuous Nondifferentiable (convex)
10 Local vs. Global Solutions
•Find a local minimum point x* for f(x) for feasible region defined by constraint functions: f(x*) f(x) for all x satisfying the constraints in some neighborhood around x* (not for all x ∈ X) •Finding and verifying global solutions will not be considered here. •Requires a more expensive search (e.g. spatial branch and bound). •A local solution to the NLP is also a global solution under the following sufficient conditions based on convexity. • f(x) is convex in domain X, if and only if it satisfies: f(α y + (1-α) z) α f(y) + (1-α)f(z) for any α, 0 α 1, at all points y and z in X.
11 Linear Algebra - Background
Some Definitions • Scalars - Greek letters, α, β, γ • Vectors - Roman Letters, lower case • Matrices - Roman Letters, upper case • Matrix Multiplication: ∈ ℜn x m ∈ ℜm x p ∈ ℜn x p Σ C = A B if A , B and C , Cij = k Aik Bkj • Transpose - if A ∈ ℜn x m, interchange rows and columns --> AT∈ ℜm x n • Symmetric Matrix - A ∈ ℜn x n (square matrix) and A = AT • Identity Matrix - I, square matrix with ones on diagonal and zeroes elsewhere. • Determinant: "Inverse Volume" measure of a square matrix Σ det(A) = i (-1)i+j Aij Aij for any j, or Σ det(A) = j (-1)i+j Aij Aij for any i, where Aij is the determinant of an order n-1 matrix with row i and column j removed. det(I) = 1
• Singular Matrix: det (A) = 0 12 Linear Algebra - Background
Gradient Vector -(∇f(x)) ∂ f / ∂x1 ∂ f / ∂x 2 ∇f = ...... ∂ f / ∂x n
Hessian Matrix (∇2f(x) - Symmetric)
∂ 2 f ∂ 2 f ∂ 2 f ⋅ ⋅ ⋅ ⋅ 2 ∂x1 ∂x1∂x2 ∂x1 ∂xn ∇2 f(x) = ...... ∂ 2 f ∂ 2 f ∂ 2 f ⋅ ⋅ ⋅ ⋅ 2 ∂xn ∂x1 ∂xn ∂x2 ∂xn
∂ 2 f ∂2f Note: = ∂ ∂ ∂ ∂ xi x j xj xi
13 Linear Algebra - Background • Some Identities for Determinant det(A B) = det(A) det(B); det (A) = det(AT) α αn Π λ det( A) = det(A); det(A) = i i(A)
• Eigenvalues: det(A- λ I) = 0, Eigenvector: Av = λ v Characteristic values and directions of a matrix. For nonsymmetric matrices eigenvalues can be complex, so we often use singular values, σ = λ(ATΑ)1/2 ≥ 0 • Vector Norms Σ p 1/p || x ||p = { i |xi| } ∞ (most common are p = 1, p = 2 (Euclidean) and p = (max norm = maxi|xi|)) • Matrix Norms ||A|| = max ||A x||/||x|| over x (for p-norms) Σ ||A||1 - max column sum of A, maxj ( i |Aij|)
∞ Σ ||A|| - maximum row sum of A, maxi ( j |Aij|) ||A||2 = [σmax(Α)](spectral radius)
Σ Σ 2]1/2 ||A||F = [ i j (Aij) (Frobenius norm) -1 κ(Α) = ||A|| ||A || (condition number) = σmax/σmin (using 2-norm) 14 Linear Algebra - Eigenvalues λ λ Find v and where Avi = i vi, i = i,n Note: Av - λv = (A - λI) v = 0 or det (A - λI) = 0 For this relation λ is an eigenvalue and v is an eigenvector of A.
λ If A is symmetric, all i are real λ i > 0, i = 1, n; A is positive definite λ i < 0, i = 1, n; A is negative definite λ i = 0, some i: A is singular Quadratic Form can be expressed in Canonical Form (Eigenvalue/Eigenvector) xTAx ⇒ A V = V Λ V - eigenvector matrix (n x n) Λ λ - eigenvalue (diagonal) matrix = diag( i)
λ -1 T If A is symmetric, all i are real and V can be chosen orthonormal (V = V ). Thus, A = V Λ V-1 = V Λ VT For Quadratic Function: Q(x) = aTx + ½ xTAx Define: z = VTx and Q(Vz) = (aTV) z + ½ zT (VTAV)z = (aTV) z + ½ zT Λ z
λ -1 Λ-1 T Minimum occurs at (if i > 0) x = -A a or x = Vz = -V( V a) 15 Positive (or Negative) Curvature Positive (or Negative) Definite Hessian
Both eigenvalues are strictly positive or negative • A is positive definite or negative definite • Stationary points are minima or maxima
16 Zero Curvature Singular Hessian
One eigenvalue is zero, the other is strictly positive or negative • A is positive semidefinite or negative semidefinite • There is a ridge of stationary points (minima or maxima)
17 Indefinite Curvature Indefinite Hessian
One eigenvalue is positive, the other is negative • Stationary point is a saddle point • A is indefinite
Note: these can also be viewed as two dimensional projections for higher dimensional problems
18 Eigenvalue Example
T 1 1 2 1 Min Q(x) = x + xT x 1 2 1 2 2 1 AV = VΛ with A = 1 2 1 0 1/ 2 1/ 2 V T AV = Λ = with V = 0 3 -1/ 2 1/ 2
• All eigenvalues are positive • Minimum occurs at z* = -/-1VTa
(x − x )/ 2 (x + x ) / 2 z = V T x = 1 2 x = Vz = 1 2 + − + ( x1 x2)/ 2 ( x1 x2 )/ 2 0 1/3 z* = x* = 2/(3 2) 1/3
19 Comparison of Optimization Methods
1. Convergence Theory • Global Convergence - will it converge to a local optimum (or stationary point) from a poor starting point? • Local Convergence Rate - how fast will it converge close to the solution?
2. Benchmarks on Large Class of Test Problems Representative Problem (Hughes, 1981)
α β Min f(x1, x2) = exp(- ) u = x1 - 0.8 2 1/2 v = x2 - (a1 + a2 u (1- u) - a3 u) α 2 1/2 = -b1 + b2 u (1+u) + b3 u β = 2 2 c1 v (1 - c2 v)/(1+ c3 u )
a = [ 0.3, 0.6, 0.2] b = [5, 26, 3] c = [40, 1, 10] x* = [0.7395, 0.3144] f(x*) = 5.0893
20 Three Dimensional Surface and Curvature for Representative Test Problem
Regions where minimum eigenvalue is less than: [0, -10, -50, -100, -150, -200]
21 What conditions characterize an optimal solution?
x2 Unconstrained Local Minimum Necessary Conditions ∇f (x*) = 0 pT∇2f (x*) p 0 for p∈ℜn (positive semi-definite)
x* Unconstrained Local Minimum Sufficient Conditions ∇f (x*) = 0 T∇2 ∈ℜn Contours of f(x) p f (x*) p > 0 for p (positive definite)
x1
For smooth functions, why are contours around optimum elliptical? Taylor Series in n dimensions about x*: 1 f (x) = f (x*) + ∇f (x*)T (x − x*) + (x − x*)T ∇2 f (x*)(x − x*) 2 Since ∇f(x*) = 0, f(x) is purely quadratic for x close to x* 22 Newton’s Method
Taylor Series for f(x) about xk
Take derivative wrt x, set LHS § 0
0 §∇f(x) = ∇f(xk) + ∇2f(xk) (x - xk) ⇒ (x - xk) ≡ d = - (∇2f(xk))-1 ∇f(xk)
• f(x) is convex (concave) if for all x ∈ℜn, ∇2f(x) is positive (negative) semidefinite λ ≥ λ i.e. minj j 0 (maxj j 0) • Method can fail if: - x0 far from optimum - ∇2f is singular at any point - f(x) is not smooth • Search direction, d, requires solution of linear equations. • Near solution: 2 xk+1 - x * = K xk - x *
23 Basic Newton Algorithm - Line Search
0. Guess x0, Evaluate f(x0).
1. At xk, evaluate ∇f(xk).
2. Evaluate Bk = ∇2f(xk) or an approximation.
3. Solve: Bk d = -f(xk) If convergence error is less than tolerance: e.g., ||∇f(xk) || ≤ ε and ||d|| ≤ ε STOP, else go to 4.
4. Find α so that 0 < α ≤ 1 and f(xk + α d) < f(xk) sufficiently (Each trial requires evaluation of f(x))
5. xk+1 = xk + α d. Set k = k + 1 Go to 1.
24 Newton’s Method - Convergence Path
Starting Points [0.8, 0.2] needs steepest descent steps w/ line search up to ’O’, takes 7 iterations to ||∇f(x*)|| 10-6
[0.35, 0.65] converges in four iterations with full steps to ||∇f(x*)|| 10-6
25 Newton’s Method - Notes
• Choice of Bk determines method. - Steepest Descent: Bk = γ I - Newton: Bk = ∇2f(x) • With suitable Bk, performance may be good enough if f(xk + αd) is sufficiently decreased (instead of minimized along line search direction). • Trust region extensions to Newton's method provide very strong global convergence properties and very reliable algorithms. • Local rate of convergence depends on choice of Bk. x k +1 − x * − = Newton Quadratic Rate : limk →∞ 2 K x k − x * x k +1 − x * Steepest descent − Linear Rate : lim →∞ <1 k x k − x * x k+1 − x * Desired?− Superlinear Rate : lim →∞ = 0 k x k − x *
26 Quasi-Newton Methods
Motivation: • Need Bk to be positive definite. • Avoid calculation of ∇2f. • Avoid solution of linear system for d = - (Bk)-1 ∇f(xk)
Strategy: Define matrix updating formulas that give (Bk) symmetric, positive definite and satisfy: (Bk+1)(xk+1 - xk) = (∇fk+1 - ∇fk) (Secant relation)
DFP Formula: (Davidon, Fletcher, Powell, 1958, 1964) T T (y - Bks)yT + y (y - Bks) (y - Bks) s y yT Bk+1 = Bk + - yT s (yT s)(yT s)
T k T k -1 k+1 ss H y y H ()Bk+1 = H = Hk + - sT y y Hk y
where: s = xk+1- xk y = ∇f (xk+1) - ∇f (xk)
27 Quasi-Newton Methods
BFGS Formula (Broyden, Fletcher, Goldfarb, Shanno, 1970-71)
T k T k + yy B s s B Bk 1 = Bk + - sT y s Bk s
k T k T k T T −1 (s - H y)s + s (s - H y) (y - H s) y s s ()Bk+1 =Hk+1 = Hk + - yT s ()yT s ()yT s
Notes: 1) Both formulas are derived under similar assumptions and have symmetry 2) Both have superlinear convergence and terminate in n steps on quadratic functions. They are identical if α is minimized. 3) BFGS is more stable and performs better than DFP, in general. 4) For n ≤ 100, these are the best methods for general purpose problems if second derivatives are not available.
28 Quasi-Newton Method - BFGS Convergence Path
Starting Point [0.8, 0.2] starting from B0 = I, converges in 9 iterations to ||∇f(x*)|| 10-6
29 Sources For Unconstrained Software
Harwell (HSL) IMSL NAg - Unconstrained Optimization Codes Netlib (www.netlib.org) •MINPACK •TOMS Algorithms, etc. These sources contain various methods •Quasi-Newton •Gauss-Newton •Sparse Newton •Conjugate Gradient
30 Constrained Optimization (Nonlinear Programming)
Problem: Minx f(x) s.t. g(x) ≤ 0 h(x) = 0 where: f(x) - scalar objective function x - n vector of variables g(x) - inequality constraints, m vector h(x) - meq equality constraints.
Sufficient Condition for Unique Optimum - f(x) must be convex, and - feasible region must be convex, i.e. g(x) are all convex h(x) are all linear Except in special cases, ther is no guarantee that a local optimum is global if sufficient conditions are violated.
31 Example: Minimize Packing Dimensions
y What is the smallest box for three round objects?
Variables: A, B, (x1, y1), (x2, y2), (x3, y3)
Fixed Parameters: R1, R2, R3 Objective: Minimize Perimeter = 2(A+B) Constraints: Circles remain in box, can’t overlap 2 Decisions: Sides of box, centers of circles. A 3 1
B x ≥ ≤ ≤ 2 2 2 x1, y R1 x1 B- R1, y A - R1 ()x1 - x2 + (y - y ) ≥ ()R1 + R2 1 1 1 2 ≥ ≤ ≤ 2 2 2 y2 R 2 x2 B- R 2, y2 A - R 2 () () ≥ () x 2, x1 - x3 + y1 - y3 R1 + R3 ≥ ≤ ≤ 2 x3, y3 R 3 x3 B- R 3, y3 A - R 3 ()2 ≥ ()2 x2 - x3 + ()y2 - y3 R2 + R3 in box ≥ no overlaps x1, x2, x3, y1, y2, y3, A, B 0
32 Characterization of Constrained Optima
Min
Min
Linear Program Linear Program (Alternate Optima)
Min
Min Min
Convex Objective Functions Linear Constraints Min
Min Min Min
Nonconvex Region Min Nonconvex Objective Multiple Optima 33 Multiple Optima What conditions characterize an optimal solution?
Unconstrained Local Minimum Unconstrained Local Minimum Necessary Conditions Sufficient Conditions ∇f (x*) = 0 ∇f (x*) = 0 pT∇2f (x*) p 0 for p∈ℜn pT∇2f (x*) p > 0 for p∈ℜn (positive semi-definite) (positive definite)
34 Optimal solution for inequality constrained problem
Min f(x) s.t. g(x) 0 Analogy: Ball rolling down valley pinned by fence Note: Balance of forces (∇f, ∇g1)
35 Optimal solution for general constrained problem
Problem: Min f(x) s.t. g(x) 0 h(x) = 0 Analogy: Ball rolling on rail pinned by fences Balance of forces: ∇f, ∇g1, ∇h
36 Optimality conditions for local optimum
Necessary First Order Karush Kuhn - Tucker Conditions
∇ L (x*, u, v) = ∇f(x*) + ∇g(x*) u + ∇h(x*) v = 0 (Balance of Forces) u 0 (Inequalities act in only one direction) g (x*) 0, h (x*) = 0 (Feasibility)
uj gj(x*) = 0 (Complementarity: either gj(x*) = 0 or uj = 0) u, v are "weights" for "forces," known as KKT multipliers, shadow prices, dual variables
“To guarantee that a local NLP solution satisfies KKT conditions, a constraint qualification is required. E.g., the Linear Independence Constraint Qualification ∇ ∇ (LICQ) requires active constraint gradients, [ gA(x*) h(x*)], to be linearly independent. Also, under LICQ, KKT multipliers are uniquely determined.”
Necessary (Sufficient) Second Order Conditions - Positive curvature in "constraint" directions. - pT∇ 2L (x*) p ≥ 0 (pT∇ 2L (x*) p > 0) ∇ T ∇ T where p are the constrained directions: gA(x*) p = 0, h(x*) p = 0 37 Single Variable Example of KKT Conditions
Min (x)2 s.t. -a x a, a > 0 x* = 0 is seen by inspection f(x)
Lagrange function : 2 L(x, u) = x + u1(x-a) + u2(-a-x)
First Order KKT conditions: ∇ L(x, u) = 2 x + u1 - u2 = 0 u1 (x-a) = 0 u2 (-a-x) = 0 -a x a u , u 0 x 1 2 -a a Consider three cases:
• u1 > 0, u2 = 0 Upper bound is active, x = a, u1 = -2a, u2 = 0
• u1 = 0, u2 > 0 Lower bound is active, x = -a, u2 = -2a, u1 = 0
• u1 = u2 = 0 Neither bound is active, u1 = 0, u2 = 0, x = 0
Second order conditions (x*, u1, u2 =0) ∇ xxL (x*, u*) = 2 T∇ ∆ 2 p xxL (x*, u*) p = 2 ( x) > 0 38 Single Variable Example of KKT Conditions - Revisited Min -(x)2 s.t. -a x a, a > 0 -a x a x* = ±a is seen by inspection
Lagrange function : 2 L(x, u) = -x + u1(x-a) + u2(-a-x)
First Order KKT conditions: ∇ L(x, u) = -2x + u1 - u2 = 0 u1 (x-a) = 0 u2 (-a-x) = 0 f(x) -a x a u1, u2 0 Consider three cases:
• u1 > 0, u2 = 0 Upper bound is active, x = a, u1 = 2a, u2 = 0
• u1 = 0, u2 > 0 Lower bound is active, x = -a, u2 = 2a, u1 = 0
• u1 = u2 = 0 Neither bound is active, u1 = 0, u2 = 0, x = 0
Second order conditions (x*, u1, u2 =0) ∇ xxL (x*, u*) = -2 T∇ ∆ 2 p xxL (x*, u*) p = -2( x) < 0 39 Interpretation of Second Order Conditions
For x = a or x = -a, we require the allowable direction to satisfy the active constraints exactly. Here, any point along the allowable direction, x* must remain at its bound.
For this problem, however, there are no nonzero allowable directions that satisfy this condition. Consequently the solution x* is defined entirely by the active constraint. The condition: T ∇ p xxL (x*, u*, v*) p > 0 for all allowable directions, is vacuously satisfied - because there are ∇ T no allowable directions that satisfy gA(x*) p = 0. Hence, sufficient second order conditions are satisfied.
As we will see, sufficient second order conditions are satisfied by linear programs as well.
40 Role of KKT Multipliers
-a x a a + ∆a
Also known as: • Shadow Prices • Dual Variables f(x) • Lagrange Multipliers Suppose a in the constraint is increased to a + ∆a f(x*) = (a + ∆a)2 and [f(x*, a + ∆a) - f(x*, a)]/∆a = 2a + ∆a
df(x*)/da = 2a = u1 41 Special Cases of Nonlinear Programming
Linear Programming: Min cTx x s.t. Ax b 2 Cx = d, x 0 Functions are all convex ⇒ global min. Because of Linearity, can prove solution will always lie at vertex of feasible region.
x1 Simplex Method - Start at vertex - Move to adjacent vertex that offers most improvement - Continue until no further improvement Notes: 1) LP has wide uses in planning, blending and scheduling 2) Canned programs widely available.
42 Linear Programming Example
Simplex Method
Min -2x1 - 3x2 Min -2x1 - 3x2 ⇒ s.t. 2x1 + x2 5 s.t. 2x1 + x2 + x3 = 5 x1, x2 0 x1, x2, x3 0 (add slack variable) ⇒ Now, define f = -2x1 - 3x2 f + 2x1 + 3x2 = 0 Set x1, x2 = 0, x3 = 5 and form tableau x1 x2 x3 f b x1, x2 nonbasic 2 1 1 0 5 x3 basic 2 3 0 1 0
To decrease f, increase x2. How much? so x3 0 x1 x2 x3 f b 2 1 1 0 5 -4 0 -3 1 -15 f can no longer be decreased! Optimal
Underlined terms are -(reduced gradients); nonbasic variables (x1, x3), basic variable x2
43 Quadratic Programming
Problem: Min aTx + 1/2 xT B x A x b C x = d 1) Can be solved using LP-like techniques: (Wolfe, 1959) ⇒ Min Σj (zj+ + zj-) s.t. a + Bx + ATu + CTv = z+ - z- Ax - b + s = 0 Cx - d = 0 s, z+, z- 0
{uj sj = 0} with complicating conditions.
2) If B is positive definite, QP solution is unique. If B is pos. semidefinite, optimum value is unique.
3) Other methods for solving QP’s (faster) - Complementary Pivoting (Lemke)
- Range, Null Space methods (Gill, Murray). 44 Portfolio Planning Problem
Definitions:
xi - fraction or amount invested in security i ri (t) - (1 + rate of return) for investment i in year t. µ i - average r(t) over T years, i.e.
T µ 1 i = ∑r i (t) T t=1 µ Max ∑ i xi i = s.t. ∑ xi 1 i ≥ xi ,0 etc. Note: maximize average return, no accounting for risk.
45 Portfolio Planning Problem Definition of Risk - fluctuation of ri(t) over investment (or past) time period. To minimize risk, minimize variance about portfolio mean (risk averse).
Variance/Covariance Matrix, S 1 T {} σ 2 ()µ ( µ ) S ij = ij = ∑ r i (t) - i r j (t) - j T t =1
Max x T Sx = s .t. ∑ x i 1 i µ ≥ ∑ i x i R i ≥ x i 0 , etc .
Example: 3 investments µ j 3 1 - 0.5 1. IBM 1.3 S = 1 2 0.4 2. GM 1.2 -0.5 0.4 1 3. Gold 1.08
46 Portfolio Planning Problem - GAMS
SIMPLE PORTFOLIO INVESTMENT PROBLEM (MARKOWITZ) 4 5 OPTION LIMROW=0; 6 OPTION LIMXOL=0; 7 8 VARIABLES IBM, GM, GOLD, OBJQP, OBJLP; 9 10 EQUATIONS E1,E2,QP,LP; 11 12 LP.. OBJLP =E= 1.3*IBM + 1.2*GM + 1.08*GOLD; 13 14 QP.. OBJQP =E= 3*IBM**2 + 2*IBM*GM - IBM*GOLD 15 + 2*GM**2 - 0.8*GM*GOLD + GOLD**2; 16 17 E1..1.3*IBM + 1.2*GM + 1.08*GOLD =G= 1.15; 18 19 E2.. IBM + GM + GOLD =E= 1; 20 21 IBM.LO = 0.; 22 IBM.UP = 0.75; 23 GM.LO = 0.; 24 GM.UP = 0.75; 25 GOLD.LO = 0.; 26 GOLD.UP = 0.75; 27 28 MODEL PORTQP/QP,E1,E2/; 29 30 MODEL PORTLP/LP,E2/; 31 32 SOLVE PORTLP USING LP MAXIMIZING OBJLP; 33 34 SOLVE PORTQP USING NLP MINIMIZING OBJQP; 47 Portfolio Planning Problem - GAMS
S O L VE S U M M A R Y **** MODEL STATUS 1 OPTIMAL **** OBJECTIVE VALUE 1.2750 RESOURCE USAGE, LIMIT 1.270 1000.000 ITERATION COUNT, LIMIT 1 1000 BDM - LP VERSION 1.01 A. Brooke, A. Drud, and A. Meeraus, Analytic Support Unit, Development Research Department, World Bank, Washington D.C. 20433, U.S.A.
Estimate work space needed -- 33 Kb Work space allocated -- 231 Kb EXIT - - OPTIMAL SOLUTION FOUND. LOWER LEVEL UPPER MARGINAL ----EQU LP . . . 1.000 ----EQU E2 1.000 1.000 1.000 1.200
LOWER LEVEL UPPER MARGINAL ----VAR IBM 0.750 0.750 0.100 ----VAR GM . 0.250 0.750 . ----VAR GOLD. .. 0.750 -0.120 ----VAR OBJLP -INF 1.275 +INF . **** REPORT SUMMARY : 0 NONOPT 0 INFEASIBLE 0 UNBOUNDED SIMPLE PORTFOLIO INVESTMENT PROBLEM (MARKOWITZ) Model Statistics SOLVE PORTQP USING NLP FROM LINE 34 MODEL STATISTICS BLOCKS OF EQUATIONS 3 SINGLE EQUATIONS 3 BLOCKS OF VARIABLES 4 SINGLE VARIABLES 4 NON ZERO ELEMENTS 10 NON LINEAR N-Z 3 DERIVITIVE POOL 8 CONSTANT POOL 3 CODE LENGTH 95 GENERATION TIME = 2.360 SECONDS EXECUTION TIME = 3.510 SECONDS
48 Portfolio Planning Problem - GAMS
S O L VE S U M M A R Y MODEL PORTLP OBJECTIVE OBJLP TYPE LP DIRECTION MAXIMIZE SOLVER MINOS5 FROM LINE 34 **** SOLVER STATUS 1 NORMAL COMPLETION **** MODEL STATUS 2 LOCALLY OPTIMAL **** OBJECTIVE VALUE 0.4210 RESOURCE USAGE, LIMIT 3.129 1000.000 ITERATION COUNT, LIMIT 3 1000 EVALUATION ERRORS 0 0 M I N O S 5.3 (Nov. 1990) Ver: 225-DOS-02 B.A. Murtagh, University of New South Wales and P.E. Gill, W. Murray, M.A. Saunders and M.H. Wright Systems Optimization Laboratory, Stanford University.
EXIT - - OPTIMAL SOLUTION FOUND MAJOR ITNS, LIMIT 1 FUNOBJ, FUNCON CALLS 8 SUPERBASICS 1 INTERPRETER USAGE .21 NORM RG / NORM PI 3.732E-17 LOWER LEVEL UPPER MARGINAL ----EQU QP . . . 1.000 ----EQU E1 1.150 1.150 +INF 1.216 ----EQU E2 1.000 1.000 1.000 -0.556 LOWER LEVEL UPPER MARGINAL ----VAR IBM . 0.183 0.750 . ----VAR GM . 0.248 0.750 EPS ----VAR GOLD . 0.569 0.750 . ----VAR OBJLP -INF 1.421 +INF . **** REPORT SUMMARY : 0 NONOPT 0 INFEASIBLE 0 UNBOUNDED 0 ERRORS SIMPLE PORTFOLIO INVESTMENT PROBLEM (MARKOWITZ) Model Statistics SOLVE PORTQP USING NLP FROM LINE 34 EXECUTION TIME = 1.090 SECONDS 49 Algorithms for Constrained Problems
Motivation: Build on unconstrained methods wherever possible.
Classification of Methods:
•Reduced Gradient Methods - (with Restoration) GRG2, CONOPT •Reduced Gradient Methods - (without Restoration) MINOS •Successive Quadratic Programming - generic implementations •Penalty Functions - popular in 1970s, but fell into disfavor. Barrier Methods have been developed recently and are again popular. •Successive Linear Programming - only useful for "mostly linear" problems
We will concentrate on algorithms for first four classes.
Evaluation: Compare performance on "typical problem," cite experience on process problems.
50 Representative Constrained Problem (Hughes, 1981)
g1 Š 0
g2 Š 0
α β Min f(x1, x2) = exp(- ) 2 2 g1 = (x2+0.1) [x1 +2(1-x2)(1-2x2)] - 0.16 0 2 2 g2 = (x1 - 0.3) + (x2 - 0.3) - 0.16 0 x* = [0.6335, 0.3465] f(x*) = -4.8380 51 Reduced Gradient Method with Restoration (GRG2/CONOPT)
Min f(x) Min f(z) s.t. g(x) + s = 0 (add slack variable) ‘⇒ s.t. h(z) = 0 c(x) = 0 a z b a x b, s 0
• Partition variables into:
zB - dependent or basic variables
zN - nonbasic variables, fixed at a bound
zS - independent or superbasic variables Analogy to linear programming. Superbasics required only if nonlinear problem • Solve unconstrained problem in space of superbasic variables. T T T T Let z = [zS zB zN ] optimize wrt zS with h(zS, zB , zN) = 0
Calculate constrained derivative or reduced gradient wrt zS. ∈ m •Dependent variables are zB R
•Nonbasic variables zN (temporarily) fixed 52 Definition of Reduced Gradient
∂ ∂ df = f + dzB f ∂ ∂ dzS zS dzS zB Because h(z) = ,0 we have: T T ∂h ∂h dh = dz + dz = 0 ∂ S ∂ B zS zB −1 dz ∂h ∂h − B = − = −∇ h[]∇ h 1 ∂ ∂ zS zB dzS zS zB This leads to :
df ∂f − ∂f = − ∇ h[]∇ h 1 ∂ zS zB ∂ dzS zS zB •By remaining feasible always, h(z) = 0, a z b, one can apply an
unconstrained algorithm (quasi-Newton) using (df/dzS)
•Solve problem in reduced space of zS variables 53 Example of Reduced Gradient
2 − Min x1 2 x2 + = ts .. 3x1 4 x2 24 ∇ T = ∇ T = h 3[ 4], f 2[ x1 - 2]
= = Let z S x1, z B x2 df ∂f − ∂f = − ∇ h[]∇ h 1 ∂ z S z B ∂ dz S z S z B df = − []−1 () = + 2 x1 3 4 - 2 2 x1 /3 2 dx1
∇ T ∇ T ∇ T If h is (m x n); zSh is m x (n-m); zBh is (m x m)
(df/dzS) is the change in f along constraint direction per unit change in zS
54 Sketch of GRG Algorithm 1.1. InitializeInitialize problemproblem andand obtainobtain aa feasiblefeasible pointpoint atat zz0 k 2.2. AtAt feasiblefeasible pointpoint zz ,, partitionpartition variablesvariables zz intointo zzN,, zzB,, zzS
3.3. CalculateCalculate reducedreduced gradient,gradient, ((df/dzdf/dzS)) -1 4.4. EvaluateEvaluate searchsearch directiondirection forfor zzS,, dd == BB (df/dz(df/dzS)) 5.5. PerformPerform aa lineline search.search. α∈α k αα •• FindFind (0,1] with zzS :=:= zzS ++ dd k αα •• SolveSolve forfor h(zh(zS ++ d,d, zzB,, zzN)) == 00 k αα k •• IfIf f(zf(zS ++ dd,, zzB,, zzN)) << f(zf(zS ,, zzB,, zzN),), k+1 k αα setset zzS ==zzS ++ d,d, k:=k:= k+1k+1 ε,ε, 6.6. IfIf ||||((df/dzdf/dzS)||<)||< Stop.Stop. Else,Else, gogo toto 2.2.
55 GRG Algorithm Properties
1. GRG2 has been implemented on PC’s as GINO and is very reliable and robust. It is also the optimization solver in MS EXCEL. 2. CONOPT is implemented in GAMS, AIMMS and AMPL 3. GRG2 uses Q-N for small problems but can switch to conjugate gradients if problem gets large. CONOPT uses exact second derivatives. ∇ 4. Convergence of h(zS, zB , zN) = 0 can get very expensive because h is required 5. Safeguards can be added so that restoration (step 5.) can be dropped and efficiency increases.
Representative Constrained Problem Starting Point [0.8, 0.2] • GINO Results - 14 iterations to ||∇f(x*)|| 10-6 • CONOPT Results - 7 iterations to ||∇f(x*)|| 10-6 from feasible point.
56 Reduced Gradient Method without Restoration (MINOS/Augmented)
Motivation: Efficient algorithms Strategy: (Robinson, Murtagh & Saunders) are available that solve linearly 1. Partition variables into basic, nonbasic constrained optimization variables and superbasic variables.. problems (MINOS): 2. Linearize active constraints at zk Dkz = ck Min f(x) 3. Let ψ = f (z) + vTh (z) + β (hTh) s.t. Ax b (Augmented Lagrange), Cx = d 4. Solve linearly constrained problem: Min ψ (z) Extend to nonlinear problems, s.t. Dz = c through successive linearization a z b using reduced gradients to get zk+1 Develop major iterations 5. Set k=k+1, go to 3. (linearizations) and minor 6. Algorithm terminates when no iterations (GRG solutions) . movement between steps 3) and 4).
57 MINOS/Augmented Notes
1. MINOS has been implemented very efficiently to take care of linearity. It becomes LP Simplex method if problem is totally linear. Also, very efficient matrix routines. 2. No restoration takes place, nonlinear constraints are reflected in ψ(z) during step 3). MINOS is more efficient than GRG. 3. Major iterations (steps 3) - 4)) converge at a quadratic rate. 4. Reduced gradient methods are complicated, monolithic codes: hard to integrate efficiently into modeling software.
Representative Constrained Problem – Starting Point [0.8, 0.2] MINOS Results: 4 major iterations, 11 function calls to ||∇f(x*)|| 10-6
58 Successive Quadratic Programming (SQP)
Motivation: • Take KKT conditions, expand in Taylor series about current point. • Take Newton step (QP) to determine next point.
Derivation – KKT Conditions ∇ ∇ ∇ ∇ xL (x*, u*, v*) = f(x*) + gA(x*) u* + h(x*) v* = 0 h(x*) = 0
gA(x*) = 0, where gA are the active constraints.
Newton - Step ∇ ∇ ∇ ∆ ∇ ( k k k) xx L g h x x L x , u , v A ∇ T ∆ ()k g 0 0 u = - gA x A T ∇ h 0 0 ∆v h()x k Requirements: ∇ • xxL must be calculated and should be ‘regular’ •correct active set gA •good estimates of uk, vk 59 SQP Chronology
1. Wilson (1963) - active set can be determined by solving QP: ∇ T T ∇ Min f(xk) d + 1/2 d xx L(xk, uk, vk) d d ∇ T s.t. g(xk) + g(xk) d 0 ∇ T h(xk) + h(xk) d = 0
2. Han (1976), (1977), Powell (1977), (1978) ∇ - approximate xxL using a positive definite quasi-Newton update (BFGS) - use a line search to converge from poor starting points.
Notes: - Similar methods were derived using penalty (not Lagrange) functions. - Method converges quickly; very few function evaluations. - Not well suited to large problems (full space update used). For n > 100, say, use reduced space methods (e.g. MINOS).
60 Elements of SQP – Hessian Approximation
What about ∇xxL? • need to get second derivatives for f(x), g(x), h(x). k k ∇ • need to estimate multipliers, u , v ; xxL may not be positive semidefinite ⇒ ∇ k k k k Approximate xxL (x , u , v ) by B , a symmetric positive definite matrix. T k T k + yy B s s B Bk 1 = Bk + - sT y s Bk s BFGS Formula s = xk+1 - xk y = ∇L(xk+1, uk+1, vk+1) - ∇L(xk, uk+1, vk+1) • second derivatives approximated by change in gradients • positive definite Bk ensures unique QP solution
61 Elements of SQP – Search Directions How do we obtain search directions? • Form QP and let QP determine constraint activity • At each iteration, k, solve: Min ∇f(xk) Td + 1/2 dT Bkd d s.t. g(xk) + ∇g(xk) T d 0 h(xk) + ∇h(xk) T d = 0
Convergence from poor starting points • As with Newton's method, choose α (stepsize) to ensure progress toward optimum: xk+1 = xk + α d. • α is chosen by making sure a merit function is decreased at each iteration. Exact Penalty Function ψ µ Σ Σ (x) = f(x) + [ max (0, gj(x)) + |hj (x)|] µ > maxj {| uj |, | vj |} Augmented Lagrange Function ψ(x) = f(x) + uTg(x) + vTh(x) η Σ 2 Σ 2 + /2 { (hj (x)) + max (0, gj (x)) } 62 Newton-Like Properties for SQP
Fast Local Convergence B = ∇xxL Quadratic ∇xxL is p.d and B is Q-N 1 step Superlinear B is Q-N update, ∇xxL not p.d 2 step Superlinear
Enforce Global Convergence Ensure decrease of merit function by taking α 1 Trust region adaptations provide a stronger guarantee of global convergence - but harder to implement.
63 Basic SQP Algorithm
0. Guess x0, Set B0 = I (Identity). Evaluate f(x0), g(x0) and h(x0). 1. At xk, evaluate ∇f(xk), ∇g(xk), ∇h(xk). 2. If k > 0, update Bk using the BFGS Formula. ∇ k T T k 3. Solve: Mind f(x ) d + 1/2 d B d s.t. g(xk) + ∇g(xk)Td ≤ 0 h(xk) + ∇h(xk)Td = 0 If KKT error less than tolerance: ||∇L(x*)|| ≤ ε, ||h(x*)|| ≤ ε, || ≤ ε g(x*)+|| . STOP, else go to 4. 4. Find α so that 0 < α ≤ 1 and ψ(xk + αd) < ψ(xk) sufficiently (Each trial requires evaluation of f(x), g(x) and h(x)). 5. xk+1 = xk + α d. Set k = k + 1 Go to 2.
64 Problems with SQP
Nonsmooth Functions - Reformulate Ill-conditioning - Proper scaling Poor Starting Points – Trust Regions can help Inconsistent Constraint Linearizations - Can lead to infeasible QP's x 2
Min x2 2 s.t. 1 + x1 - (x2) 0 2 x 1 1 - x1 - (x2) 0 x2 -1/2
65 SQP Test Problem
1.2
1.0
0.8 x2 0.6
x* 0.4
0.2
0.0 0.0 0.2 0.4 x1 0.6 0.8 1.0 1.2
Min x2 2 3 s.t. -x2 + 2 x1 - x1 0 2 3 -x2 + 2 (1-x1) - (1-x1) 0 x* = [0.5, 0.375].
66 SQP Test Problem – First Iteration 1.2
1.0
0.8 x2 0.6
0.4
0.2
0.0 0.0 0.2 0.4 x1 0.6 0.8 1.0 1.2 T 0 Start from the origin (x0 = [0, 0] ) with B = I, form:
2 2 Min d2 + 1/2 (d1 + d2 ) s.t. d2 0 d1 + d2 1 T µ µ d = [1, 0] . with 1 = 0 and 2 = 1. 67 SQP Test Problem – Second Iteration 1.2
1.0
0.8 x2 0.6
0.4 x*
0.2
0.0 0.0 0.2 0.4 x1 0.6 0.8 1.0 1.2 T 1 From x1 = [0.5, 0] with B = I (no update from BFGS possible), form:
2 2 Min d2 + 1/2 (d1 + d2 ) s.t. -1.25 d1 - d2 + 0.375 0 1.25 d1 - d2 + 0.375 0 T µ µ d = [0, 0.375] with 1 = 0.5 and 2 = 0.5 T x* = [0.5, 0.375] is optimal 68 Representative Constrained Problem SQP Convergence Path
g1 Š 0
g2 Š 0
Starting Point[0.8, 0.2] - starting from B0 = I and staying in bounds and linearized constraints; converges in 8 iterations to ||∇f(x*)|| 10-6
69 Barrier Methods for Large-Scale Nonlinear Programming min f (x) x∈ℜn Can generalize for Original Formulation s.t c(x) = 0 a ≤ x ≤ b x ≥ 0
n ϕ = − µ Barrier Approach min µ (x) f (x) ∑ln xi x∈ℜn i=1 s.t c(x) = 0
⇒As µ Î 0, x*(µ) Î x* Fiacco and McCormick (1968)
70 Solution of the Barrier Problem
⇒Newton Directions (KKT System) ∇ f ( x ) + A( x )λ − v = 0 XVe − µ e = 0 c ( x ) = 0
⇒ ⇒ SolveReducing the System =µ −1 − − −1 dv X e v X Vdx W A − I d ∇f + Aλ −v +Σ x ∇ϕ W A d=x − µ − A 0 0 dλ = − c Σ= 1 T + X V λ −1 V A0 X 0dν v− µc S e
IPOPT Code – www.coin-or.org 71 Global Convergence of Newton-based Barrier Solvers Merit Function Exact Penalty: P(x, η) = f(x) + η ||c(x)|| Aug’d Lagrangian: L*(x, λ, η) = f(x) + λTc(x) + η ||c(x)||2 Assess Search Direction (e.g., from IPOPT) Line Search – choose stepsize α to give sufficient decrease of merit function using a ‘step to the boundary’ rule with τ ~0.99. α ∈ α = +α for ,0( ], xk +1 xk d x +α ≥ −τ > xk d x 1( )xk 0 = +α ≥ −τ > vk +1 vk dv 1( )vk 0 λ = λ +α λ − λ k+1 k ( + k )
• How do we balance φ (x) and c(x) with η? • Is this approach globally convergent? Will it still be fast?
72 Global Convergence Failure (Wächter and B., 2000) Min f (x) − − 1 = ts .. x1 x3 0 x2 2 2 − − = (x1) x2 1 0 ≥ x2 , x3 0 Newton-type line search ‘stalls’ even though descent directions exist x 1 k T + k = A(x ) d x c(x ) 0 k +α > x d x 0 Remedies: •Composite Step Trust Region (Byrd et al.)
•Filter Line Search Methods 73 Line Search Filter Method
Store (φk, θk) at allowed iterates Allow progress if trial point is θ acceptable to filter with margin φ(x) If switching condition α −∇φ T a ≥ δ θ b > > [ k d] [ k ,] a 2b 2 is satisfied, only an Armijo line search is required on φk If insufficient progress on stepsize, evoke restoration phase to reduce θ. Global convergence and superlinear local convergence proved (with second order correction)
θ(x) = ||c(x)||
74 Implementation Details
Modify KKT (full space) matrix if nonsingular W + Σ +δ A k k 1 k T −δ Ak 2 I δ 1 - Correct inertia to guarantee descent direction δ 2 - Deal with rank deficient Ak KKT matrix factored by MA27 Feasibility restoration phase + − 2 Min || c(x ||) 1 || x xk ||Q ≤ ≤ xl xk xu
Apply Exact Penalty Formulation Exploit same structure/algorithm to reduce infeasibility
75 Details of IPOPT Algorithm
Options Comparison
Line Search Strategies l 34 COPS Problems - 2 exact penalty merit function (600 - 160 000 variables) - augmented Lagrangian function 486 CUTE Problems - Filter method (adapted from (2 – 50 000 variables) Fletcher and Leyffer) 56 MIT T Problems (12097 – 99998 variables) Hessian Calculation - BFGS (reduced space) Performance Measure - SR1 (reduced space) - rp, l = (#iterp,l)/ (#iterp, min) - Exact full Hessian (direct) - P(τ) = fraction of problems τ - Exact reduced Hessian (direct) with log2(rp, l) < - Preconditioned CG
76 IPOPT Comparison with KNITRO and LOQO
CPU time (COPS) CPU time (CUTE) IPOPT IPOPT 1 1 KNITRO KNITRO LOQO LOQO 0.9 0.9
0.8 0.8
0.7 0.7
0.6 0.6 rho rho 0.5 0.5
0.4 0.4
0.3 0.3
0.2 0.2
0.1 0.1
0 0 0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 tau tau CPU time (MITT) •IPOPT has better performance, 1 robustness on CUTE, MITT and 0.9 COPS problem sets 0.8
0.7 IPOPT KNITRO •Similar results appear with iteration 0.6 LOQO counts rho 0.5
0.4 •Can be downloaded from
0.3 http://www.coin-or.org 0.2 •See links for additional information 0.1
0 0 1 2 3 4 5 6 7 8 77 tau Recommendations for Constrained Optimization
1. Best current algorithms • GRG 2/CONOPT • MINOS • SQP • IPOPT 2. GRG 2 (or CONOPT) is generally slower, but is robust. Use with highly nonlinear functions. Solver in Excel! 3. For small problems (n 100) with nonlinear constraints, use SQP. 4. For large problems (n 100) with mostly linear constraints, use MINOS. ==> Difficulty with many nonlinearities
Fewer Function SQP MINOS Tailored Linear Evaluations CONOPT Algebra
Small, Nonlinear Problems - SQP solves QP's, not LCNLP's, fewer function calls. Large, Mostly Linear Problems - MINOS performs sparse constraint decomposition. Works efficiently in reduced space if function calls are cheap! Exploit Both Features – IPOPT takes advantages of few function evaluations and large- scale linear algebra, but requires exact second derivatives
78 Available Software for Constrained Optimization
SQP Routines HSL, NaG and IMSL (NLPQL) Routines NPSOL – Stanford Systems Optimization Lab SNOPT – Stanford Systems Optimization Lab (rSQP discussed later) IPOPT – http://www.coin-or.org
GAMS Programs CONOPT - Generalized Reduced Gradient method with restoration MINOS - Generalized Reduced Gradient method without restoration A student version of GAMS is now available from the CACHE office. The cost for this package including Process Design Case Students, GAMS: A User’s Guide, and GAMS - The Solver Manuals, and a CD-ROM is $65 per CACHE supporting departments, and $100 per non-CACHE supporting departments and individuals. To order please complete standard order form and fax or mail to CACHE Corporation. More information can be found on http://www.che.utexas.edu/cache/gams.html
MS Excel Solver uses Generalized Reduced Gradient method with restoration
79 Rules for Formulating Nonlinear Programs
1) Avoid overflows and undefined terms, (do not divide, take logs, etc.) e.g. x + y - ln z = 0 Î x + y - u = 0 exp u - z = 0 2) If constraints must always be enforced, make sure they are linear or bounds. e.g. v(xy - z2)1/2 = 3 Î vu = 3 u- (xy - z2) = 0, u 0 3) Exploit linear constraints as much as possible, e.g. mass balance Î xi L + yi V = F zi li + vi = fi ∑ L – li = 0 4) Use bounds and constraints to enforce characteristic solutions. e.g. a x b, g (x) 0 to isolate correct root of h (x) = 0. 5) Exploit global properties when possibility exists. Convex (linear equations?) Linear Program? Quadratic Program? Geometric Program? 6) Exploit problem structure when possible. e.g. Min [Tx-3Ty] s.t. xT + y - T2 y = 5 4x - 5Ty + Tx = 7 0 T 1 (If T is fixed ⇒ solve LP) ⇒ put T in outer optimization loop.
80 Process Optimization Problem Definition and Formulation
State of Nature and Problem Premises
Restrictions: Physical, Legal Desired Objective: Yield, Economic, Political, etc. Economic, Capacity, etc.
Decisions
Mathematical Modeling and Algorithmic Solution
Process Model Equations
Constraints Objective Function
Additional Variables
81 Large Scale NLP Algorithms Motivation: Improvement of Successive Quadratic Programming as Cornerstone Algorithm Î process optimization for design, control and operations Evolution of NLP Solvers: SQP rSQP IPOPT
rSQP++
1999-1988-98:1981-87: : Simultaneous Static Flowsheet Real-time dynamic optimization optimization optimization overoverover 1 000100 100 000000 variables variablesvariables and andand constraints constraintsconstraints
Current: Tailor structure, architecture and problems 82 Flowsheet Optimization Problems - Introduction
Modular Simulation Mode Physical Relation to Process - Intuitive to Process Engineer - Unit equations solved internally Out In - tailor-made procedures.
•Convergence Procedures - for simple flowsheets, often identified from flowsheet structure •Convergence "mimics" startup. •Debugging flowsheets on "physical" grounds
83 Flowsheet Optimization Problems - Features
Design Specifications Specify # trays reflux ratio, but would like to specify C overhead comp. ==> Control loop -Solve Iteratively
1 3
Nested Recycles Hard to Handle 2 4 Best Convergence Procedure?
•Frequent block evaluation can be expensive •Slow algorithms applied to flowsheet loops. •NLP methods are good at breaking looks
84 Chronology in Process Optimization
Sim. Time Equiv. 1. Early Work - Black Box Approaches Friedman and Pinder (1972) 75-150 Gaddy and co-workers (1977) 300 2. Transition - more accurate gradients Parker and Hughes (1981) 64 Biegler and Hughes (1981) 13 3. Infeasible Path Strategy for Modular Simulators Biegler and Hughes (1982) <10 Chen and Stadtherr (1985) Kaijaluoto et al. (1985) and many more 4. Equation Based Process Optimization Westerberg et al. (1983) <5 Shewchuk (1985) 2 DMO, NOVA, RTOPT, etc. (1990s) 1-2
Process optimization should be as cheap and easy as process simulation
85 Process Simulators with Optimization Capabilities (using SQP)
Aspen Custom Modeler (ACM) Aspen/Plus gProms Hysim/Hysys Massbal Optisim Pro/II ProSim ROMeo VTPLAN
86 Simulation and Optimization of Flowsheets
3 2
h(y) = 0 4 6 1 w(y) y
5 Min f(x), s.t. g(x) 0 For single degree of freedom: • work in space defined by curve below. • requires repeated (expensive) recycle convergence
f(x, y(x))
x 87 Expanded Region with Feasible Path
88 "Black Box" Optimization Approach • Vertical steps are expensive (flowsheet convergence) • Generally no connection between x and y. • Can have "noisy" derivatives for gradient optimization.
89 SQP - Infeasible Path Approach • solve and optimize simultaneously in x and y • extended Newton method
90 Optimization Capability for Modular Simulators (FLOWTRAN, Aspen/Plus, Pro/II, HySys)
Architecture - Replace convergence with optimization block - Problem definition needed (in-line FORTRAN) - Executive, preprocessor, modules intact.
Examples 1. Single Unit and Acyclic Optimization - Distillation columns & sequences
2. "Conventional" Process Optimization - Monochlorobenzene process - NH3 synthesis
3. Complicated Recycles & Control Loops - Cavett problem - Variations of above
91 Optimization of Monochlorobenzene Process
S06 U = 100 Cooling HC1 Water 80o F
A-1 ABSORBER P PHYSICAL PROPERTY OPTIONS 15 Trays S11 (3 Theoretical Stages) Cavett Vapor Pressure 32 psia Redlich-Kwong Vapor Fugacity Benzene, S05 25 0.1 Lb Mole/Hr Corrected Liquid Fugacity S04 psia of MCB Feed o T Ideal Solution Activity Coefficient 80 F S10 D-1 o S07 37 psia 270 F DISTILLATION OPT (SCOPT) OPTIMIZER 30 Trays HC1 Optimal Solution Found After 4 Iterations F-1 (20 Theoretical Stages) S01 S02 FLASH Kuhn-Tucker Error 0.29616E-05 S09 Steam o Steam Allowable Kuhn-Tucker Error 0.19826E-04 360 F S03 S08 360oF Objective Function -0.98259 H-1 12,000 2 U = 100 T-1 Btu/hr- ft S12 TREATER o Feed Flow Rates 90 F Maximize H-2 Optimization Variables LB Moles/Hr Profit U = 100 HC1 10 32.006 0.38578 200.00 120.00 Benzene 40 S13 Water o S14 Tear Variables MCB 50 C S15 80 F 0.10601E-19 13.064 79.229 120.00 50.000 P-1 1200 F MCB Tear Variable Errors (Calculated Minus Assumed) T -0.10601E-19 0.72209E-06 -0.36563E-04 0.00000E+00 0.00000E+00 -Results of infeasible path optimization -Simultaneous optimization and convergence of tear streams.
92 Ammonia Process Optimization
Pr H2 Tr N2 Hydrogen Feed Nitrogen Feed N2 5.2% 99.8% H2 94.0% --- To CH4 0.79 % 0.02% TTf f Ar --- 0.01% ν
Product Hydrogen and Nitrogen feed are mixed, compressed, and combined with a recycle stream and heated to reactor temperature. Reaction occurs in a multibed reactor (modeled here as an equilibrium reactor) to partially convert the stream to ammonia. The reactor effluent is cooled and product is separated using two flash tanks with intercooling. Liquid from the second stage is flashed at low pressure to yield high purity
NH3 product. Vapor from the two stage flash forms the recycle and is recompressed. 93 Ammonia Process Optimization
Optimization Problem Performance Characterstics
Max {Total Profit @ 15% over five years} • 5 SQP iterations. • 2.2 base point simulations. s.t.• 105 tons NH3/yr. • objective function improves by • Pressure Balance 6 6 • No Liquid in Compressors $20.66 x 10 to $24.93 x 10 . • 1.8 H2/N2 3.5 • difficult to converge flowsheet • Treact 1000o F at starting point • NH3 purged 4.5 lb mol/hr • NH3 Product Purity 99.9 % • Tear Equations Optimum Starting point
Objective Function($106) 24.9286 20.659 1. Inlet temp. reactor (oF) 400 400 2. Inlet temp. 1st flash (oF) 65 65 3. Inlet temp. 2nd flash (oF) 35 35 4. Inlet temp. rec. comp. (oF) 80.52 107 5. Purge fraction (%) 0.0085 0.01 6. Reactor Press. (psia) 2163.5 2000 7. Feed 1 (lb mol/hr) 2629.7 2632.0 8. Feed 2 (lb mol/hr) 691.78 691.4 94 How accurate should gradients be for optimization?
Recognizing True Solution • KKT conditions and Reduced Gradients determine true solution • Derivative Errors will lead to wrong solutions!
Performance of Algorithms Constrained NLP algorithms are gradient based (SQP, Conopt, GRG2, MINOS, etc.) Global and Superlinear convergence theory assumes accurate gradients
Worst Case Example (Carter, 1991) Newton’s Method generates an ascent direction and fails for any ε ! Min f (x) = xT Ax ε + /1 ε ε − /1 ε A = ε − /1 ε ε + /1 ε T = ∇ = ε − x0 1[ ]1 f (x0 ) x0 = ∇ + ε g(x0 ) f (x0 ) O( ) = − −1 d A g(x0 ) 95 Implementation of Analytic Derivatives
parameters, p exit variables, s
x Module Equations y
c(v, x, s, p, y) = 0
dy/dx ds/dx Sensitivity dy/dp ds/dp Equations
Automatic Differentiation Tools
JAKE-F, limited to a subset of FORTRAN (Hillstrom, 1982) DAPRE, which has been developed for use with the NAG library (Pryce, Davis, 1987) ADOL-C, implemented using operator overloading features of C++ (Griewank, 1990) ADIFOR, (Bischof et al, 1992) uses source transformation approach FORTRAN code . TAPENADE, web-based source transformation for FORTRAN code
Relative effort needed to calculate gradients is not n+1 but about 3 to 5 (Wolfe, Griewank) 96 Flash Recycle Optimization Ammonia Process Optimization (2 decisions + 7 tear variables) (9 decisions and 6 tear variables) Reactor S3
S1 S2 Flash Mixer P
S6 Hi P S4 S5 Flash S7 Ratio Lo P 2 2 3 1/2 Max S3(A) *S3(B) - S3(A) - S3(C) + S3(D) - (S3(E)) Flash
8000 200
GRG SQP GRG 6000 rSQP SQP rSQP
100 4000 CPU SecondsCPU (VS 3200) CPU SecondsCPU 3200) (VS 2000
0 0 1 2 Numerical Exact Numerical1 Exact 2 97 Large-Scale SQP Min f(z) Min ∇f(zk)T d + 1/2 d T Wk d s.t. c(z)=0 s.t. c(zk) + (Αk)T d = 0 k zL z zU zL z + d zU
Characteristics • Many equations and variables ( 100 000) • Many bounds and inequalities ( 100 000)
Few degrees of freedom (10 - 100) Steady state flowsheet optimization Real-time optimization Parameter estimation Many degrees of freedom ( 1000) Dynamic optimization (optimal control, MPC) State estimation and data reconciliation
98 Few degrees of freedom => reduced space SQP (rSQP)
• Take advantage of sparsity of A=∇c(x) • project W into space of active (or equality constraints) • curvature (second derivative) information only needed in space of degrees of freedom • second derivatives can be applied or approximated with positive curvature (e.g., BFGS) • use dual space QP solvers
+ easy to implement with existing sparse solvers, QP methods and line search techniques + exploits 'natural assignment' of dependent and decision variables (some decomposition steps are 'free') + does not require second derivatives
- reduced space matrices are dense - may be dependent on variable partitioning - can be very expensive for many degrees of freedom - can be expensive if many QP bounds
99 Reduced space SQP (rSQP) Range and Null Space Decomposition
Assume no active bounds, QP problem with n variables and m constraints becomes: k k d ∇ k W A = − f (x ) kT k A 0 λ+ c(x )
• Define reduced space basis, Zk∈ ℜ n x (n-m) with (Ak)TZk = 0 • Define basis for remaining space Yk∈ ℜ n x m, [Yk Zk]∈ℜn x n is nonsingular. k k • Let d = Y dY + Z dZ to rewrite: k k T k k k k dY k k T k []Y Z 0 W A []Y Z 0 []Y Z 0 ∇f (x ) = − kT dZ k 0 IA 0 0 I 0 I c(x ) λ+
100 Reduced space SQP (rSQP) Range and Null Space Decomposition 0 0 kT k k kT k k kT k Y W Y Y W Y Y A kT ∇ k dY Y f (x ) kT k k kT k k Z W Y Z W Z 0 = − kT ∇ k dZ Z f (x ) kT k k λ+ A Y 0 0 c(x )
T k • (A Y) dY =-c(x ) is square, dY determined from bottom row. T T • Cancel Y WY and Y WZ; (unimportant as dZ, dY --> 0) • (YTA) O = -YT∇f(xk), λ can be determined by first order estimate T T T∇ k • Calculate or approximate w= Z WY dY, solve Z WZ dZ =-Z f(x )-w • Compute total step: d = Y dY + Z dZ
101 Reduced space SQP (rSQP) Interpretation
dZ
dY
Range and Null Space Decomposition • SQP step (d) operates in a higher dimension
• Satisfy constraints using range space to get dY
• Solve small QP in null space to get dZ • In general, same convergence properties as SQP.
102 Choice of Decomposition Bases
1. Apply QR factorization to A. Leads to dense but well-conditioned Y and Z.
R R A = Q = []Y Z 0 0 2. Partition variables into decisions u and dependents v. Create orthogonal Y and Z with embedded identity matrices (ATZ = 0, YTZ=0). T = [∇ T ∇ T ]= [ ] A uc vc N C I T −T = = N C Z − Y − C 1N I
3. Coordinate Basis - same Z as above, YT = [ 0 I ] • Bases use gradient information already calculated. • Adapt decomposition to QP step • Theoretically same rate of convergence as original SQP. • Coordinate basis can be sensitive to choice of u and v. Orthogonal is not. • Need consistent initial point and nonsingular C; automatic generation
103 rSQP Algorithm
1. Choose starting point x0. 2. At iteration k, evaluate functions f(xk), c(xk) and their gradients. 3. Calculate bases Y and Z.
4. Solve for step dY in Range space from T k (A Y) dY =-c(x ) k T 5. Update projected Hessian B ~ Z WZ (e.g. with BFGS), wk (e.g., zero)
6. Solve small QP for step dZ in Null space. T ∇ k + k T + T k Min (Z f (x ) w ) dZ 2/1 dZ B dZ ≤ k + + ≤ ts .. xL x YdY ZdZ xU 7. If error is less than tolerance stop. Else 8. Solve for multipliers using (YTA) λ = -YT∇f(xk) 9. Calculate total step d = Y dY + Z dZ. α 10. Find step size and calculate new point, xk+1 = xk + 11. Continue from step 2 with k = k+1.
104 rSQP Results: Computational Results for General Nonlinear Problems Vasantharajan et al (1990)
105 rSQP Results: Computational Results for Process Problems Vasantharajan et al (1990)
106 Comparison of SQP and rSQP Coupled Distillation Example - 5000 Equations Decision Variables - boilup rate, reflux ratio
Method CPU Time Annual Savings Comments 1. SQP* 2 hr negligible Base Case 2. rSQP 15 min. $ 42,000 Base Case 3. rSQP 15 min. $ 84,000 Higher Feed Tray Location 4. rSQP 15 min. $ 84,000 Column 2 Overhead to Storage 5. rSQP 15 min $107,000 Cases 3 and 4 together
18
10
1
QVK QVK
107 Real-time Optimization with rSQP Sunoco Hydrocracker Fractionation Plant (Bailey et al, 1993) Existing process, optimization on-line at regular intervals: 17 hydrocarbon components, 8 heat exchangers, absorber/stripper (30 trays), debutanizer (20 trays), C3/C4 splitter (20 trays) and deisobutanizer (33 trays).
FUEL GAS REACTOR EFFLUENT FROM LOW PRESSURE SEPARATOR
STRIPPER C3 ABSORBER/ LIGHT NAPHTHA MIXED LPG
PREFLASH iC4 C3/C4 C3/C4 SPLITTER DIB
REFORMER MAIN FRAC. MAIN
NAPHTHA DEBUTANIZER
RECYCLE nC4 OIL
• square parameter case to fit the model to operating data.
• optimization to determine best operating conditions 108 Optimization Case Study Characteristics
Model consists of 2836 equality constraints and only ten independent variables. It is also reasonably sparse and contains 24123 nonzero Jacobian elements.
NP G E P = + + m − P ∑ziCi ∑ziCi ∑ziCi U i∈G i∈E m=1
Cases Considered: 1. Normal Base Case Operation 2. Simulate fouling by reducing the heat exchange coefficients for the debutanizer 3. Simulate fouling by reducing the heat exchange coefficients for splitter feed/bottoms exchangers 4. Increase price for propane 5. Increase base price for gasoline together with an increase in the octane credit
109 110 Many degrees of freedom=> full space IPOPT
k + Σ k d ∇ϕ k W A = − (x ) kT k A 0 λ+ c(x )
• work in full space of all variables • second derivatives useful for objective and constraints • use specialized large-scale Newton solver
∇ λ ∇ + W= xxL(x, ) and A= c(x) sparse, often structured + fast if many degrees of freedom present + no variable partitioning required
- second derivatives strongly desired - W is indefinite, requires complex stabilization - requires specialized large-scale linear algebra
111 *DVROLQH%OHQGLQJ*DVROLQH%OHQGLQJ V OLQH 3LSH 2,/7$1.6
*DVROLQH%OHQGLQJ+HUH *$667$7,216
6XSSO\WDQNV
,QWHUPHGLDWHWDQNV ),1$/352'8&7758&.6
)LQDO3URGXFWWDQNV
112 Blending Problem & Model Formulation
q ,it .. qt, j .. qt,k
f f ..f max ∑(∑c f − ∑ c f ) t,ij t, jk t,k k t, k i ,it qi vk t k i ⇒ s.t. ∑ f − ∑ f + v = v t, jk t,ij t + ,1 j t, j k i v f − ∑ f = 0 t,j t,k t, jk j ⇒ ∑q f − ∑q f + q v = q v t, j t, jk ,it t,ij t + ,1 j t + ,1 j t, j t, j k i q f − ∑q f = 0 t, k t, k t, j t, jk Supply tanks (i) Intermediate tanks (j)Final Product tanks (k) j q ≤ q ≤ q k t,k kmax min v ≤ v ≤ v j t, j jmax f & v ------flowrates and tank volumes min q ------tank qualities
Model Formulation in AMPL
113 SmallSmall MultiMulti--dayday BlendingBlending ModelsModels SingleSingle QualitiesQualities
Haverly, C. 1978 (HM) Audet & Hansen 1998 (AHM)
F 1 B 1 F 1 B 1 P 1 P 1 F 2 F 2 B 2
P 2 F 3 B 2 F 3 B 3
114 HoneywellHoneywell BlendingBlending ModelModel –– MultipleMultiple DaysDays 4848 QualitiesQualities
115 Summary of Results – Dolan-Moré plot
Performance profile (iteration count)
1
0.9
0.8
0.7
0.6
φ 0.5 IPOPT LOQO 0.4 KNITRO SNOPT 0.3 MINOS LANCELOT 0.2
0.1
0 1 10 100 1000 10000 100000 1000000 10000000 τ
116 Comparison of NLP Solvers: Data Reconciliation
1000 800 LANCELOT MINOS 600 SNOPT 400 KNITRO Iterations LOQO 200 IPOPT 0 0 200 400 600 Degrees of Freedom
100
10 LANCELOT MINOS SNOPT 1 KNITRO 0 200 400 600 LOQO 0.1 IPOPT CPU TimeCPU (s, norm.)
0.01 Degrees of Freedom
117 Sensitivity Analysis for Nonlinear Programming
At nominal conditions, p0
Min f(x, p0) s.t. c(x, p0) = 0 a(p0) x b(p0)
How is the optimum affected at other conditions, p p0? • Model parameters, prices, costs • Variability in external conditions • Model structure
• How sensitive is the optimum to parameteric uncertainties? • Can this be analyzed easily?
118 Calculation of NLP Sensitivity
Take KKT Conditions ∇L(x*, p, λ, v) = 0
c(x*, p0) = 0 T E x* - bnd( p0)=0 and differentiate and expand about p0. ∇ λ T ∇ λ T ∇ T ∇ λ T ∇ λT Ε ∇ T pxL(x*, p, , v) + xxL(x*, p, , v) px* + xh(x*, p, , v) p + pv = 0 ∇ T ∇ T ∇ T pc(x*, p0) + xc(x*, p0) px* = 0 T ∇ T ∇ T E ( px* - pbnd )=0 Notes: • A key assumption is that under strict complementarity, the active set will not change for small perturbations of p. • ∇ T ∇ T If an element of x* is at a bound then pxi* = pbnd • ∇ T Second derivatives are required to calculate sensitivities, px* • ∇ T Is there a cheaper way to calculate px* ?
119 DecompositionDecomposition forfor NLPNLP SensitivitySensitivity
Let L(x*, p,λ,v) = f (x*) + λT c(x*) + (ET (x*−bnd( p)))T v W A E∇ x*T ∇ L(x*, p,λ,v)T p xp T ∇ λT = − ∇ T A 0 0 p pc(x*, p) T ∇ T − T ∇ T E 0 0 pv E pbnd
•Partition variables into basic, nonbasic and superbasic ∇ T ∇ T ∇ T ∇ T px = Z pxS + Y pxB + T pxN ∇ T ∇ T •Set pxN = pbndN , nonbasic variables to rhs, •Substitute for remaining variables •Perform range and null space decomposition ∇ T ∇ T •Solve only for pxS and pxB
120 DecompositionDecomposition forfor NLPNLP SensitivitySensitivity
Y TWY Y TWY Y T A∇ x T Y T (∇ L(x*, p,λ,v)T +W T∇ x T ) p B xp p N T T ∇ T = − T ∇ λ T + ∇ T Z WY Z WZ 0 p xS Z ( xp L(x*, p, ,v) W T p xN ) T ∇ λT ∇ T + T ∇ T A Y 0 0 p pc(x*, p) A T p xN
∇ T ∇ T • Solve only for pxB from bottom row and pxS from middle row T T ∇ T • If second derivatives are not available, Z WZ, Z WY pxB and T ∇ T Z WT pxN can be constructed by directional finite differencing • If assumption of strict complementarity is violated, sensitivity can be calculated using a QP subproblem.
121 Second Order Tests Reduced Hessian needs to be positive definite
T∇ * At solution x*: Evaluate eigenvalues of Z xxL Z Strict local minimum if all positive.
• Nonstrict local minimum: If nonnegative, find eigenvectors for zero eigenvalues, Î regions of nonunique solutions • Saddle point: If any are negative, move along directions of corresponding eigenvectors and restart optimization.
x2
z 1
Saddle Point x* x1
z 2
122 Sensitivity for Flash Recycle Optimization (2 decisions, 7 tear variables)
S3
S1 S2 Mixer Flash P
S6 S4 S5 S7 Ratio
2 2 3 1/2 Max S3(A) *S3(B) - S3(A) - S3(C) + S3(D) - (S3(E))
•Second order sufficiency test: •Dimension of reduced Hessian = 1 •Positive eigenvalue •Sensitivity to simultaneous change in feed rate and upper bound on purge ratio •Only 2-3 flowsheet perturbations required for second order information
123 Ammonia Process Optimization (9 decisions, 8 tear variables)
20 Sensitivities vs. Re-optimized Pts
Reactor 19.5 Sensitivity
19 QP1
QP2 18.5 Actual Hi P Flash 18
Lo P Objective Function 17.5 Flash
17 0.1 0.01 •Second order sufficiency test: 0.001 Relative perturbation change •Dimension of reduced Hessian = 4 •Eigenvalues = [2.8E-4, 8.3E-10, 1.8E-4, 7.7E-5] •Sensitivity to simultaneous change in feed rate and upper bound on reactor conversion •Only 5-6 extra perturbations for second derivatives
124 Multiperiod Optimization
Coordination
Case 1 Case 2 Case 3 Case 4 Case N
1. Design plant to deal with different operating scenarios (over time or with uncertainty) 2. Can solve overall problem simultaneously • large and expensive • polynomial increase with number of cases • must be made efficient through specialized decomposition
3. Solve also each case independently as an optimization problem (inner problem with fixed design) • overall coordination step (outer optimization problem for design) • require sensitivity from each inner optimization case with design variables as external parameters
125 Multiperiod Flowsheet Example
i i C T1 A1
i CTA 0 0 V i i F i T F 2 2 1
A
W i i Tw Tw 2 1
Parameters Period 1 Period 2 Period 3 Period 4 E (kJ/mol) 555.6 583.3 611.1 527.8
k0 (1/h) 10 11 12 9 F (kmol/h) 45.4 40.8 24.1 32.6 Time (h) 1000 4000 2000 1000
126 Multiperiod Design Model Σ Min f0(d) + i fi(d, xi) s.t. hi(xi, d) = 0, i = 1,… N gi(xi, d) 0, i = 1,… N r(d) 0 Variables: x: state (z) and control (u) variables in each operating period d: design variables (e. g. equipment parameters) used δ δ i: substitute for d in each period and add i = d d
Σ hi, gi Min f0(d) + i fi(d, xi) δ s.t. hi(xi, i) = 0, i = 1,… N δ gi(xi, i) +si = 0, i = 1,… N δ 0 si, d – i=0, i = 1,… N r(d) 0
127 Multiperiod Decomposition Strategy SQP Subproblem
•Block diagonal bordered KKT matrix (arrowhead structure) •Solve each block sequentially (range/null dec.) to form small QP in space of d variables •Reassemble all other steps from QP solution
128 Multiperiod Decomposition Strategy From decomposition of KKT block in each period, obtain the following directions that are parametric in ∆d:
Substituting back into the original QP subproblem leads to a QP only in terms of ∆d. N minimize φ = ∇ f T + ∑ { ∇ f T (Z + Y ) + (Z + Y )T ∇2 L (Z + Y ) } ∆d d 0 p i Bi Bi Ai Ai p i Bi B i i =1 1 N + ∆d T ∇2 L + ∑ { (Z + Y )T ∇2 L (Z + Y ) } ∆d d 0 Bi Bi p i Bi B i 2 i =1 ∇ ∆ ≤ subject to r + dr d 0
Once ∆d is obtained, directions are obtained from the above equations.
129 Starting Point
Original QP Decoupling Multiperiod Decomposition Strategy Active Set Update
p steps are parametric in ∆d and their Null - Range i Decomposition components are created independently Calculate Decomposition linear in number of Reduced periods and trivially parallelizable Hessian Choosing the active inequality constraints can be done through: QP at d -Active set strategy (e.g., bound Calculate addition) Step -Interior point strategy using barrier terms in objective MINOR Bound Addition • Easy to implement in decomposition ITERATION LOOP Line search
Optimality NO Conditions YES Optimal Solution 130 Multiperiod Flowsheet 1 (13+2) variables and (31+4) constraints (1 period) 262 variables and 624 constraints (20 periods)
i i T CA 1 1
400 i CTA 0 0 V i i 300 F i T F 2 2 1 CPU time (s) SQP (T) 200 MINOS(T) MPD/SQP(T)
100 A
i W 0 0 10 20 30 i Periods Tw Tw 2 1
131 Multiperiod Example 2 – Heat Exchanger Network (12+3) variables and (31+6) constraints (1 period) 243 variables and 626 constraints (20 periods)
H H 1 2
i i 50 T T 1 5
i i T T 40 3 4 C A A 1 1 2 563 K
30 i i T T 2 6 CPU time (s) SQP (T) MINOS (T) 300 K MPD/SQP (T) i 20 T i 8 Q A C A 393 K c 4 2 3 10
i 320 K T 7 0 0 10 20 30 350 K Periods
132 Summary and Conclusions -Unconstrained Newton and Quasi Newton Methods -KKT Conditions and Specialized Methods -Reduced Gradient Methods (GRG2, MINOS) -Successive Quadratic Programming (SQP) -Reduced Hessian SQP -Interior Point NLP (IPOPT)
Process Optimization Applications -Modular Flowsheet Optimization -Equation Oriented Models and Optimization -Realtime Process Optimization -Blending with many degrees of freedom
Further Applications -Sensitivity Analysis for NLP Solutions -Multiperiod Optimization Problems 133 Optimization of Differential- Algebraic Equation Systems
L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA DAE Optimization Outline
I Introduction Process Examples II Parametric Optimization - Gradient Methods • Perturbation • Direct - Sensitivity Equations • Adjoint Equations III Optimal Control Problems - Optimality Conditions - Model Algorithms • Sequential Methods • Multiple Shooting • Indirect Methods IV Simultaneous Solution Strategies - Formulation and Properties - Process Case Studies - Software Demonstration
2 DynamicDynamic OptimizationOptimization ProblemProblem
Φ( ) min z(t), y(t), u(t), p, t f
dz t)( s.t. = f ( (tz ), (ty ),u(t),t, p) dt g(z(t), y(t),u(t),t, p)= 0
z o z ( 0 ) z l d z ( t ) d z u y l d y ( t ) d y u u l d u ( t ) d u u p l d p d p u
t, time tf, final time z, differential variables u, control variables y, algebraic variables p, time independent parameters 3 DAE Models in Process Engineering
Differential Equations •Conservation Laws (Mass, Energy, Momentum)
Algebraic Equations •Constitutive Equations, Equilibrium (physical properties, hydraulics, rate laws) •Semi-explicit form •Assume to be index one (i.e., algebraic variables can be solved uniquely by algebraic equations) •If not, DAE can be reformulated to index one (see Ascher and Petzold)
Characteristics •Large-scale models – not easily scaled •Sparse but no regular structure •Direct linear solvers widely used •Coarse-grained decomposition of linear algebra
4 Parameter Estimation Catalytic Cracking of Gasoil (Tjoa, 1991)
A →p1 Q, Q →p2 S, A →p3 S = − + 2 1.0 a ( p1 p3 )a = − 2 − 0.8 q p1a p2q = = 0.6 YA_data a )0( ,1 q )0( 0 YQ_data Yi 0.4 YA_estimate number of states and ODEs: 2 YQ_estimate number of parameters:3 0.2 no control profiles 0.0 0.0 0.2 0.4 0.6 0.8 1.0 constraints: pL p pU t
Objective Function: Ordinary Least Squares
0 (p1, p2, p3) = (6, 4, 1) (p1, p2, p3)* = (11.95, 7.99, 2.02) (p1, p2, p3)true = (12, 8, 2)
5 Batch Distillation Multi-product Operating Policies
5XQEHWZHHQGLVWLOODWLRQEDWFKHV 7UHDWDVERXQGDU\YDOXHRSWLPL]DWLRQSUREOHP :KHQWRVZLWFKIURP$WRRIIFXW WR%" +RZPXFKRIIFXW WRUHF\FOH" 5HIOX[" %RLOXS 5DWH" 2SHUDWLQJ7LPH"
$ %
6 Nonlinear Model Predictive Control (NMPC)
NMPC Estimation and Control
z : differential states Why NMPC? d : disturbances y : algebraic states Process Track a profile Severe nonlinear dynamics (e.g, sign changes in gains) u : manipulated variables NMPC Controller Operate process over wide range z′ = F(z, y,u, p,d ) (e.g., startup and shutdown) 0 = G()z, y,u, p,d
ysp : set points Model Updater ′ ( ) z = F z, y,u, p,d NMPC Subproblem 0 = G()z, y,u, p,d − sp + k − k −1 min ∑|| y(t) y || y ∑|| u(t ) u(t ||) u u Q Q z′(t) = F (z(t), y(t), u(t),t) ts .. 0 = G(z(t), y(t), u(t),t) z(t) = z init Bound Constraint s Other Constraint s 7 Batch Process Optimization Optimization of dynamic batch process operation resulting from reactor and distillation column DAE models: A+B→C z’ = f(z, y, u, p) C+B→P+E g(z, y, u, p) = 0 P+C→G number of states and DAEs: n + n z y z0 z0 0 z0 B parameters for equipment design i,I i,II zi,III i,IV i zf (reactor, column) f f f i,IV zi,I zi,II zi,III nu control profiles for optimal operation
Constraints: uL u(t) uU zL z(t) zU
yL y(t) yU pL p pU Objective Function: amortized economic function at end of cycle time tf 640 25 C o nsta nt C o nsta nt 630 Dyn a m ic Dyn a m ic 20 620
610 15
600 10 590
580 5 0 0.25 0.5 0.75 1 1.25 0 0.5 1 1.5 2 2.5 Tim e (h r.) optimal reactor temperature policy optimal column reflux ratio 8 Reactor Design Example Plug Flow Reactor Optimization The cracking furnace is an important example in the olefin production industry, where various hydrocarbon feedstocks react. Consider a simplified model for ethane cracking (Chen et al., 1996). The objective is to find an optimal profile for the heat flux along the reactor in order to maximize the production of ethylene. CH2 4 Max F exit s.t. DAE ≤ Texit 1180K The reaction system includes six molecules, three free radicals, and seven reactions. The model also includes the heat balance and the pressure drop equation. This gives a total of eleven differential equations. C H→2 CH • CH•+ C H → CH + C H • 2 6 3 3 2 6 4 2 5 HCHHCH•+ → + • C HCHH•→ + • 2 6 2 2 5 2 5 2 4 C H•+ CH → CH + CH • 2CHCH•→ 2 5 2 4 3 6 3 2 5 4 10 HCHCH•+ → • 2 4 2 5
Concentration and Heat Addition Profile
6 2500 5 2000 4 1500 3 1000 2 1 500 Flow rate mol/s Heat flux kJ/m2s 0 0 0 2 4 6 8 10 Length m
C2H4 C2H6 log(H2)+12 q 9 Dynamic Optimization Approaches
Pontryagin(1962)
DAE Optimization Problem Variational Approach
Inefficient for constrained problems Discretize Vassiliadis(1994) controls Apply a NLP solver Sequential Approach
Efficient for constrained problems
10 Sequential Approaches - Parameter Optimization
Consider a simpler problem without control profiles: e.g., equipment design with DAE models - reactors, absorbers, heat exchangers Φ Min (z(tf))
z' = f(z, p), z (0) = z0
g(z(tf)) 0, h(z(tf)) = 0 By treating the ODE model as a "black-box" a sequential algorithm can be constructed that can be treated as a nonlinear program. NLP Solver
P
φ,g,h ODE Gradient Model z (t) Calculation
Task: How are gradients calculated for optimizer?
11 Gradient Calculation
Perturbation Sensitivity Equations Adjoint Equations
Perturbation Calculate approximate gradient by solving ODE model (np + 1) times ψ = Φ Let , g and h (at t = tf) ψ ψ ψ d /dpi = { (pi + ¨pi) - (pi)}/ ¨pi Very simple to set up Leads to poor performance of optimizer and poor detection of optimum unless roundoff error (O(1/¨pi) and truncation error (O(¨pi)) are small. Work is proportional to np (expensive)
12 Direct Sensitivity
∂ From ODE model: {}z′ = f (z, p,t), z )0( = z ( p) ∂p 0 ∂z(t) define s (t) = i = 1, ...np i ∂ pi d ∂f ∂f T ∂z )0( s′ = (s ) = + s , s )0( = i i ∂ ∂ i i ∂ dt pi z pi (nz x np sensitivity equations)
• z and si , i = 1,…np, an be integrated forward simultaneously.
• for implicit ODE solvers, si(t) can be carried forward in time after converging on z • linear sensitivity equations exploited in ODESSA, DASSAC, DASPK, DSL48s and a number of other DAE solvers Sensitivity equations are efficient for problems with many more constraints than parameters (1 + ng + nh > np)
13 Example: Sensitivity Equations
′ = 2 + 2 z1 z1 z2 ′ = + z2 z1 z2 z1 pb = = z1 ,5 z2 )0( pa = ∂ ∂ = ∂ ∂ = s(t)a, j z(t) j / pa , s(t)b, j z(t) j / pb , j 2,1 ′ = + sa 1, 2z1 sa 1, 2z2 sa 2, ′ = + + sa 2, z1 sa 2, z2 sa 1, sa 1, pb = = sa 1, ,0 sa 2, )0( 1
′ = + sb 1, 2z1 sb 1, 2z2 sb 2, ′ = + + + sb 2, z1 z1 sb 2, z2 sb 1, sb 1, pb = = sb 1, ,0 sb 2, )0( 0
14 Adjoint Sensitivity
Adjoint or Dual approach to sensitivity Adjoin model to objective function or constraint t ψ Φ f ( = ,g or h) ψ =ψ − λT ′ − (t f ) ∫ (z f (z, p,t))dt 0 t f ψ =ψ + λ T − λ T + T λ′ + λT (t f ) )0( z0 ( p) (t f ) z(t f ) ∫ z F(z, p,t))dt 0 (λ(t)) serve as multipliers on ODE’s) Now, integrate by parts 0 0 T T ∂ψ (z(t )) ∂ t f ∂ T ∂ ψ = f − λ δ + z0 ( p) λ + λ′ + f λ δ + f λ d (t f ) z(t f ) )0( dp ∫ z(t) dp dt ∂ ∂ ∂ ∂ z(t f ) p 0 z p
and find dψ/dp subject to feasibility of ODE’s Now, set all terms not in dp to zero.
15 Adjoint System ∂f ∂ψ (z(t )) λ′ = − λ(t), λ(t ) = f ∂ f ∂ z z(t f ) t dψ ∂z ( p) f ∂f = 0 λ )0( + λ(t) dt ∂ ∫ ∂ dp p 0 p Integrate model equations forward Integrate adjoint equations backward and evaluate integral and sensitivities. Notes: nz (ng + nh + 1) adjoint equations must be solved backward (one for each objective and constraint function) for implicit ODE solvers, profiles (and even matrices) can be stored and carried backward after solving forward for z as in DASPK/Adjoint (Li and Petzold) more efficient on problems where: np > 1 + ng + nh
16 Example: Adjoint Equations
′ = 2 + 2 z1 z1 z2 ′ = + z1 z1 z2 z1 pb = = z1 ,5 z2 )0( pa
λT = λ 2 + 2 + λ + Form f (z, p,t) 1(z1 z2 ) 2 (z1 z2 z1 pb ) ∂f ∂ψ (z(t )) λ′ = − λ(t), λ(t ) = f ∂ f ∂ z z(t f ) t dψ ∂z ( p) f ∂f = 0 λ )0( + λ(t) dt ∂ ∫ ∂ dp p 0 p then becomes: ∂ψ (t ) λ′ = −2λ z − λ (z + p ), λ (t ) = f 1 1 1 2 2 b 1 f ∂ z1(t f ) ∂ψ (t ) λ′ = −2λ z − λ z , λ (t ) = f 2 1 2 2 1 2 f ∂ z2 (t f )
dψ (t ) f = λ 1 )0( dpa dψ (t ) t f f = λ ∫ 2 (t)z1(t)dt dpb 0 17 Example: Hot Spot Reactor L Φ = − − Min L ∫ (T (t) TS /TR )dt T ,T ,L,T P R S 0 dq ts .. = 1(3.0 − q(t))exp[20 − 20 /T (t)], q )0( = 0 dt dT dq = − (5.1 T (t) −T /T ) + 3/2 , T )0( =1 dt S R dt o ∆ = + feed(TR , 110 C) - H product(TP , T(L)) 0 = o = + o TP 120 C, T(L) 1 10 C/TR
TP = specified product temperature TR = reactor inlet, reference temperature T R 3:1 B/A A + 3B --> C + 3D L = reactor length 383 K L Ts = steam sink temperature q(t) = reactor conversion profile T P T s T(t) = normalized reactor temperature profile
Cases considered: • Hot Spot - no state variable constraints • Hot Spot with T(t) 1.45
18 Hot Spot Reactor: Unconstrained Case
Method: SQP (perturbation derivatives)
L(norm) TR(K) TS(K) TP(K) Initial: 1.0 462.23 425.26 250 Optimal: 1.25 500 470.1 188.4 13 SQP iterations / 2.67 CPU min. (µVax II)
1.6 1.2
1.5 1.0
1.4 0.8
1.3 0.6
1.2 0.4 Conversion, q
1.1 0.2 Normalized Temperature
1.0 0.0 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 Normalized Length Normalized Length Constrained Temperature Case: could not be solved with sequential method 19 Tricks to generalize classes of problems
Variable Final Time (Miele, 1980) τ 0 ≤ τ ≤ Define t = pn+1 , 1, pn+1 = tf
Let dz/dt = (1/ pn+1) dz/dτ = f(z, p) ⇒ dz/dτ = (pn+1) f(z, p)
Converting Path Constraints to Final Time
Define measure of infeasibility as a new variable, znz+1(t) (Sargent & Sullivan, 1977):
t f = 2 znz+1(t f ) ∑ ∫ max( ,0 g j (z(t),u(t)) dt j 0 = 2 = or znz+1(t) ∑ max( ,0 g j (z(t),u(t)) , znz+1 )0( 0 j ≤ ε Enforce znz+1(t f ) (however, constraint is degenerate)
20 Profile Optimization - (Optimal Control)
Optimal Feed Strategy (Schedule) in Batch Reactor Optimal Startup and Shutdown Policy Optimal Control of Transients and Upsets
Sequential Approach: Approximate control profile as through parameters (piecewise constant, linear, polynomial, etc.) Apply NLP to discretization as with parametric optimization Obtain gradients through adjoints (Hasdorff; Sargent and Sullivan; Goh and Teo) or sensitivity equations (Vassiliadis, Pantelides and Sargent; Gill, Petzold et al.)
Variational (Indirect) approach: Apply optimality conditions and solve as boundary value problem
21 Derivation of Variational Conditions Indirect Approach
Optimality Conditions (Bound constraints on u(t)) Min φ(z(tf))
s.t. dz/dt = f(z, u), z (0) = z0 g (z(tf)) 0 h (z(tf)) = 0 a u(t) b Form Lagrange function - adjoin objective function and constraints:
φ = φ + T µ + T (t f ) g ( z (t f )) h( z (t f )) v t − f λT − + α T − +α T − ∫ ( z f ( z, u )) a (a u (t )) b (u (t ) b) dt 0 Integrate by parts : φ = φ + T µ + T + λT − λT (t f ) g ( z (t f )) h( z (t f )) v )0( z )0( (t f ) z (t f ) t + f λT + λT + α T − +α T − ∫ z f ( z, u ) a (a u (t )) b (u (t ) b) dt 0 22 Derivation of Variational Conditions ∂φ ∂ ∂ T δφ = + g µ + h − λ δ + λT δ v z (t f ) )0( z )0( ∂z ∂z ∂z T T t ∂ ∂ + f λ + f ( z, u ) λ δ + f ( z, u ) λ + α − α δ ≥ ∫ z (t ) b a u (t ) dt 0 0 ∂z ∂u At optimum, δφ ≥ 0. Since u is the control variable, let all other terms vanish. ⇒ δz(tf): ∂φ ∂g ∂h λ()t = + µ + γ f ∂z ∂z ∂z t=t f δz(0): λ(0) = 0 (if z(0) is not specified) δ ∂ ∂ z(t): λ = − H = − f λ ∂z ∂z Define Hamiltonian, H = λTf(z,u) α T (a − u(t)) For u not at bound: ∂f ∂H a λ = = 0 α T (u(t) − b) ∂u ∂u b u ≤ u(t) ≤ u ∂ a b H α ≥ α ≥ For u at bounds: = α − α a 0, b 0 ∂u a b ∂ ∂H H = α ≥ = −α ≤ Lower bound, u(t) = a, a 0 Upper bound, u(t) = b, b 0 ∂u ∂u 23 Car Problem Travel a fixed distance (rest-to-rest) in minimum time. Min x (t ) Min t 3 f f ts .. x ’= x ts .. x"= u 1 2 x ’= u a ≤ u(t) ≤ b 2 x ’=1 x )0( = ,0 x(t ) = L 3 f a ≤ u(t) ≤ b = = x )0(’ ,0 x (’ t f ) 0 = = x1 )0( ,0 x1(t f ) L = = x2 )0( ,0 x2 (t f ) 0 = λ + λ + λ Hamiltonia :n H 1x2 2u 3 λ = ==> λ = Adjoints : 1 0 1(t) c1 λ = −λ ==> λ = + − 2 1 2 (t) c2 c1(t f t) λ = ==> λ = λ = 3 0 3 (t f ) ,1 3 (t) 1 = + > = ∂H t ,0 1tc f c2 ,0 u b = λ = c + c (t − t) ∂ 2 2 1 f = > = u t t f ,c2 ,0 u a λ = = Crossover ( 2 )0 occurs at t ts 24 Car Problem Analytic Variational Solution b
Optimal Profile u(t) From state equations: 2 1 / 2 bt ,t < ts x1(t) = 2 2 ≥ 1 / 2() bts - a() ts - tf , t ts a x (t) = bt, t < ts 2 t s ≥ t f bts + a() t - ts , t ts Apply boundary conditions at t = tf: 2 2 x1(tf) = 1/2 (b ts - a (ts - tf) ) = L •Problem is linear in u(t). Frequently x2(tf) = bts + a (tf - ts) = 0 1/2 these problems have "bang-bang" ⇒ ts = 2L () character. b 1- b / a 1/ 2 •For nonlinear and larger problems, the 2L tf = (1 − b / a) variational conditions can be solved b() 1 - b / a numerically as boundary value problems.
25 Example: Batch reactor - temperature profile
Maximize yield of B after one hour’s operation by manipulating a transformed temperature, u(t). ⇒ Minimize -zB(1.0) s.t. z’A = -(u+u2/2) zA z’B = u zA u AB zA(0) = 1 2 zB(0) = 0 u /2 C 0 u(t) 5 u(T(t)) Adjoint Equations: λ 2 λ H = - A(u+u /2) zA + B u zA ∂ ∂ λ λ H/ u = A (1+u) zA + B zA λ λ λ λ ’A = A(u+u2/2) - B u, A(1.0) = 0 λ λ ’B = 0, B(1.0) = -1 Cases Considered 1. NLP Approach - piecewise constant and linear profiles. 2. Control Vector Iteration
26 Batch Reactor Optimal Temperature Program Piecewise Constant
6
4
2 Optimal Optimal Profile, u(t)
0. 0.2 0.4 0.6 0.8 1.0 Time, h Results Piecewise Constant Approximation with Variable Time Elements Optimum B/A: 0.57105
27 Batch Reactor Optimal Temperature Program Piecewise Linear
6
4
2 Optimal Profile, u(t)
0. 0.2 0.4 0.6 0.8 1.0 Time, h Results: Piecewise Linear Approximation with Variable Time Elements Optimum B/A: 0.5726 Equivalent # of ODE solutions: 32
28 Batch Reactor Optimal Temperature Program Indirect Approach
6
4
2 Optimal Profile, u(t) Profile, Optimal
0. 0.2 0.4 0.6 0.8 1.0 Time, h Results: Control Vector Iteration with Conjugate Gradients Optimum (B/A): 0.5732 Equivalent # of ODE solutions: 58
29 Dynamic Optimization - Sequential Strategies
Small NLP problem, O(np+nu) (large-scale NLP solver not required) • Use NPSOL, NLPQL, etc. • Second derivatives difficult to get Repeated solution of DAE model and sensitivity/adjoint equations, scales with nz and np • Dominant computational cost • May fail at intermediate points Sequential optimization is not recommended for unstable systems. State variables blow up at intermediate iterations for control variables and parameters. Discretize control profiles to parameters (at what level?) Path constraints are difficult to handle exactly for NLP approach
30 Instabilities in DAE Models
This example cannot be solved with sequential methods (Bock, 1983):
dy1/dt = y2 τ2 π2 − τ2 π dy2/dt = y1 + ( ) sin ( t) The characteristic solution to these equations is given by: π τ τ y1(t) = sin ( t) + c1 exp(- t) + c2 exp( t) π π τ τ τ τ y2 (t) = cos ( t) - c1 exp(- t) + c2 exp( t)
Both c1 and c2 can be set to zero by either of the following equivalent conditions: π IVP y1(0) = 0, y2 (0) =
BVP y1(0) = 0, y1(1) = 0
31 IVP Solution
If we now add roundoff errors e1 and e2 to the IVP and BVP conditions, we see significant differences in the sensitivities of the solutions. For the IVP case, the sensitivity to the analytic solution profile is seen by large changes in the profiles y1(t) and y2(t) given by:
π τ τ y1(t) = sin ( t) + (e1 - e2/ ) exp(- t)/2 τ τ +(e1 + e2/ ) exp( t)/2
π π τ τ y2 (t) = cos ( t) - ( e1 - e2) exp(- t)/2 τ τ + ( e1 + e2) exp( t)/2
-13 Therefore, even if e1 and e2 are at the level of machine precision (< 10 ), a large value of τ and t will lead to unbounded solution profiles.
32 BVP Solution
On the other hand, for the boundary value problem, the errors affect the analytic solution profiles in the following way: π τ τ τ τ y1(t) = sin ( t) + [e1 exp( )- e2] exp(- t)/[exp( ) - exp(- )] τ τ τ τ + [e1 exp(- ) - e2] exp( t)/[exp( ) - exp(- )] π π τ τ τ τ τ y2(t) = cos ( t) – [e1 exp( )- e2] exp(- t)/[exp( ) - exp(- )] τ τ τ τ τ + [e1 exp(- ) - e2] exp( t)/[exp( ) - exp(- )]
Errors in these profiles never exceed t (e1 + e2), and as a result a solution to the BVP is readily obtained.
33 BVP and IVP Profiles
-9 e1, e2 = 10 Linear BVP solves easily IVP blows up before midpoint
34 DynamicDynamic OptimizationOptimization ApproachesApproaches
Pontryagin(1962)
DAE Optimization Problem Variational Approach
Inefficient for constrained problems Discretize Vassiliadis(1994) controls Apply a NLP solver Sequential Approach
Efficient for constrained problems Can not handle instabilities properly Small NLP Discretize some state variables Multiple Shooting
Handles instabilities Larger NLP
35 Multiple Shooting for Dynamic Optimization Divide time domain into separate regions
Integrate DAEs state equations over each region
Evaluate sensitivities in each region as in sequential approach wrt uij, p and zj Impose matching constraints in NLP for state variables over each region Variables in NLP are due to control profiles as well as initial conditions in each region
36 Multiple Shooting Nonlinear Programming Problem
ψ ( ) min z(t f ), y(t f ) ui , j , p s.t. − = z(z j ,u i, j , p, t j +1 ) z j +1 0 l ≤ ≤ u z k z(z j , u i, j , p, t k ) z k y l ≤ y(z , u , p,t ) ≤ y u min f (x) k j i, j k k x∈ℜn l ≤ ≤ u u i u i, j u i p l ≤ p ≤ p u s.t c(x) = 0 dz = () = f z, y,u , ji , p , z(tj ) zj dt xL ≤ x ≤ xu ( )= g z,y,u ji, , p 0 o z = z (0) 0 Solved Implicitly
37 BVP Problem Decomposition
IC
B1 A1
B2 A2
B3 A3
B4 A4
BN AN
FC
Consider: Jacobian of Constraint Matrix for NLP • bound unstable modes with boundary conditions (dichotomy) • can be done implicitly by determining stable pivot sequences in multiple shooting constraints approach • well-conditioned problem implies dichotomy in BVP problem (deHoog and Mattheij) Bock Problem (with t = 50) • Sequential approach blows up (starting within 10-9 of optimum) • Multiple Shooting optimization requires 4 SQP iterations 38 Dynamic Optimization – Multiple Shooting Strategies
Larger NLP problem O(np+nu+NE nz) • Use SNOPT, MINOS, etc. • Second derivatives difficult to get Repeated solution of DAE model and sensitivity/adjoint equations, scales with nz and np • Dominant computational cost • May fail at intermediate points Multiple shooting can deal with unstable systems with sufficient time elements. Discretize control profiles to parameters (at what level?) Path constraints are difficult to handle exactly for NLP approach Block elements for each element are dense! Extensive developments and applications by Bock and coworkers using MUSCOD code
39 DynamicDynamic OptimizationOptimization ApproachesApproaches
Pontryagin(1962)
DAE Optimization Problem Variational Approach
Inefficient for constrained problems Discretize Vassiliadis(1994) controls Apply a NLP solver Sequential Approach
Efficient for constrained problems Can not handle instabilities properly Small NLP Discretize all state variables Simultaneous Approach
Handles instabilities Large NLP
40 Nonlinear Programming Formulation
Nonlinear Dynamic Continuous variables Optimization Problem Continuous variables
Collocation on finite Elements
Nonlinear Programming Discretized variables Problem (NLP)
41 Discretization of Differential Equations Orthogonal Collocation
Given: dz/dt = f(z, u, p), z(0)=given
Approximate z and u by Lagrange interpolation polynomials (order
K+1 and K, respectively) with interpolation points, tk K K (t − t ) = " " = ∏ j ==> = zK +1(t) ∑ zk k (t ,) k (t) zN +1(tk ) zk = − k =0 j 0 (t t ) j≠k k j K K (t − t ) = " " = ∏ j ==> = uK (t) ∑uk k (t ,) k (t) uN (tk ) uk = − k=1 j 1 (t t ) j≠k k j
Substitute zN+1 and uN into ODE and apply equations at tk. K = " − = = r(tk ) ∑ z j j (tk ) f (zk ,uk ) ,0 k 1,...K j=0
42 Collocation Example
K K (t − t ) = " " = ∏ j ==> = zK +1(t) ∑ zk k (t ,) k (t) zN +1(tk ) zk = − k =0 j 0 (t t ) j≠k k j = = = t 0 0 , t 1 0.21132 , t 2 0.78868
" = 2 + " = 0(t) (t - t 1 )/ ,6 0(t) t/ 3 - 1/ 6 " = + " = 1(t) -8.195 t 2 6 .4483 t , 1(t) 6 .4483 - 16 .39 t " = " = 2(t) 2.19625 t 2 - 0 .4641 t , 2(t) 4 .392 t - 0 .46412
Solve z' = z 2 - 3 z + 2 , z( 0 ) = 0 ==> = z 0 0 " + " + " = 2 + z 0 0(t 1 ) z 1 1(t 1 ) z 2 2(t 1 ) z 1 - 3 z 1 2 + = 2 + ( 2 .9857 z 1 0 .46412 z 2 z 1 - 3 z 1 2 ) " + " + " = 2 + z 0 0(t 2 ) z 1 1(t 2 ) z 2 2(t 2 ) z 2 - 3 z 2 2 + = 2 + (- 6 .478 z 1 3 z 2 z 2 - 3 z 2 2 ) = = = z 0 0 , z 1 0 .291 ( 0 .319 ), z 2 0 .7384 ( 0 .706 ) z(t) = 1.5337 t - 0.76303 t 2 43 Converted Optimal Control Problem Using Collocation
Min φ(z(tf))
s.t. z' = f(z, u, p), z(0)=z0 g(z(t), u(t), p) 0 z N+1(t)
h(z(t), u(t), p) = 0 z(t) to Nonlinear Program StateProfile φ Min (z f ) K " − = = ∑ z j j (tk ) f (zk ,uk ) ,0 z0 z(0) = j 0 g(z ,u ) ≤ 0 k =1,...K k k t f t 1 t 2 t 3 = h(zk ,uk ) 0 r(t) K " − = ∑ z j j )1( z f 0 t 3 j=0 t 1 t 2
How accurate is approximation
44 Results of Optimal Temperature Program Batch Reactor (Revisited)
Results - NLP with Orthogonal Collocation Optimum B/A - 0.5728 # of ODE Solutions - 0.7 (Equivalent)
45 CollocationCollocation onon FiniteFinite ElementsElements Polynomials i−1 = + τ τ ∈ tij ∑hi’ hi j , ]1,0[ = • • • • i’ 1 • • • • • • dz = 1 dz • • τ • dt hi d uuu u u u u u u u u u dz = hi f (z,u) t t tf dτ o i Collocation points hi Mesh points Discontinuous Algebraic and Finite element, i Control variables Continuous Differential variables u u u u u u element i q = 2 u u u q = 1 u u u K K = " = " y(t) ∑ q(t) yiq u(t) ∑ q(t) uiq K = = = " q 1 q 1 z(t) ∑ q(t) ziq q=0 K = " τ − = = = r(tik ) ∑(zij j ( k )) hi f (zik ,uik , p) ,0 k 1,..K, i 1,.. NE j=0
46 NonlinearNonlinear ProgrammingProgramming ProblemProblem
min ψ (z ) f min f (x) ∈ℜn K x s.t. " τ − = ∑(zij j ( k )) hi f (zik ,uik , p) 0 j=0 s.t c(x) = 0 ( )= g zi,k , yi,k ,ui,k , p 0
K " − = = L u ∑(zi− ,1 j j 1( )) zi0 ,0 i 2,..NE x ≤ x ≤ x j=0 K " − = = ∑(zNE, j j 1( )) z f ,0 z10 z )0( j=0
l u Finite elements, h , can also be variable to z ≤ z ≤ z i ji, i, j ji, determine break points for u(t). y l ≤ y ≤ y u i, j i, j i, j Σ Add hu hi 0, hi=tf u l ≤ u ≤ u u i, j i, j i, j Can add constraints g(h, z, u) ε for p l ≤ p ≤ p u approximation error 47 Hot Spot Reactor Revisited L Φ = − − Min L ∫ (T (t) TS /TR )dt T ,T ,L,T P R S 0 dq ts .. = 1(3.0 − q(t))exp[20 − 20 /T (t)], q )0( = 0 dt dT dq = − (5.1 T (t) −T /T ) + 3/2 , T )0( =1 dt S R dt o ∆ = + feed(TR , 110 C) - H product(TP , T(L)) 0 = o = + o TP 120 C, T(L) 1 10 C/TR
TP = specified product temperature TR = reactor inlet, reference temperature T R 3:1 B/A A + 3B --> C + 3D L = reactor length 383 K L Ts = steam sink temperature q(t) = reactor conversion profile T P T s T(t) = normalized reactor temperature profile
Cases considered: • Hot Spot - no state variable constraints • Hot Spot with T(t) 1.45
48 Base Case Simulation
Method: OCFE at initial point with 6 equally spaced elements L(norm) TR(K) TS(K) TP(K) Base Case: 1.0 462.23 425.26 250
2 1.8
integrated profile integrated profile collocation collocation 1.6
1 1.4 Conversion Temperature 1.2
0 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2
Normalized Length Normalized Length
49 Unconstrained Case Method: OCFE combined formulation with rSQP identical to integrated profiles at optimum L(norm) TR(K) TS(K) TP(K) Initial: 1.0 462.23 425.26 250 Optimal: 1.25 500 470.1 188.4
123 CPU s. (µVax II) φ∗ = -171.5
1.2 1.6
1.0 1.5
0.8 1.4
0.6 1.3
0.4 1.2 Conversion, q Conversion,
0.2 1.1 Normalized Temperature 0.0 1.0 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 Normalized Length Normalized Length
50 Temperature Constrained Case T(t) 1.45
Method: OCFE combined formulation with rSQP, identical to integrated profiles at optimum
L(norm) TR(K) TS(K) TP(K) Initial: 1.0 462.23 425.26 250 Optimal: 1.25 500 450.5 232.1
57 CPU s. (µVax II), φ∗ = -148.5
1.2 1.5
1.0 1.4 0.8 1.3 0.6 1.2
Conversion 0.4 Temperature
0.2 1.1
0.0 1.0 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 Normalized Length Normalized Length
51 Theoretical Properties of Simultaneous Method
A. Stability and Accuracy of Orthogonal Collocation
• Equivalent to performing a fully implicit Runge-Kutta integration of the DAE models at Gaussian (Radau) points • 2K order (2K-1) method which uses K collocation points • Algebraically stable (i.e., possesses A, B, AN and BN stability)
B. Analysis of the Optimality Conditions
• An equivalence has been established between the Kuhn-Tucker conditions of NLP and the variational necessary conditions • Rates of convergence have been established for the NLP method
52 Simultaneous DAE Optimization
Case Studies • Reactor - Based Flowsheets • Fed-Batch Penicillin Fermenter • Temperature Profiles for Batch Reactors • Parameter Estimation of Batch Data • Synthesis of Reactor Networks • Batch Crystallization Temperature Profiles • Grade Transition for LDPE Process • Ramping for Continuous Columns • Reflux Profiles for Batch Distillation and Column Design • Source Detection for Municipal Water Networks • Air Traffic Conflict Resolution • Satellite Trajectories in Astronautics • Batch Process Integration • Optimization of Simulated Moving Beds
53 Production of High Impact Polystyrene (HIPS) Startup and Transition Policies (Flores et al., 2005a)
Monomer, Transfer/Term. agents
Coolant
Catalyst Polymer
54 Phase Diagram of Steady States
Transitions considered among all steady state pairs
600
650 1: Qi = 0.0015 3 A5 2: Qi = 0.0025 3: Qi = 0.0040Upper Steady−State
550 N3 600 System State
2
550 500 1 Medium Steady−State
500 450
1: Qcw = 10 450 2: Qcw = 1.0 Temperature (K) Temperature Temperature (K) 3: Qcw = 0.1 1 400 N2 A1 Lower Steady−State 2 400 A2 N2
3 350 A3 350 C1 A4 N1 Bifurcation3 ParameterB1 C2 2 B2 B3 1
300 300
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 0 0.02 0.04 0.06 0.08 0.1 0.12 Cooling water flowrate (L/s) Initiator flowrate (L/s) 55 Startup to Unstable Steady State
−3 x 10 1 10
8 0.5 6
0 4 Initiator Conc. [mol/l] 0 0.5 1 1.5 2 0 0.5 1 1.5 2 Monomer Conc. [mol/l] 400 Time [h] 320 Time [h]
310 350 300
300 Jacket Temp. [K] 290 Reactor Temp. [K] 0 −3 0.5 1 1.5 2 0 0.5 1 1.5 2 x 10 1.5 Time [h] 1 Time [h]
1 0.5 0.5
0 0 Initiator Flow. [l/sec] 0 20 40 60 80 100 0 0.5 1 1.5 2
3 Time [h] Cooling water Flow. [l/sec] Time [h]
2 • 926 variables
1 • 476 constraints 0 • 36 iters. / 0.95 CPU s (P4) Feedrate Flow. [l/sec] 0 0.5 1 1.5 2 Time [h] 56 HIPS Process Plant (Flores et al., 2005b)
•Many grade transitions considered with stable/unstable pairs •1-6 CPU min (P4) with IPOPT •Study shows benefit for sequence of grade changes to achieve wide range of grade transitions.
57 Batch Distillation – Optimization Case Study - 1
R(t) D(t), x d dxi,d V [ ] = yi,N - xi,d dt Hcond
dxi,0 V = [] x - x dt S i,d i,0 dS - V = V(t) dt R +1 x b
*DXJHHIIHFWRIFROXPQKROGXSV 2YHUDOOSURILWPD[LPL]DWLRQ 0DNH7UD\&RXQW&RQWLQXRXV
58 2SWLPL]DWLRQ&DVH6WXG\
0RGHOLQJ$VVXPSWLRQV
,GHDO7KHUPRG\QDPLFV 1R/LTXLG7UD\+ROGXS 1R9DSRU+ROGXS )RXUFRPSRQHQWPL[WXUH α 6KRUWFXWVWHDG\VWDWHWUD\PRGHO )HQVNH8QGHUZRRG*LOOLODQG &DQEHVXEVWLWXWHGE\PRUHGHWDLOHGVWHDG\VWDWHPRGHOV )UHGHQVOXQG DQG*DOLQGH]$OND\D
2SWLPL]DWLRQ3UREOHPV&RQVLGHUHG
(IIHFWRI&ROXPQ+ROGXS +FRQG 7RWDO3URILW0D[LPL]DWLRQ 59 0D[LPXP'LVWLOODWH3UREOHP
30
20
With Holdup No Holdup 10 Reflux Ratio Reflux &RPSDULVRQRIRSWLPDOUHIOX[SURILOHV ZLWKDQGZLWKRXWKROGXS +FRQG 0 0.0 0.2 0.4 0.6 0.8 1.0 Time ( hours )
0.98
0.97 &RPSDULVRQRIGLVWLOODWH 0.96 0.95 SURILOHVZLWKDQGZLWKRXW With Holdup 0.94 No Holdup KROGXS +FRQG DW 0.93
RYHUDOOSXULW\ 0.92 Distillate Composition Distillate
0.91 0.0 0.2 0.4 0.6 0.8 1.0 Time ( hours ) 60 %DWFK'LVWLOODWLRQ3URILW0D[LPL]DWLRQ
Max {Net Sales(D, S0)/(tf +Tset) – TAC(N, V)}
N = 20 trays, Tsetup = 1 hour xd = 0.98, xfeed = 0.50, α = 2 Cprod/Cfeed = 4.1
V = 120 moles/hr, S0 = 100 moles.
30
20
N = 20 N = 33.7 10 Reflux Ratio
0 0 1 2 3 4
Time ( hours ) 61 %DWFK'LVWLOODWLRQ 2SWLPL]DWLRQ&DVH6WXG\
dx1,N +1 V R(t) = []y1,N - x1,N +1 D(t), x d dt H N+1
dx1,p V R = []y - y , + x - x , p = 1,...,N 1, p-1 1 p R+1 1,p+1 1,p dt Hp
dx V 1,0 = []x - y + R x - x dt S 1,0 1,0 R+1 1,1 1,0 dD V = V(t) dt R +1
x b N+1 N+1 0 0 0 S xi,0 = S -H∑ p xi,0 +∑ Hp xi,p p=1 p=1 C C ∑ xi,p = 1.0 ∑ yi,p =1.0 1 1
Ideal Equilibrium Equations yi,p = Ki,p xi,p Binary Column (55/45, Cyclohexane, Toluene) S0 = 200, V = 120, Hp = 1, N = 10, ~8000 variables, < 2 CPU hrs. (Vaxstation 3200)
62 2SWLPL]DWLRQ&DVH6WXG\
0RGHOLQJ$VVXPSWLRQV
,GHDO7KHUPRG\QDPLFV &RQVWDQW7UD\+ROGXS 1R9DSRU+ROGXS %LQDU\0L[WXUH WROXHQHF\FORKH[DQH KRXURSHUDWLRQ 7RWDO5HIOX[,QLWLDO&RQGLWLRQ
&DVHV&RQVLGHUHG
&RQVWDQW&RPSRVLWLRQRYHU7LPH 6SHFLILHG&RPSRVLWLRQDW)LQDO7LPH %HVW&RQVWDQW5HIOX[3ROLF\ 3LHFHZLVH&RQVWDQW5HIOX[3ROLF\ 63 5HIOX[2SWLPL]DWLRQ&DVHV
6 0.006
5 0.005 &RQVWDQW3XULW\RYHU7LPH 4 0.004 -x)
3 0.003
2 0.002 [ W Reflux Reflux Policy Reflux Purity I 1 0.001 Top Plate Purity, (1 ' W
0 0.000 0.0 0.2 0.4 0.6 0.8 1.0 Time (hrs.)
8 0.010
Reflux
Purity 0.008 2YHUDOO'LVWLOODWH3XULW\ 6 ∫ [G W 9 5 GW ' WI 0.006 4 I ' W 0.004 Reflux Reflux Policy 2 0.002 Distillate Purity, (1-x) 6KRUWFXW&RPSDULVRQ 0 0.000 I 0.0 0.2 0.4 0.6 0.8 1.0 ' W Time (hrs.) 64 5HIOX[2SWLPL]DWLRQ&DVHV
0.30 0.004
Reflux
0.28 Purity 0.003 &RQVWDQW5HIOX[RYHU7LPH 0.26
0.002 0.24 [G W 9 5 GW ' WI Reflux Reflux Policy 0.001 I 0.22 Distillate Purity, (1-x) ' W
0.20 0.000 0.0 0.2 0.4 0.6 0.8 1.0 Time (hrs.)
0.010 6
5 0.008 Purity 3LHFHZLVH&RQVWDQW Reflux 4 5HIOX[RYHU7LPH 0.006 3
0.004 2 ∫ RefluxPolicy
G I (1-x)Purity, Distillate [ W 9 5 GW ' W 0.002 1 ' WI 0.000 0 0.0 0.2 0.4 0.6 0.8 1.0 Time (hrs.) 65 Batch Reactive Distillation – Case Study 3
Reversible reaction between acetic acid and ethanol
l CH3COOH + CH3CH2OH CH3COOCH2CH3 + H2O
t = 0, x = 0.25 for all components
R(t) t ==1 D(t), x d tff 1 maxmax∫∫ Ddt Ddt 00
s.t. DAE
x EsterEster ≥≥0. 4600 xDD 0. 4600
V(t)
x b
Wajde & Reklaitis (1995)66 2SWLPL]DWLRQ&DVH6WXG\
0RGHOLQJ$VVXPSWLRQV
,GHDO7KHUPRG\QDPLFV &RQVWDQW7UD\+ROGXS 1R9DSRU+ROGXS 7HUWLDU\0L[WXUH (W2++2$F(7$F+2 &ROG6WDUW,QLWLDO&RQGLWLRQ
&DVHV&RQVLGHUHG
6SHFLILHG&RPSRVLWLRQDW)LQDO7LPH 2SWLPXP5HIOX[3ROLF\ 9DULRXV7UD\V&RQVLGHUHG KRXURSHUDWLRQ
67 Batch Reactive Distillation
Optimal Reflux Profiles Condenser Composition (8 trays) 100 0.50
80 0.40 60 0.30
40 0.20 M.fraction 0.10 20 Reflux Ratio 0.00 0.0 0.2 0 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 1.0 time(hr) 1.0 time (hr) Ester Eth. Wat. A. A. 8 trays 15 trays 25 trays
Distillate Composition
YDULDEOHV 0.45 '$(V
GHJUHHVRIIUHHGRP 0.35 ILQLWHHOHPHQWV Ethyl Acetate ,3237LWHUDWLRQV 0.25 &38PLQXWHV 0.0 0.2 0.4 0.6 0.8 1.0 time (hr) 8 trays 15 trays 25 trays 68 Batch Reactive Distillation
Discretized CPU (s) Trays DAEs Iterations Variables Global Elemental
8 98 1788 14 56.4 37.2
15 168 3048 32 245.7 207.5
25 258 4678 45 1083.2 659.3
CPU Decomposition Time
21
16
11
CPU(s)/iter 6
1 90 130 170 210 250 DAEs 69 Nonlinear Model Predictive Control (NMPC)
NMPC Estimation and Control
z : differential states Why NMPC? d : disturbances y : algebraic states Process Track a profile Severe nonlinear dynamics (e.g, sign changes in gains) u : manipulated variables NMPC Controller Operate process over wide range z′ = F(z, y,u, p,d ) (e.g., startup and shutdown) 0 = G()z, y,u, p,d
ysp : set points Model Updater ′ ( ) z = F z, y,u, p,d NMPC Subproblem 0 = G()z, y,u, p,d − sp + k − k −1 min ∑|| y(t) y || y ∑|| u(t ) u(t ||) u u Q Q z′(t) = F (z(t), y(t), u(t),t) ts .. 0 = G(z(t), y(t), u(t),t) z(t) = z init Bound Constraint s Other Constraint s 70 Dynamic optimization in a MATLAB Framework
Dynamic Optimization Dynamic Process NLP Optimization Problem Discretization Problem Method No. of Time Process Model Process Model JOBKENNUNG : C:\Krawal-modular\IGCC\IGCC_Puertollano_komplett.gek Elements f (x′,x,y,u,p,t)= 0 Collocation ˆ ( )= f xˆ,yˆ,uˆ,p,t f ,x0 0 Order g(x, y,u, p,t)= 0 ( )= gˆ xˆ,yˆ,uˆ,p,t f 0 Inequality Constraints ′ ≤ Inequality Constraints bar kJ/kg kg/s °C (X) Saturator-System h(x ,x,y,u,p t), 0 Alle Drücke sind absolut P..Druck..bar T..Temperatur..°C copy ofDesign H..Enthalpie..kJ/kg Wärmeschaltplan M..Massenstrom..kg/s SIEMENS AG In Bearbeitung Nr. -F Ref T - PHI..Luft-Feuchte..% Erlangen, 13.Oct.1999 F Ref T hˆ (xˆ,yˆ,uˆ,p,t ) ≤ 0 Full f Initial Conditions Discretization Constraints at Final Time x (t ) = x 0 0 of State and ϕ (x , y ,u ,p,t )= 0 Control Nt Nt Nt f Constraints at Final Time Variables ϕ ( ′ )= Objective Function x (t f ,) x(t f ,) y(t f ,) u(t f ,) x 0 ,p,t f 0 min P(x , y ,u ,p,t ) Nt Nt Nt f Objective Function P(x(t ,) y(t ,) u(t ),p,x ,t ) min f f f 0 f u(t),p,x0 ,t f
71 Tennessee Eastman Process
Unstable Reactor 11 Controls; Product, Purge streams Model extended with energy balances 72 Tennessee Eastman Challenge Process
DAE Model NLP Optimization problem
Number of differential equations 30 Number of variables 10920 of which are fixed 0 Number of algebraic variables 152 Number of constraints 10260 Number of lower bounds 780 Number of algebraic equations 141 Number of upper bounds 540 Difference (control variables) 11 Number of nonzeros in Jacobian 49230 Number of nonzeros in Hessian 14700
Method of Full Discretization of State and Control Variables Large-scale Sparse block-diagonal NLP
73 Setpoint change studies
Process variable Type Magnitude
-15% Make a step change to the variable(s) used to set Production rate change Step the process production rate so that the product flow leaving the stripper column base changes from 14,228 to 12,094 kg h-1 -60 kPa Reactor operating pressure Step Make a step change so that the reactor operating change pressure changes from 2805 to 2745 kPa +2% Purge gas composition of Make a step change so that the composition of Step component B change component B in the gas purge changes from 13.82 to 15.82%
Setpoint changes for the base case [Downs & Vogel]
74 Case Study: Change Reactor pressure by 60 kPa
Control profiles
All profiles return to their base case values Same production rate Same product quality Same control profile Lower pressure – leads to larger gas phase (reactor) volume Less compressor load
75 TE Case Study – Results I
Shift in TE process
Same production rate More volume for reaction Same reactor temperature Initially less cooling water flow (more evaporation)
76 Case Study- Results II
Shift in TE process
Shift in reactor effluent to more condensables Increase cooling water flow Increase stripper steam to ensure same purity Less compressor work
77 Case Study: Change Reactor Pressure by 60 kPa
Optimization with IPOPT 1000 Optimization Cycles 5-7 CPU seconds 11-14 Iterations Optimization with SNOPT Often failed due to poor conditioning Could not be solved within sampling times > 100 Iterations
78 OptimizationOptimization asas aa FrameworkFramework forfor IntegrationIntegration
+ Directly handles interactions, multiple conditions + Trade-offs unambiguous, quantitative
- Larger problems to solve - Need to consider a diverse process models
Research Questions
How should diverse models be integrated? Is further algorithmic development needed?
79 Batch Integration Case Study
Plant Design
Dynamic Processing
R T
Time Time
Production Planning Stage 1 A B C
Stage 2 A B C
:KDWDUHWKH,QWHUDFWLRQVEHWZHHQ'HVLJQ DQG'\QDPLFVDQG3ODQQLQJ" :KDWDUHWKHGLIIHUHQFHVEHWZHHQ6HTXHQWLDODQG 6LPXOWDQHRXV6WUDWHJLHV" (VSHFLDOO\,PSRUWDQWLQ%DWFK6\VWHPV 80 6LPXOWDQHRXV'\QDPLF2SWLPL]DWLRQ
Dynamic Processing
Shorter Same conversion T in reduced time
Conv. processing times
Time Time
Fewer Higher conversion T in same time
Conv. product batches
Time Time Best transient Best constant
Production Planning Stage 1 A B C Shorter Planning Horizon Stage 2 A B C
GLVFUHWL]H '$(V VWDWHDQGFRQWUROSURILOHV ODUJHVFDOHRSWLPL]DWLRQSUREOHP KDQGOHVSURILOHFRQVWUDLQWVGLUHFWO\ LQFRUSRUDWHVHTXLSPHQWYDULDEOHVGLUHFWO\ '$(PRGHOVROYHGRQO\RQFH FRQYHUJHVIRUXQVWDEOHV\VWHPV 81 Scheduling Formulation
VHTXHQFLQJRIWDVNVSURGXFWVHTXLSPHQW H[SHQVLYHGLVFUHWHFRPELQDWRULDORSWLPL]DWLRQ FRQVLGHULGHDOWUDQVIHUSROLFLHV 8,6DQG=: FORVHGIRUPUHODWLRQV %LUHZDU DQG*URVVPDQQ
AB N stage I 8QOLPLWHG,QW6WRUDJH 8,6 stage 2 6KRUWSURGXFWLRQF\FOH &\FOHWLPHLQGHSHQGHQWRIVHTXHQFH
=HUR:DLW =: ,PPHGLDWHWUDQVIHUUHTXLUHG
A B N 6ODFNWLPHVGHSHQGHQWRQSDLU stage I /RQJHUSURGXFWLRQF\FOHUHTXLUHG stage 2
82 CaseCase StudyStudy ExampleExample
4 stages, 3 products of different purity Dynamic reactor - temperature profile Dynamic column - reflux profile A + B → C C + B → P + E P + C → G z 0 z 0 0 z 0 B i,I i,II zi ,III i,IV i z f f f f i ,IV zi,I zi ,II zi ,III
Process Optimization Cases
SQ - Sequential Design - Scheduling - Dynamics SM - Simultaneous Design and Scheduling Dynamics with endpoints fixed . SM* - Simultaneous Design, Scheduling and Dynamics
83 6FHQDULRVLQ&DVH6WXG\
&RPSDULVRQRI'\QDPLFYV%HVW&RQVWDQW3URILOHV
5 EHVWFRQVWDQWWHPSHUDWXUHSURILOH 5 RSWLPDOWHPSHUDWXUHSROLF\
& EHVWFRQVWDQWUHIOX[UDWLR & RSWLPDOUHIOX[UDWLR
640 25 C o nsta nt C o nsta nt 630 Dyn a m ic Dyn a m ic 20 620
610 15
600 10 590
580 5 0 0.25 0.5 0.75 1 1.25 0 0.5 1 1.5 2 2.5 Tim e (h r.) 84 Results for Simultaneous Cases
2 0 0 0 0 R0/R1
1 5 0 0 0 best constant / optimal temperature
it it ($) 1 0 0 0 0
Pro f C0/C1
5 0 0 0 best constant / optimal reflux ratio
0 R0C0 R1C0 R0C1 R1C1
C a s e s
Sim ulta neo us SIm ulta ne ous S e q u e n t ia l w ith FIXED sta te s w ith FREE sta te s
I 64 II III IV I II 60 III IV I II 60 III IV
- ZW schedule becomes tighter
- less dependent on product sequences 85 Summary Sequential Approaches - Parameter Optimization • Gradients by: Direct and Adjoint Sensitivity Equations - Optimal Control (Profile Optimization) • Variational Methods • NLP-Based Methods - Require Repeated Solution of Model - State Constraints are Difficult to Handle
Simultaneous Approach - Discretize ODE's using orthogonal collocation on finite elements (solve larger optimization problem) - Accurate approximation of states, location of control discontinuities through element placement. - Straightforward addition of state constraints. - Deals with unstable systems Simultaneous Strategies are Effective - Directly enforce constraints - Solve model only once - Avoid difficulties at intermediate points
Large-Scale Extensions - Exploit structure of DAE discretization through decomposition
- Large problems solved efficiently with IPOPT 86 DAE Optimization Resources References Bryson, A.E. and Y.C. Ho, Applied Optimal Control, Ginn/Blaisdell, (1968). Himmelblau, D.M., T.F. Edgar and L. Lasdon, Optimization of Chemical Processes, McGraw-Hill, (2001). Ray. W.H., Advanced Process Control, McGraw-Hill, (1981).
Software • Dynamic Optimization Codes ACM – Aspen Custom Modeler DynoPC - simultaneous optimization code (CMU) COOPT - sequential optimization code (Petzold) gOPT - sequential code integrated into gProms (PSE) MUSCOD - multiple shooting optimization (Bock) NOVA - SQP and collocation code (DOT Products) • Sensitivity Codes for DAEs DASOLV - staggered direct method (PSE) DASPK 3.0 - various methods (Petzold) SDASAC - staggered direct method (sparse) DDASAC - staggered direct method (dense)
87 DynoPC – Windows Implementation
Model in FORTRAN Data/Config. Outputs/Graphic & Texts
Interface of DynoPC F2c/C++
ADOL_C Redued SQP Optimal Solution Preprocessor
F/DF Calc. of independent Interior Point variable move For QP DDASSL F/DF Decomposition COLDAE Line Search
Simulator/Discretizer Starting Point Linearization FILTER
88 Example: Batch Reactor Temperature
1 2 ABC
Max b(t f ) .ts . T da E = −k exp( −1 ) ⋅ a at 1 RT db E E =k exp( −1 ) ⋅a − k exp( −2 ) ⋅ b at 1 RT 2 RT
a + b + c = 1
89 Example: Car Problem
Min tf s.t. z1’ = z2 z2’ = u
z2 zmax -2 u 1
2 12
10 1
8 0 subroutine model(nz,ny,nu,np,t,z,dmz,y,u,p,f) double precision t, z(nz),dmz(nz), y(ny),u(nu),p(np) Velocity 6
Acceleration -1 Velocity double precision f(nz+ny) 4
Acceleration, u(t) Acceleration, f(1) = p(1)*z(2) - dmz(1) -2 2 f(2) = p(1)*u(1) - dmz(2)
-3 0 return 0 10 20 30 40 end Time
90 Example: Crystallizer Temperature
4.5E-03 60 4.0E-03 SUBROUTINE model(nz,ny,nu,np,x,z,dmz,y,u,p,f) 50 3.5E-03 implicit double precision (a-h,o-z) double precision f(nz+ny),z(nz),dmz(nz),Y(ny),yp(4),u(1) 3.0E-03 40 double precision kgr, ln0, ls0, kc, ke, kex, lau, deltT, alpha 2.5E-03 dimension a(0:3), b(0:3) 30 2.0E-03 data alpha/1.d-4/,a/-66.4309d0, 2.8604d0, -.022579d0, 6.7117d-5/, + b/16.08852d0, -2.708263d0, .0670694d0, -3.5685d-4/, kgr/ 4.18d-3/, 1.5E-03 20 + en / 1.1d0/, ln0/ 5.d-5/, Bn / 3.85d2/, em / 5.72/, ws0/ 2.d0/, T jacket (C) Crystal size 1.0E-03 + Ls0/ 5.d-4 /, Kc / 35.d0 /, Kex/ 65.d0/, are/ 5.8d0 /, 10 + amt/ 60.d0 /, V0 / 1500.d0/, cw0/ 80.d0/,cw1/ 45.d0/,v1 /200.d0/, 5.0E-04 + tm1/ 55.d0/,x6r/0.d0/, tem/ 0.15d0/,clau/ 1580.d0/,lau/1.35d0/, 0.0E+00 0 + cp/ 0.4d0 /,cbata/ 1.2d0/, calfa/ .2d0 /, cwt/ 10.d0/ 0 5 10 15 20 ke = kex*area Time (hr) x7i = cw0*lau/(100.d0-cw0) v = (1.d0 - cw0/100.d0)*v0 w = lau*v0 Crystal Size T jacket yp(1) = (deltT + dsqrt(deltT**2 + alpha**2))*0.5d0 yp(2) = (a(0) + a(1)*yp(4) + a(2)*yp(4)**2 + a(3)*yp(4)**3) TT TC yp(3) = (b(0) + b(1)*yp(4) + b(2)*yp(4)**2 + b(3)*yp(4)**3) deltT = yp(2) - z(8) yp(4) = 100.d0*z(7)/(lau+z(7)) Maximize crystal size f(1) = Kgr*z(1)**0.5*yp(1)**en - dmz(1) at final time f(2) = Bn*yp(1)**em*1.d-6 - dmz(2) f(3) = ((z(2)*dmz(1) + dmz(2) * Ln0)*1.d+6*1.d-4) - dmz(3) f(4) = (2.d0*cbata*z(3)*1.d+4*dmz(1)+dmz(2)*Ln0**2*1.d+6)-dmz(4) Steam f(5) = (3.d0*calfa*z(4)*dmz(1)+dmz(2)*Ln0**3*1.d+6) - dmz(5) f(6) = (3.d0*Ws0/(Ls0**3)*z(1)**2*dmz(1)+clau*V*dmz(5))-dmz(6) f(7) = -dmz(6)/V - dmz(7) Cooling f(8) = (Kc*dmz(6) - Ke*(z(8) - u(1)))/(w*cp) - dmz(8) water f(9) = y(1)+YP(3)- u(1) return end Control variable = Tjacket = f(t)?
91