Struct Multidisc Optim (2012) 46:93–109 DOI 10.1007/s00158-011-0749-1

RESEARCH PAPER

Constrained multifidelity optimization using model calibration

Andrew March · Karen Willcox

Received: 4 April 2011 / Revised: 22 November 2011 / Accepted: 28 November 2011 / Published online: 8 January 2012 c Springer-Verlag 2012

Abstract Multifidelity optimization approaches seek to derivative-free and sequential meth- bring higher-fidelity analyses earlier into the design process ods. The method uses approximately the same number of by using performance estimates from lower-fidelity models high-fidelity analyses as a multifidelity trust-region algo- to accelerate convergence towards the optimum of a high- rithm that estimates the high-fidelity gradient using finite fidelity design problem. Current multifidelity optimization differences. methods generally fall into two broad categories: provably convergent methods that use either the high-fidelity gradient Keywords Multifidelity · Derivative-free · Optimization · or a high-fidelity pattern-search, and heuristic model cali- Multidisciplinary · Aerodynamic design bration approaches, such as interpolating high-fidelity data or adding a Kriging error model to a lower-fidelity function. This paper presents a multifidelity optimization method that bridges these two ideas; our method iteratively calibrates Nomenclature lower-fidelity information to the high-fidelity function in order to find an optimum of the high-fidelity design prob- A Active and violated constraint Jacobian lem. The algorithm developed minimizes a high-fidelity a Sufficient decrease parameter objective function subject to a high-fidelity constraint and B A closed and bounded set in Rn other simple constraints. The algorithm never computes the B Trust region gradient of a high-fidelity function; however, it achieves C Differentiability Class first-order optimality using sensitivity information from the c(x) Inequality constraint calibrated low-fidelity models, which are constructed to d Artificial lower bound for constraint value have negligible error in a neighborhood around the solution. e(x) Error model for objective The method is demonstrated for aerodynamic shape opti- e¯(x) Error model for constraint mization and shows at least an 80% reduction in the num- f (x) Objective function ber of high-fidelity analyses compared other single-fidelity g(x) Inequality constraint h(x) Equality constraint L Expanded level-set in Rn L Level-set in Rn A version of this paper was presented as paper AIAA-2010-9198 L Lagrangian of trust-region subproblem at the 13th AIAA/ISSMO Multidisciplinary Analysis and M Space of all fully linear models Optimization Conference, Fort Worth, TX, 13–15 September 2010. m(x) Surrogate model of the high-fidelity objective B · A. March ( ) K. Willcox function Department of Aeronautics & Astronautics, Massachusetts Institute ¯ ( ) of Technology, Cambridge, MA 02139, USA m x Surrogate model of the high-fidelity constraint e-mail: [email protected] n Number of design variables K. Willcox p Arbitrary step vector e-mail: [email protected] RBF Radial Basis Function 94 A. March, K. Willcox r Radial distance between two points 1 Introduction s Trust-region step t Constant between 0 and 1 High-fidelity numerical simulation tools can provide valu- w Constraint violation conservatism factor able insight to designers of complex systems and support x Design vector better decisions through optimal design. However, the de- α Convergence tolerance multiplier sign budget frequently prevents the use of formal optimiza- β Convergence tolerance multiplier tion methods in conjunction with high-fidelity simulation γ0 Trust-region contraction ratio tools, due to the high cost of simulations. An additional γ1 Trust-region expansion ratio challenge for optimization is obtaining accurate gradient ε Termination tolerance information for high-fidelity models, especially when the ε2 Termination tolerance for trust region size design process employs black-box and/or legacy tools. In-  Trust region size stead, designers often set design aspects using crude lower- δh Linearized step size for feasibility fidelity performance estimates and later attempt to optimize δx Finite difference step size system subcomponents using higher-fidelity methods. This η0 Trust-region contraction criterion approach in many cases leads to suboptimal design per- η1,η2 Trust-region expansion criterion formance and unnecessary expenses for system operators. κ Bound related to function smoothness An alternative is to use the approximate performance esti- λ Lagrange multiplier mates of lower-fidelity simulations in a formal multifidelity λˆ Lagrange multiplier for surrogate model optimization framework, which aims to speed the design ξ Radial basis function correlation length process towards a high-fidelity optimal design using sig- ρ Ratio of actual to predicted improvement nificantly fewer costly simulations. This paper presents a σ Penalty parameter multifidelity optimization method for the design of com- τ Trust-region solution tolerance plex systems, and targets in particular, systems for which  Penalty function accurate design sensitivity information is challenging to ˆ Surrogate penalty function obtain. φ Radial basis function Multifidelity optimization of systems with design con- straints can be achieved with the approximation model man- agement framework of Alexandrov et al. (1999, 2001). That Superscript technique optimizes a sequence of surrogate models of the objective function and constraints that are first-order con- ∗ Optimal sistent with their high-fidelity counterparts. The first-order + Active or violated inequality constraint consistent surrogates are constructed by correcting lower- fidelity models with either additive or multiplicative error models based on function value and gradient information of Subscript the high-fidelity models. This technique is provably conver- gent to a locally optimal high-fidelity design and in practice 0 Initial iterate can provide greater than 50% reductions in the number bhm Bound for Hessian of the surrogate model of high-fidelity function calls, translating into important blg Upper bound on the Lipschitz constant reductions in design turnaround times (Alexandrov et al. c Relating to constraint 1999). Estimating the needed gradient information is often cg Relating to constraint gradient a significant challenge, especially in the case of black-box cgm Maximum Lipschitz constant for a constraint simulation tools. For example, the high-fidelity numerical gradient simulation tool may occasionally fail to provide a solution cm Maximum Lipschitz constant for a constraint or it may contain noise in the output (e.g., due to numer- f Relating to objective function ical tolerances). In these cases, which arise frequently as fg Lipschitz constant for objective function gradient both objectives and constraints in complex system design, FCD Fraction of Cauchy Decrease if design sensitivities are unavailable then finite-difference g Relating to objective function gradient gradient estimates will likely be inaccurate and non-robust. high Relating to the high-fidelity function Therefore, there is a need for constrained multifidelity k Index of trust-region iteration optimization methods that can guarantee convergence to low Relating to a lower-fidelity function a high-fidelity optimal design while reducing the number m¯ Related to constraint surrogate model of expensive simulations, but that do not require gradient max User-set maximum value of that parameter estimates of the high-fidelity models. Constrained multifidelity optimization using model calibration 95

Constraint handling in derivative-free optimization is between the high-fidelity function and the surrogate model. challenging. Common single-fidelity approaches use lin- This ensures that the error between the high-fidelity model ear approximations (Powell 1994, 1998), an augmented gradient and surrogate model gradient is locally bounded Lagrangian (Kolda et al. 2003, 2006; Lewis and Torczon without ever calculating the high-fidelity gradient. Conn 2010), exact (Liuzzi and Lucidi 2009), or et al. (2008) showed that polynomial interpolation mod- constraint filtering (Audet and Dennis 2004) in combina- els can be made fully linear, provided the interpolating set tion with either a pattern-search or a simplex method. In the satisfies certain geometric requirements, and further devel- multifidelity setting, if the generated surrogate models are oped an unconstrained gradient-free optimization technique smooth, then surrogate sensitivity information can be used using fully linear models (Conn et al. 2009a, b). Wild to speed convergence.1 This may provide an advantage over et al. (2008) demonstrated that a radial basis function inter- single-fidelity pattern-search and simplex methods, espe- polation could satisfy the requirements for a fully linear cially when the dimension of the design space is large. For model and be used in Conn’s derivative-free optimization example, constraint-handling approaches used in conjunc- framework (Wild 2009; Wild and Shoemaker 2009). March tion with Efficient Global Optimization (EGO) (Jones et al. and Willcox (2010) generalized this method to the cases 1998) either estimate the probability that a point is both a with arbitrary low-fidelity functions or multiple low-fidelity minimum and that it is feasible, or add a smooth penalty to functions using Bayesian model calibration methods. the surrogate model prior to using optimization to select new Section 2 of this paper presents the derivative-free high-fidelity sample locations (Sasena et al. 2002; Jones method to optimize a high-fidelity objective function sub- 2001; Rajnarayan et al. 2008). These heuristic approaches ject to constraints with available derivatives. Fully linear may work well in practice, but unfortunately have no guar- surrogate models of the objective function are minimized antee of convergence to a minimum of the high-fidelity within a trust-region setting until no further progress is pos- design problem. Another constraint-handling method used sible or when convergence to a high-fidelity optimum is together with the Surrogate Management Framework (SMF) achieved. Section 3 presents a technique for minimizing a (Booker et al. 1999) augments a pattern-search method high-fidelity objective function subject to both constraints with predictions of locally optimal designs from a surro- with available derivatives and computationally expensive gate model. The underlying pattern-search method ensures constraints with unavailable derivatives. The constraints convergence of the method, so a broad range of surrogate without available derivatives are approximated with mul- models are allowable. One technique for generating the tifidelity methods, whereas the other constraints are han- surrogate model is conformal space mapping, where a low- dled either implicitly with a penalty method or explicitly. fidelity function is calibrated to the high-fidelity function Section 4 presents an aerodynamic shape optimization prob- at locations where the value is known (Castro et al. 2005). lem to demonstrate the proposed multifidelity optimization This calibrated surrogate enables the inclusion of accurate techniques and compares the results with other single- local penalty methods as a sensitivity-based technique for fidelity methods and approximation model management quickly estimating the location of constrained high-fidelity using finite difference gradient estimates. Finally, Section 5 optima. concludes the paper and discusses extensions of the method These existing multifidelity methods highlight the ben- to the case when constraints are hard (when the objective efits of exploiting sensitivity information generated from function fails to exist if the constraints are violated). calibrated surrogate models in a derivative-free optimiza- tion setting. Here we propose a formal framework, based on a derivative-free trust region approach, to systemat- 2 Constrained optimization of a multifidelity ically generate calibration data and to control surrogate objective function model quality. Thus, this paper develops a constrained mul- tifidelity optimization method that in practice can quickly This section considers the constrained optimization of a find high-fidelity optimal designs and be robust to unavail- high-fidelity function, fhigh(x), that accurately estimates able or inaccurate gradient estimates, because it gener- system metrics of interest but for which accurate gradi- ates smooth surrogate models of high-fidelity objectives ent estimates are unavailable. We first present a formalized and constraints without requiring high-fidelity model gra- problem statement and some qualifying assumptions. We dient information. More specifically, this is accomplished then present a trust-region framework, the surrogate-based by creating a fully linear model (formally defined in the optimization problems performed within the trust region, next section), which establishes Lipschitz-type error bounds and the trust region updating scheme. We follow this with an algorithmic implementation of the method, and with a 1Note that in the multifidelity setting, we use the term “derivative-free” brief discussion of algorithmic limitations and theoretical to indicate an absence of derivatives of the high-fidelity model. considerations needed for robustness. 96 A. March, K. Willcox

2.1 Problem setup and assumptions properties. First, the merit function with the initial penalty, σ0, must be bounded from below within a relaxed level-set, n We seek the vector x ∈ R of n design variables that L (x0,σ0), defined as minimizes the value of the high-fidelity objective function n subject to equality constraints, h(x), and inequality con- L(x0,σ0) = x ∈ R : (x,σ0) ≤ (x0,σ0) (4) straints g(x), n B(xk) = x ∈ R :x − xk≤max (5) L ( ,σ ) = ( ,σ ) ( ), min fhigh(x) x0 0 L x0 0 B xk (6) x∈Rn xk∈L(x0,σ0) s.t. h(x) = 0 (1) where  is the maximum allowable trust-region size ( ) ≤ , max g x 0 and the relaxed level-set is required because the trust-region algorithm may attempt to evaluate the high-fidelity function where we assume gradients of h(x) and g(x) with respect at points outside of the level set at x0. Second, the level to x are available or can be estimated accurately. To reduce sets of (xk,σk >σ0) must be contained within L(x0,σ0), the number of evaluations of fhigh(x) we use a low-fidelity and third, L(x0,σ0) must be a compact set. These properties function, flow(x), that estimates the same metric as fhigh(x) ensure that all design iterates, xk, remain within L(x0,σ0). but with cheaper evaluation cost and lower accuracy. We Although other merit functions, such as augmented seek to find the solution to (1) without estimating gra- Lagrangians, are possible, we restrict our attention to merit f (x) f (x) f (x) dients of high , by calibrating low to high and functions based on quadratic penalty functions because it using sensitivity information from the calibrated surrogate is trivial to show that they are bounded from below if the model. The calibration strategy employed may break down objective function obtains a finite global minimum and there f (x) f (x) should either high or low not be twice continu- is no need to consider arbitrarily bad Lagrange multiplier ously differentiable or not have a Lipschitz continuous first estimates. The merit function used in this method is the derivative, although in many such cases the algorithm may objective function plus the scaled sum-squares of the con- still perform well. straint violation, where g+(x) denotes the values of the nonnegative inequality constraints, 2.2 Trust-region model management σ σ ( ,σ ) = ( ) + k ( )T ( ) + k +( )T +( ). x k fhigh x 2 h x h x 2 g x g x From an initial design vector x0, the trust-region method (7) generates a sequence of design vectors that each reduce a merit function consisting of the high-fidelity function value The parameter σk is a penalty weight, which must go to and penalized constraint violation, where we denote xk to be +∞ as the iteration k goes to +∞. Note that when using this design vector on the kth trust-region iteration. Follow- a quadratic penalty function for constrained optimization, ing the general Bayesian calibration approach in Kennedy under suitable hypotheses on the optimization algorithm and and O’Hagan (2000), we define ek(x) to be a model of the penalty function, the sequence of iterates generated, {xk}, error between the high- and low-fidelity functions on the can either terminate at a feasible regular point at which the kth trust-region iteration, and construct a surrogate model Karush–Kuhn–Tucker (KKT) conditions are satisfied, or at mk(x) for fhigh(x) as a point that minimizes the squared norm of the constraint violation, h(x)Th(x) + g+(x)Tg+(x) (Nocedal and Wright mk(x) = flow(x) + ek(x). (2) 2006; Bertsekas 1999). ˆ We now define a surrogate merit function, (x,σk), B We define the trust region at iteration k, k, to be the region which replaces fhigh(x) with its surrogate model mk(x), centered at xk with size k, ˆ σk T σk + T + (x,σk) = mk(x) + h(x) h(x) + g (x) g (x). B ={ : − ≤ }, 2 2 k x x xk k (3) (8) where any norm on Rn can be used. Optimization is performed on this function, and updates to To solve the constrained optimization problem presented the trust-region are based on how changes in this surrogate in (1) we define a merit function, (xk,σk),whereσk is a merit function compare with changes in the original merit parameter that must go to infinity as the iteration number k function, (x,σk). goes to infinity and serves to increase the penalty placed on Our calibration strategy is to make the surrogate models the constraint violation. To prevent divergence of this algo- mk(x) fully linear, where the following definition of a fully rithm, we need the penalty function to satisfy some basic linear model is from Conn et al.: Constrained multifidelity optimization using model calibration 97

n Definition 1 Let a function fhigh(x) : R → R that is suggests (11) has a feasible solution then we take recourse continuously differentiable and has a Lipschitz continuous to (12).2 derivative, be given. A set of model functions M ={m : For both trust-region subproblems, (11)and(12), the Rn → R, m ∈ C1} is called a fully linear class of models if subproblem must be solved such that the 2-norm of the the following occur: first-order optimality conditions is less than a constant τk. There exist positive constants κ f ,κg and κblg such that for This requirement is stated as ∇x Lk≤τk,whereLk is any x ∈ L(x0,σ0) and k ∈ (0,max] there exists a model the Lagrangian for the trust-region subproblem used. There function mk(x) in M with Lipschitz continuous gradient are two requirements for τk.Firstτk <ε,whereε is the and corresponding Lipschitz constant bounded by kblg,and desired termination tolerance for the optimization problem such that the error between the gradient of the model and in (1). Second, τk must decrease to zero as the number of the gradient of the function satisfies iterations goes to infinity. Accordingly, we define

τ = βε,α , ∇ fhigh(x) −∇mk(x)≤κgk ∀x ∈ Bk (9) k min [ k] (13) and the error between the model and the function satisfies with a constant β ∈ (0, 1) to satisfy the overall tolerance α ∈ ( , )  criteria, and a constant a 1 multiplying k to ensure ( ) − ( ) ≤ κ 2 ∀ ∈ B . that τ goes to zero. The constant a will be defined as part fhigh x mk x f k x k (10) k of a sufficient decrease condition that forces the size of the

Such a model mk(x) is called fully linear on Bk (Conn et al. trust-region to decrease to zero in the next subsection. 2009a). 2.4 Trust-region updating 2.3 Trust-region subproblem Without using the high-fidelity function gradient, the trust- At each trust-region iteration a point likely to decrease the region update scheme must ensure the size of the trust- merit function is found by solving one of two minimization region decreases to zero to establish convergence. To problems on the fully linear model for a step sk,onatrust do this, a requirement similar to the fraction of Cauchy region of size k: decrease requirement in an the unconstrained trust-region formulation is used (see, for example, Conn et al. 2009a).

min mk(xk + sk) We require that the improvement in our merit function is at s ∈Rn k least a small constant a ∈ (0,ε], multiplying k,

s.t. h(xk + sk) = 0 ˆ ˆ (xk,σk) − (xk + sk,σk) ≥ ak. (14) ( + ) ≤ g xk sk 0 The sufficient decrease condition is enforced through the trust region update parameter, ρk. The update parameter sk≤k, (11) is the ratio of the actual reduction in the merit function to the predicted reduction in the merit function unless the or sufficient decrease condition is not met, ˆ min k(xk + sk,σk) (ˆ ,σ ) − (ˆ + ,σ )<  s ∈Rn 0 xk k xk sk k a k k ρ = ( ,σ )−( + ,σ ) k xk k xk sk k . ˆ ˆ otherwise (xk ,σk )−(xk +sk ,σk ) s.t. sk≤k. (12) (15)

The subproblem in (12) is used initially to reduce constraint The size of the trust region, k, must now be updated based infeasibility. However, there is a limitation with this sub- on the quality of the surrogate model prediction. The size of problem that the norm of the objective function Hessian the trust region is increased if the surrogate model predicts grows without bound due to the penalty parameter increas- the change in the function value well, kept constant if the ing to infinity. Therefore to both speed convergence and 2 prevent Hessian conditioning issues, the subproblem in (11) Note that an initial feasible point, x0 could be found directly using the with explicit constraint handling is used as soon as a point gradients of h(x) and g(x); however, since the penalty method includes that satisfies h(x) = 0andg(x) ≤ 0 exists within the cur- the descent direction of the objective function it may better guide the optimization process in the case of multiple feasible regions. Should rent trust region. This is estimated by a linear approximation the initial iterate be feasible, the deficiencies of a quadratic penalty to the constraints, however, if the linearized estimate falsely function are not an issue. 98 A. March, K. Willcox prediction is fair, and the trust region is contracted if the surrogate model instead of the high-fidelity function. This model predicts the change poorly. Specifically, we update approximate stationarity condition is similar to what would the trust region size using be obtained using a finite-difference gradient estimate with ⎧ a fixed step size where the truncation error in the approxi- ⎨⎪min{γ1k,max} if η1 ≤ ρk ≤ η2, mate derivative eventually dominates the stationarity mea-  = γ  ρ ≤ η , sure, but in our case the sufficient decrease modification k+1 ⎪ 0 k if k 0 (16) ⎩ of the update parameter eventually dominates the station- k otherwise, arity measure (Boggs and Dennis 1976). For k → 0, we have from (9)that∇ f (x) −∇m (x)→0, and where 0 <η <η < 1 <η,0<γ < 1, and γ > 1. high k 0 1 2 0 1 λ(ˆ ) − λ( )→ Regardless of whether or not a sufficient decrease has been also xk xk 0. Therefore, we have first-order found, the trust-region center will be updated if the trial optimality as given in (18). In practice, the algorithm is point has decreased the value of the merit function, terminated when the constraint violation is small, (19), first- order optimality is satisfied on the model, (21), and the trust region is small, say k <ε2 for a small ε2. xk + sk if (xk,σk)>(xk + sk,σk) xk+1 = (17) xk otherwise. 2.6 Implementation A new surrogate model, mk+1(x), is then built such that B it is fully linear on a region k+1 having center xk+1 and The numerical implementation of the multifidelity opti-  size k+1. The new fully linear model is constructed using mization algorithm, which does not compute the gradi- the procedure of Wild et al. (2008) with the calibration ent of the high-fidelity objective function, is presented as technique of March and Willcox (2010). Algorithm 1. A set of possible parameters that may be used in this algorithm is listed in Table 2 in Section 4.Akey 2.5 Termination element of this algorithm is the logic to switch from the penalty function trust-region subproblem, (12), to the sub- For termination, we must establish that the first-order KKT problem that uses the constraints explicitly, (11). Handling conditions, the constraints explicitly will generally lead to faster conver- gence and fewer function evaluations; however, a feasible ∇ ( ) + ( )Tλ( ) ≤ ε, fhigh xk A xk xk (18) solution to this subproblem likely does not exist at early iterations. If either the constraint violation is sufficiently T + T T + T small,  h(xk) , g (xk) ≤ε, or the linearized steps, h(xk) , g (xk) ≤ ε (19) T δh, satisfying h(x) +∇h(x) δh = 0 for all equality and inequality constraints are all smaller than the size of the trust are satisfied at xk,whereA(xk) is defined to be the Jacobian region, then the subproblem with the explicit constraints of all active or violated constraints at xk, is attempted. If the optimization fails, then the penalty T function subproblem is solved. A(x ) = ∇h(x )T, ∇g+(x )T , (20) k k k This method may be accelerated with the use of mul- tiple lower-fidelity models. March and Willcox (2010) and λ(x ) are Lagrange multipliers. The additional com- k suggest a multifidelity filtering technique to combine esti- plementarity conditions are assumed to be satisfied from mates from multiple low-fidelity functions into a single the solution of (11). The constraint violation criteria, (19), maximum likelihood estimate of the high-fidelity function can be evaluated directly. However, the first-order condi- value. That technique will work unmodified within this mul- tion, (18), cannot be verified directly in the derivative- tifidelity optimization framework and in many situations free case because the gradient, ∇ f (x ), is unknown. high k may improve performance. Therefore, for first-order optimality we require two condi- tions: first-order optimality with the surrogate model, and a sufficiently small trust-region. The first-order optimality 2.7 Theoretical considerations condition using the surrogate model is This subsection discusses some performance limitations of ∇ ( ) + ( )Tλ(ˆ ) ≤ βε,α , ≤ ε, mk xk A xk xk max [ k a] (21) the proposed algorithm as well as theoretical considerations needed for robustness. where λˆ are the Lagrange multipliers computed using the This paper has not presented a formal convergence the- surrogate model and active constraint set estimated from the ory; at best, such a theory will apply only under many Constrained multifidelity optimization using model calibration 99

restrictive assumptions. In addition, we can at best guaran- projection of the gradient onto the feasible domain is less tee convergence to a near-optimal solution (i.e., optimality than the parameter a, then the algorithm can fail to make to an a priori tolerance level and not to stationarity). Forc- progress. This limitation is presented in (21); however, it ing the trust region size to decrease every time the sufficient is not seen as severe because a can be set at any value decrease condition is not satisfied means that if the arbitrarily close to (but strictly greater than) zero. 100 A. March, K. Willcox

∗ A second limitation is that the model calibration strat- where x is a KKT point of (1), and κ1, κ2 are finite pos- egy, generating a fully linear model, is theoretically only itive constants. If {σkk} diverges then an iteration exists guaranteed to be possible for functions that are twice con- where both the bound, (22), holds (the trust region size must tinuously differentiable and have Lipschitz-continuous first be large enough that a feasible point exists in the interior) derivatives. Though this assumption may seem to limit and the constraint violation is less than the given toler- some desired opportunities for derivative-free optimization ance ε. This enables the switching between the two trust of functions with noise, noise with certain characteristics region subproblems, (12)and(11). The upper bound for the σ σ 2 = like that discussed in Conn et al. (2009b, Section 9.3), or growth of k, that (ii) limk→∞ k k 0, comes from the the case of models with dynamic accuracy as in Conn et al. smoothness of the quadratic penalty function. To establish (2000, Section 10.6) can be accommodated in this frame- an upper bound for the Hessian 2-norm for the subprob- work. Approaches such as those in Moré and Wild (2010) lem in (12) we compare the value of the merit function at may be used to characterize the noise in a specific problem. a point (xk +p,σk) with its linearized prediction based on ˜ However, our algorithm applies even in the case of general (xk,σk), (xk + p,σk).Ifκ fg is the Lipschitz constant noise, where no guarantees can be made on the quality of for ∇ fhigh(x), κcm is the maximum Lipschitz constant for the surrogate models. As will be shown in the test problem the constraints, and κcgm is the maximum Lipschitz constant in Section 4, in such a case our approach exhibits robustness for a constraint gradient, we can show that and significant advantages over gradient-based methods that are susceptible to poor gradient estimates. ˜ (xk + p,σk) − (xk + p,σk) Algorithm 1 is more complicated than conventional gradient-based trust-region algorithms using quadratic ≤ κ + σ κ h(x )T g+(x )T penalty functions, such as the algorithm in Conn et al. fg k cm k k (2000, Chapter 14). There are three sources of added com- 2 plexity. First, our algorithm increases the penalty parame- + κcgm A(xk) p+κcmκcgmp ter after each trust-region subproblem as opposed to after a completed minimization of the penalty function. This × 2. aspect can significantly reduce the number of required high- p (23) fidelity evaluations, but adds complexity to the algorithm. ˆ Second, our algorithm switches between two trust region We have a similar result for (x,σk) by replacing κ fg with subproblems to avoid numerical issues associated with the the sum κ fg + κg, using the definition of a fully linear conditioning of the quadratic penalty function Hessian. The model, and by bounding p by k. Therefore, we may third reason for added complexity is that in derivative-free show the error in a linearized prediction of the surrogate optimization the size of the trust-region must decrease to model will go to zero and that Lipschitz-type smoothness {σ 2} zero in order to demonstrate optimality. This aspect serves is ensured provided that the sequence k k converges to to add unwanted coupling between the penalty parameter zero regardless of the constraint violation. ∞ /σ and the size of the trust region, which must be handled The final requirement, that (iii) k=0 1 k is finite, appropriately. We now discuss how these two aspects of comes from the need for the size of the trust region to go to the algorithm place important constraints on the penalty zero in gradient-free optimization. The sufficient decrease con- ˆ ˆ parameter σk. dition, that ρk =0 unless (xk,σk)−(xk + sk,σk) ≥ ak The requirements for the penalty parameter, σk,arethat ensures that the trust region size decreases unless the change σ  =∞ σ 2 = ( ,σ ) − ( + ,σ ) ≥ η  (i) limk→∞ k k , (ii) limk→∞ k k 0, and (iii) in the merit function xk k xk sk k 0a k. ∞ /σ σ k=0 1 k is finite. The lower bound for the growth of k, This provides an upper bound on the total number of times that (i) limk→∞ σkk =∞, comes from the properties of in which the size of the trust region is kept constant or the minima of quadratic penalty functions presented in Conn increased. We have assumed that the merit function is et al. (2000, Chapter 14), properties of a fully linear model, bounded from below and also that the trust-region iter- and (13). If xk is at an approximate minimizer of the sur- ates remain within a level-set L(x0,σ0) as defined by (4). rogate quadratic penalty function, (8), then a bound on the We now consider the merit function written in an alternate constraint violation is form, T + T σ 2  h(xk) g (xk)  k T + T (xk,σk) = f (xk) + h(xk) , g (xk) . (24) high 2 κ ( {α , a}+κ  )+λ(x∗)+κ x − x∗ ≤ 1 max k g k 2 k , σ k From (22), if σk is large enough such that the bound on the (22) constraint violation is less than unity, we may use the bound Constrained multifidelity optimization using model calibration 101 on the constraint violation in (22) to show that an upper multiple high-fidelity constraints can be used if an initial bound on the total remaining change in the merit function is point x0 is given that is feasible with respect to all con- straints; however, due to the effort required to construct fhigh(xk) − min fhigh(x) approximations of the multiple high-fidelity constraints, it x∈L(x0,σ0) is recommended that all of the high-fidelity constraints κ (max{α , a}+κ  )+λ(x∗)+κ x − x∗ 2 + 1 k g k 2 k . be combined into a single high-fidelity constraint through, σk as an example, a discriminant function (Rvachev 1963; (25) Papalambros and Wilde 2000). The optimization problem in (26) is solved in two phases. Each term in the numerator is bounded from above because First, the multifidelity optimization method presented in ∗ k is always bounded from above by max, λ(x ) is Section 2 is used to find a feasible point, and then an interior bounded from above because x∗ is a regular point, and point formulation is used to find a minimum of the opti- ∗ xk − x  is bounded from above because L(x0,σ0) is a mization problem in (26). The interior point formulation is compact set. Therefore, if the series {1/σk} has a finite sum, presented in Section 3.2 and the numerical implementation then the total remaining improvement in the merit function is presented in Section 3.3. is finite. Accordingly, the sum of the series {k} must be  → →∞ finite, and k 0ask . The prescribed sequence 3.1 Finding a feasible point for {σk} in our algorithm satisfies these requirements for a broad range of problems. This algorithm begins by finding a point that is feasible with respect to all of the constraints by applying Algorithm 1 to the optimization problem 3 Multifidelity objective and constraint optimization

min chigh(x) This section considers a more general constrained optimiza- x∈Rn tion problem with a computationally expensive objective s.t. h(x) = 0 function and computationally expensive constraints. The specific problem considered is where the gradients for both g(x) ≤ 0, (27) the expensive objective and expensive constraints are either unavailable, unreliable or expensive to estimate. Accord- until a point that is feasible with respect to the constraints ingly, the multifidelity optimization problem in (1) is aug- in (26) is found. If this optimization problem is uncon- mented with the high-fidelity constraint, c (x) ≤ 0. high strained (i.e., there are no constraints h(x) and g(x))then In addition, we have a low-fidelity estimate of this con- the trust-region algorithm of Conn et al. (2009a) is used with straint function, c (x), which estimates the same metric low the multifidelity calibration method of March and Willcox as c (x), but with unknown error. Therefore, our goal is high (2010). The optimization problem in (27) may violate one to find the vector x ∈ Rn of n design variables that solves of the assumptions for Algorithm 1 in that c (x) may not the nonlinear constrained optimization problem, high be bounded from below. This issue will be addressed in the numerical implementation of the method in Section 3.3. min fhigh(x) x∈Rn s.t. h(x) = 0 3.2 Interior point trust-region method ( ) ≤ g x 0 Once a feasible point is found we minimize the high-fidelity objective function ensuring that we never again violate the chigh(x) ≤ 0, (26) constraints, that is we solve (26). This is accomplished in where h(x) and g(x) represent vectors of inexpensive equal- two steps, first by solving trust region subproblems that ity and inequality constraints with derivatives that are either use fully linear surrogate models for both the high-fidelity known or may be estimated cheaply. The same assump- objective function and the high-fidelity constraint. Second, tions for the expensive objective function formulation given the trust region step is evaluated for feasibility and any in Section 2.1 are made for the functions presented in this infeasible step is rejected. The surrogate model for the formulation. It is also necessary to make an assumption sim- objective function is mk(x) as defined in (2). For the con- ilar to that in Section 2.2, that a quadratic penalty function straint, the surrogate model, m¯ k(x), is defined as with the new high-fidelity constraint is bounded from below within an initial expanded level-set. A point of note is that m¯ k(x) = clow(x) +¯ek(x). (28) 102 A. March, K. Willcox

From the definition of a fully linear model, (9)and(10), in the objective function is at least a constant, a, multiply- m¯ k(x) satisfies ing k, ⎧ ∇ ( ) −∇¯ ( ) ≤ κ  ∀ ∈ B , chigh x mk x cg k x k (29) ⎪min{γ  , } if f (x + s ) ⎪ 1 k max high k k ⎨⎪ ( ) −¯ ( ) ≤ κ 2 ∀ ∈ B . − fhigh(xk) ≥ ak chigh x mk x c k x k (30) k+1 = ⎪ ( + ) ≤ ⎪ and chigh xk sk 0 In addition, we require that our procedure to construct fully ⎩⎪ γ  . linear models ensures that at the current design iterate, the 0 k otherwise fully linear models exactly interpolate the function they are (35) modeling, New surrogate models, mk+1(x) and m¯ k+1(x), are then built B mk(xk) = fhigh(xk), (31) such that they are fully linear on a region k+1 having cen- ter xk+1 and size k+1. The new fully linear models are m¯ k(xk) = chigh(xk). (32) constructed using the procedure of Wild et al. (2008) with the calibration technique of March and Willcox (2010). This is required so that every trust-region subproblem is feasible at its initial point xk. 3.3 Multifidelity objective and constraint implementation The trust-region subproblem is The numerical implementation of this multifidelity opti- min mk(xk + sk) n mization algorithm is presented as Algorithm 2. A set of sk ∈R possible parameters that may be used in this algorithm is s.t. h(xk + sk) = 0 listed in Table 2 in Section 4. An important implementation issue with this algorithm is finding the initial feasible point. ( + ) ≤ g xk sk 0 Algorithm 1 is used to minimize the high-fidelity constraint value subject to the constraints with available derivatives in m¯ (x + s ) ≤ max{c (x ), −w } k k k high k k order to find a point that is feasible. However, Algorithm 1 uses a quadratic penalty function to handle the constraints sk≤k. (33) with available derivatives if the constraints are violated. The The surrogate model constraint does not have zero as a right convergence of a penalty function requires that the objective hand side to account for the fact the algorithm is looking for function is bounded from below, therefore a more general interior points. The right hand side, max{chigh(xk), −wk}, problem than (27) to find an initial feasible point is to use, ensures that the constraint is initially feasible and that 2 protection of constraint violation decreases to zero as the min max{chigh(x) + d, 0} x∈Rn number of iterations increase to infinity. The constant w α must be greater than , which is defined as part of the termi- s.t. h(x) = 0 nation tolerance τk in (13). The trust-region subproblem is solved to the same termination tolerance as the multifidelity g(x) ≤ 0. (36) objective function formulation, ∇x Lk≤τk,whereLk is the Lagrangian of (33). The maximization in the objective prevents the need for the The center of the trust region is updated if a decrease in high-fidelity constraint to be bounded from below, and the the objective function is found at a feasible point, constraint violation is squared to ensure the gradient of the ⎧ objective is continuous. The constant d is used to account ⎪ + ( )> ( + ) ⎨xk sk if fhigh xk fhigh xk sk for the fact that the surrogate model will have some error xk+1 = and chigh(xk + sk) ≤ 0 (34) in its prediction of c (x), so looking for a slightly nega- ⎩⎪ high xk otherwise, tive value of the constraint may save iterations as compared to seeking a value that is exactly zero. For example, if ( + ) = ( + ) ≤ ≥ κ 2 ¯ ( ) + = with h xk sk 0andg xk sk 0 already satisfied d c k and mk x d 0then(30) guarantees that in (33). The trust-region size update must ensure that the chigh(x) ≤ 0. predictions of the surrogate models are accurate and that There is a similar consideration in the solution of (33), the size of the trust region goes to zero in the limit as the where a slightly negative value of the surrogate constraint is number of iterations goes to infinity. Therefore, we again desired. If this subproblem is solved with an interior point impose a sufficient decrease condition, that the change algorithm this should be satisfied automatically; however, if Constrained multifidelity optimization using model calibration 103

a sequential quadratic programming method is used the con- to shrink the trust region at a slower rate, increasing γ0.This straint violation will have a numerical tolerance that is either will help to ensure that the trust-region does not decrease to slightly negative or slightly positive. It may be necessary to zero at a suboptimal point. bias the subproblem to look for a value of the constraint that is more negative than the optimizer constraint violation tol- erance to ensure the solution is an interior point. This feature avoids difficulties with the convergence of the trust-region 4 Supersonic airfoil design test problem iterates. A final implementation note is that if a high-fidelity con- This section presents results of the two multifidelity opti- straint has numerical noise or steep gradients it may be wise mization algorithms on a supersonic airfoil design problem. 104 A. March, K. Willcox

(a) Panel method. (b) Shock-expansion method. (c) Cart3D.

Fig. 1 Supersonic airfoil estimated pressure distribution comparisons for a 5% thick biconvex airfoil at Mach 1.5 and 2◦ angle of attack

4.1 Problem setup We first present single-fidelity results using state-of-the- art derivative-free methods. Then the following sections The supersonic airfoil design problem has 11 parameters: present results for three optimization examples each using the angle of attack, 5 spline points on the upper surface this airfoil problem to demonstrate the capabilities of the and 5 spline points on the lower surface. However, there optimization algorithms presented. In the first example, is an actual total of 7 spline points on both the upper and Section 4.3, the airfoil drag is minimized using the con- lower surfaces because the leading and trailing edges must strained multifidelity objective function formulation pre- be sharp for the low-fidelity methods used. The airfoils are sented in Section 2 with only the simple geometric con- constrained such that the minimum thickness-to-chord ratio straints. In the second example, Section 4.4, the airfoil is 0.05 and that the thickness everywhere on the airfoil must lift-to-drag ratio is maximized subject to a constraint that be positive. In addition, there are lower and upper bounds the drag coefficient is less than 0.01, where the constraint for all spline points. is handled with the multifidelity framework presented in Three supersonic airfoil analysis models are available: a Section 3. In the final example, Section 4.5, the airfoil linearized panel method, a nonlinear shock-expansion the- lift-to-drag ratio is maximized subject to the constrained ory method, and Cart3D, an Euler CFD solver (Aftosmis drag coefficient and both the objective function and the 1997). Note, that Cart3D has a finite convergence tolerance constraints are handled with the multifidelity framework so there is some numerical noise in the lift and drag predic- presented in Section 3. The initial airfoils for all prob- tions. In addition, because random airfoils are used as initial lems are randomly generated and likely will not satisfy conditions, Cart3D may fail to converge, in which case the the constraints. results of the panel method are used. Figure 1 shows com- The three multifidelity airfoil problems are solved puted pressure distributions for each of the models for a 5% with four alternative optimization algorithms: Sequential thick biconvex airfoil at Mach 1.5 and 2◦ angle of attack. Quadratic Programming (SQP) (MathWorks, Inc. 2010); Table 1 provides the estimated lift and drag coefficients a first-order consistent multifidelity trust-region algorithm for the same airfoil and indicates the approximate level of that uses a SQP formulation and an additive correction accuracy of the codes with respect to each other. (Alexandrov et al. 2001); the high-fidelity-gradient-free approach presented in this paper using a Gaussian radial basis function and a fixed spatial correlation parameter, Table 1 5% thick biconvex airfoil results comparison at Mach 1.5 and ξ = ◦ 2; and the approach presented in this paper using a 2 angle of attack maximum likelihood estimate to find an improved corre- ∗ Panel Shock-expansion Cart3D lation length, ξ = ξ . The Gaussian correlation functions used in these results are all isotropic. An anisotropic corre- CL 0.1244 0.1278 0.1250 lation function (i.e., choosing a correlation length for each % Diff 0.46% 2.26% 0.00% direction in the design space) may speed convergence of this algorithm and reduce sensitivity to Hessian condition- CD 0.0164 0.0167 0.01666 % Diff 1.56% 0.24% 0.00% ing. The parameters used for the optimization algorithm are presented in Table 2. The fully linear models are constructed Percent difference is taken with respect to the Cart3D results using the procedure of Wild et al. (2008) with the calibration Constrained multifidelity optimization using model calibration 105

Table 2 List of constants used in the algorithm function known to perform well on this problem (March Constant Description Value and Willcox 2010), while COBYLA handles the constraints explicitly. On this problem, starting from ten random ini- −4 a Sufficient decrease constant 1 × 10 tial airfoils the best observed results were 5,170 function −2 α Convergence tolerance multiplier 1 × 10 evaluations for the Nelder–Mead , over − β Convergence tolerance multiplier 1 × 10 2 11,000 evaluations for DIRECT, and 6,284 evaluations for d Artificial lower bound for 1 COBYLA. Such high numbers of function evaluations mean constraint value that these single-fidelity gradient-free algorithms are too w Constraint violation conservatism 0.1 expensive for use with an expensive forward solver, such factor as Cart3D. −4 ε, ε2 Termination tolerance 5 × 10 γ0 Trust region contraction ratio 0.5 γ1 Trust region expansion ratio 2 4.3 Multifidelity objective function results η0 Trust region contraction criterion 0.25 η1,η2 Trust region expansion criterion 0.75, 2.0 This section presents optimization results in terms of the 0 Initial trust region radius 1 number of high-fidelity function evaluations required to find  the minimum drag for a supersonic airfoil at Mach 1.5 with max Maximum trust region size 20 σ k/10, /1.1 only geometric constraints on the design. Two cases are k Penalty parameter max e 1 k −5 tested: the first uses the shock-expansion method as the δx Finite difference step 1 × 10 high-fidelity function and the panel method as the low- All parameters used in constructing the radial basis function error fidelity function; the second uses Cart3D as the high-fidelity model are the same as in March and Willcox (2010). These param- function and the panel method as the low-fidelity function. eter values are based on recommendations for unconstrained trust- region algorithms and through numerical testing appear to have good These problems are solved using the multifidelity optimiza- performance for an assortment of problems tion algorithm for a computationally expensive objective function and constraints with available derivatives presented in Section 2. The average numbers of high-fidelity function evalua- technique of March and Willcox (2010) and the parameters tions required to find a locally optimal design starting from stated therein. random initial airfoils are presented in Table 3. The results show that our approach uses approximately 78% fewer high- 4.2 Single-fidelity derivative-free optimization fidelity function evaluations than SQP and approximately 30% fewer function evaluations than the first-order consis- For benchmark purposes, we first solve the airfoil opti- tent trust-region method using finite difference gradient esti- mization problem using three single-fidelity gradient-free mates. In addition, Fig. 2 compares the objective function optimization methods: the Nelder–Mead simplex algorithm and constraint violation histories verse the number of high- (Nelder and Mead 1965), the global optimization method fidelity evaluations for these methods as well as DIRECT DIRECT (Jones et al. 1993), and the constrained gradient- and Nelder-Mead simplex from a representative random ini- free optimizer, COBYLA (Powell 1994, 1998). The test tial airfoil for the case with the shock-expansion method case is to minimize the drag of supersonic airfoil estimated as the high-fidelity function. For the single-fidelity opti- with a panel method, subject to the airfoil having positive mization using Cart3D, the convergence results were highly thickness everywhere and at least 5% thickness to chord sensitive to the finite difference step length. The step size ratio. This test case is a lower-fidelity version of the exam- required tuning, and the step with the highest success ple in Section 4.3. Nelder–Mead simplex and DIRECT rate was used. A reason for this is that the initial airfoils are unconstrained optimizers that use a quadratic penalty were randomly generated, and the convergence tolerance

Table 3 The average number of high-fidelity function evaluations to minimize the drag of a supersonic airfoil with only geometric constraints

High-fidelity Low-fidelity SQP First-order TR RBF, ξ = 2 RBF, ξ = ξ*

Shock-expansion Panel method 314 (−) 110 (−65%) 73 (−77%) 68 (−78%) Cart3D Panel method 359* (−) 109 (−70%) 80 (−78%) 79 (−78%)

The asterisk for the Cart3D results means a significant fraction of the optimizations failed and the average is taken over fewer samples. The numbers in parentheses indicate the percentage reduction in high-fidelity function evaluations relative to SQP 106 A. March, K. Willcox

Fig. 2 Convergence history for minimizing the drag of an airfoil using value is shown. COBYLA and BOBYQA were attempted, but failed to the shock-expansion theory method as the high-fidelity function sub- find the known solution to this problem. On the constraint violation ject to only geometric constraints. The methods presented are our plot, missing points denote a feasible iterate, and the sudden decrease calibration approach, a first-order consistent multifidelity trust-region in the constraint violation for the RBF calibration approach at 37 high- 5 algorithm, sequential quadratic programming, DIRECT, and Nelder- fidelity evaluations (13th iteration, σk = 9.13 × 10 ) is when the Mead simplex. Both DIRECT and Nelder-Mead simplex use a fixed algorithm switches from solving (12) to solving (11) penalty function to handle the constraints, so only an objective function

Table 4 The average number of high-fidelity constraint evaluations required to maximize the lift-to-drag ratio of a supersonic airfoil estimated with a panel method subject to a multifidelity constraint

High-fidelity Low-fidelity SQP First-order TR RBF, ξ = 2 RBF, ξ = ξ*

Shock-expansion Panel method 827 (−) 104 (−87%) 104 (−87%) 115 (−86%) Cart3D Panel method 909* (−) 100 (−89%) 103 (−89%) 105 (−88%)

The asterisk for the Cart3D results means a significant fraction of the optimizations failed and the average is taken over fewer samples. The numbers in parentheses indicate the percentage reduction in high-fidelity function evaluations relative to SQP

(a) Initial airfoil. (b) Optimal shock-expansion theory airfoil. (c) Optimal Cart3D flow solution.

Fig. 3 Initial airfoil and supersonic airfoil with the maximum lift-to-drag ratio having drag less than 0.01 and 5% thickness at Mach 1.5 Constrained multifidelity optimization using model calibration 107

Table 5 The average number of high-fidelity objective function and high-fidelity constraint evaluations to optimize a supersonic airfoil for a maximum lift-to-drag ratio subject to a maximum drag constraint

High-fidelity Low-fidelity SQP First-order TR RBF, ξ = 2 RBF, ξ = ξ*

Objective: Shock-expansion Panel method 773 (−) 132 (−83%) 93 (−88%) 90 (−88%) Constraint: Shock-expansion Panel method 773 (−) 132 (−83%) 97 (−87%) 96 (−88%)

Objective: Cart3D Panel method 1168* (−)97(−92%) 104 (−91%) 112 (−90%) Constraint: Cart3D Panel method 2335* (−)97(−96%) 115 (−95%) 128 (−94%)

The asterisk for the Cart3D results means a significant fraction of the optimizations failed and the average is taken over fewer samples. The numbers in parentheses indicate the percentage reduction in high-fidelity function evaluations relative to SQP of Cart3D for airfoils with sharp peaks and negative thick- In the first case, the shock-expansion method is the high- ness was large compared with the airfoils near the optimal fidelity analysis used to estimate both metrics of interest design. This sensitivity of gradient-based optimizers to and the panel method is the low-fidelity analysis. In the the finite difference step length highlights the benefit of second case, Cart3D is the high-fidelity analysis and the gradient-free approaches, especially when constraint gradi- panel method is the low-fidelity analysis. The optimal air- ent estimates become poor. foils are shown in Fig. 3.Table5 presents the number of high-fidelity function evaluations required to find the opti- mal design using SQP, a first-order consistent trust-region 4.4 Multifidelity constraint results algorithm and the techniques developed in this paper. Again a significant reduction (about 90%) in the number of high- This section presents optimization results in terms of the fidelity function evaluations, both in terms of the constraint number of function evaluations required to find the maxi- and the objective, are observed compared with SQP, and mum lift-to-drag ratio for a supersonic airfoil at Mach 1.5 a similar number of high-fidelity function evaluations are subject to both geometric constraints and the requirement observed when compared with the first-order consistent that the drag coefficient is less than 0.01. The lift-to-drag trust region approach using finite differences. ratio is computed with the panel method; however, the drag coefficient constraint is handled using the multifidelity technique presented in Section 3. Two cases are exam- ined: in the first the shock-expansion method models the 5Conclusion high-fidelity constraint and the panel method models the low-fidelity constraint; in the second case, Cart3D models This paper has presented two algorithms for multifidelity the high-fidelity constraint and the panel method models the constrained optimization of computationally expensive low-fidelity constraint. Table 4 presents the average num- functions when their derivatives are not available. The ber of high-fidelity constraint evaluations required to find first method minimizes a high-fidelity objective function the optimal design using SQP, a first-order consistent trust- without using its derivative while satisfying constraints region algorithm and the multifidelity techniques developed with available derivatives. The second method minimizes in this paper. A significant decrease (almost 90%) in the a high-fidelity objective without using its derivative while number of high-fidelity function evaluations is observed satisfying both constraints with available derivatives and when compared with SQP. Performance is almost the same an additional high-fidelity constraint without an avail- as the first-order consistent trust-region algorithm. able derivative. Both of these methods support multiple lower-fidelity models through the use of a multifidelity filtering technique without any modifications to the meth- 4.5 Multifidelity objective function and constraint results ods. For the supersonic airfoil design example considered here, the multifidelity methods resulted in approximately This section presents optimization results in terms of the 90% reduction in the number of high-fidelity function number of function evaluations required to find the max- evaluations compared to solution with a single-fidelity imum lift-to-drag ratio for a supersonic airfoil at Mach 1.5 sequential quadratic programming method. In addition, the subject to geometric constraints and the requirement that the multifidelity methods performed similarly to a first-order drag coefficient is less than 0.01. In this case, both the lift- consistent trust-region algorithm with gradients estimated to-drag ratio and the drag coefficient constraint are handled using finite difference approximations. This shows that using the multifidelity technique presented in Section 3. derivative-free multifidelity methods provide significant 108 A. March, K. Willcox opportunity for optimization of computationally expensive the Lipschitz continuity requirements, due to finite conver- functions without available gradients. gence tolerances in determining the CFD solution; however, The behavior of the gradient-free algorithms presented with the aid of the smooth calibrated surrogates combined is slightly atypical of methods. For with the trust-region model management, our multifidelity example, the convergence rate of these gradient-free algo- method is successful in finding locally optimal solutions. rithms is rapid initially and then slows when close to an Another example is a high-fidelity optimal solution for optimal solution. In contrast, convergence for a gradient- which constraint qualification conditions are not satisfied. based method is often initially slow and then accelerates In the algorithms presented, the design vector iterate will when close to an optimal solution (e.g., as approximate approach a local minimum and the sufficient decrease test Hessian information becomes more accurate in a quasi- for the change in the objective function value will fail. This Newton approach). Also, gradient-based optimizers typ- causes the size of the trust region to decay to zero around the ically find the local optimal solution nearest the initial local minimum even though the KKT conditions may not be design. Although by virtue of the physics involved, the satisfied. presented examples have unique optimal solutions, in a gen- In summary, this paper has presented a multifidelity eral problem the local optimum to which these gradient-free optimization algorithm that does not require estimating gra- algorithms converge may not be the one in the immediate dients of high-fidelity functions, enables the use of multiple vicinity of the initial iterate. An example of this behavior low-fidelity models, enables optimization of functions with is the case when the initial iterate is itself a local optimum, hard constraints, and exhibits robustness over a broad class the surrogate model may not capture this fact and the iter- of optimization problems, even when non-smoothness is ate may move to a different point in the design space with a present in the objective function and/or constraints. For lower function value. airfoil design problems, this approach has been shown to In the case of hard constraints, or when the objective perform similarly in terms of the number of function eval- function fails to exist if the constraints are violated, it is uations to finite-difference-based multifidelity optimization still possible to use Algorithm 2. After the initial feasible methods. This suggests that the multifidelity derivative-free point is found, no design iterate will be accepted if the high- approach is a promising alternative for the wide range of fidelity constraint is violated. Therefore the overall flow of problems where finite-difference gradient approximations the algorithm is unchanged. What must be changed is the are unreliable. technique to build fully linear models. In order to build a fully linear model, the objective function must be evaluated Acknowledgments The authors gratefully acknowledge support at a set of n + 1 points that span Rn. When the objective from NASA Langley Research Center contract NNL07AA33C, tech- function can be evaluated outside the feasible region, the nical monitor Natalia Alexandrov, and a National Science Foundation graduate research fellowship. In addition, we wish to thank Michael constraints do not influence the construction of the surro- Aftosmis and Marian Nemec for support with Cart3D. gate model. However, when the objective function does not exist where the constraints are violated, then the points used to construct the surrogate model must all be feasible and this restricts the shape of the feasible region. Specifically, References this requirement prohibits equality constraints and means that strict linear independent constraint qualification must Aftosmis MJ (1997) Solution adaptive cartesian grid methods for aero- be satisfied everywhere in the design space (preventing dynamic flows with complex geometries. In: 28th computational fluid dynamics lecture series, von Karman Institute for Fluid two inequality constraints from mimicking an equality con- Dynamics, Rhode-Saint-Genése, Belgium, lecture series 1997-02 straint). If these two additional conditions hold, then it will Alexandrov N, Lewis R, Gumbert C, Green L, Newman P (1999) be possible to construct fully linear models everywhere in Optimization with variable-fidelity models applied to wing the feasible design space and use this algorithm to optimize design. Tech. Rep. CR-209826, NASA Alexandrov N, Lewis R, Gumbert C, Green L, Newman P (2001) computationally expensive functions with hard constraints. Approximation and model management in aerodynamic optimiza- Lastly, we comment on the applicability of the proposed tion with variable-fidelity models. AIAA J 38(6):1093–1101 multifidelity approach. Though this paper presents no for- Audet C, Dennis Jr JE (2004) A pattern search filter method for nonlin- mal convergence theory, and at best that theory will only ear programming without derivatives. SIAM J Optim 14(4):980– 1010 apply if many restrictive assumptions hold (for example, Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena assumptions on smoothness, constraint qualification, and Scientific always using a fully linear surrogate) our numerical experi- Boggs PT, Dennis Jr JE (1976) A stability analysis for perturbed ments indicate the robustness of the approach in application nonlinear iterative methods. Math Comput 30(134):199–215 Booker AJ, Dennis Jr JE, Frank PD, Serafini DB, Torczon V, Trosset to a broader class of problems. For example, the Cart3D MW (1999) A rigorous framework for optimization of expensive CFD model employed in our case studies does not satisfy functions by surrogates. Struct Optim 17(1):1–13 Constrained multifidelity optimization using model calibration 109

Castro J, Gray G, Giunta A, Hough P (2005) Developing a computa- March A, Willcox K (2010) A provably convergent multifidelity opti- tionally efficient dynamic multilevel hybrid optimization scheme mization algorithm not requiring high-fidelity gradients. In: 6th using multifidelity model interactions. Tech. Rep. SAND2005- AIAA multidisciplinary design optimization specialist confer- 7498, Sandia ence, Orlando, FL, AIAA 2010-2912 Conn A, Gould N, Toint P (2000) Trust-region methods. MPS/SIAM MathWorks, Inc. (2010) Constrained nonlinear optimization. series on optimization. Society for Industrial and Applied Optimization toolbox user’s guide, v. 5 Mathematics, Philadelphia Moré JJ, Wild SM (2010) Estimating derivatives of noisy simulations. Conn A, Scheinberg K, Vicente L (2008) Geometry of interpolation Tech. Rep. Preprint ANL/MCS-P1785-0810, Mathematics and sets in derivative free optimization. Math Program 111(1–2):141– Computer Science Division 172 Nelder JA, Mead RA (1965) A simplex method for function minimiza- Conn A, Scheinberg K, Vicente L (2009a) Global convergence of gen- tion. Comput J 7:308–313 eral derivative-free trust-region algorithms to first- and second- Nocedal J, Wright S (2006) Numerical optimization, 2nd edn. order critical points. SIAM J Optim 20(1):387–415 Springer, New York Conn AR, Scheinberg K, Vicente LN (2009b) Introduction to Papalambros PY, Wilde DJ (2000) Principles of optimal design, 2nd derivative-free optimization. MPS/SIAM series on optimization. edn. Cambridge University Press, Cambridge Society for Industrial and Applied Mathematics, Philadelphia Powell MJD (1994) A direct search optimization method that mod- Jones D (2001) A taxonomy of global optimization methods based on els the objective and constraint functions by linear interpola- response surfaces. J Glob Optim 21:345–383 tion. In: Gomez S, Hennart J-P (eds) Advances in optimization Jones D, Perttunen CD, Stuckmann BE (1993) Lipschitzian opti- and numerical analysis, vol 7. Kluwer Academic, Dordrecht, mization without the lipschitz constant. J Optim Theory Appl pp 51–67 79(1):157–181 Powell MJD (1998) Direct search algorithms for optimization calcula- Jones D, Schonlau M, Welch W (1998) Efficient global optimization tions. Acta Numer 7:287–336 of expensive black-box functions. J Glob Optim 13:455–492 Rajnarayan D, Haas A, Kroo I (2008) A multifidelity gradient-free Kennedy M, O’Hagan A (2000) Predicting the output from a com- optimization method and application to aerodynamic design. In: plex computer code when fast approximations are available. 12th AIAA/ISSMO multidisciplinary analysis and optimization Biometrika 87(1):1–13 conference, Victoria, British Columbia, AIAA 2008-6020 Kolda TG, Lewis RM, Torczon V (2003) Optimization by direct Rvachev VL (1963) On the analytical description of some geometric search: new perspectives on classical and modern methods. SIAM objects. Tech. Rep. 4, Reports of Ukrainian Academy of Sciences Rev 45(3):385–482 (in Russian) Kolda TG, Lewis RM, Torczon V (2006) A generating set direct Sasena MJ, Papalambros P, Goovaerts P (2002) Exploration of meta- search augmented lagrangian algorithm for optimization with modeling sampling criteria for constrained global optimization. a combination of general and linear constraints. Tech. Rep. Eng Optim 34(3):263–278 SAND2006-5315, Sandia Wild S (2009) Derivative-free optimization algorithms for computa- Lewis RM, Torczon V (2010) A direct search approach to nonlinear tionally expensive functions. PhD thesis, Cornell University programming problems using an augmented lagrangian method Wild S, Shoemaker CA (2009) Global convergence of radial basis with explicit treatment of linear constraints. Tech. Rep. WM-CS- function trust-region algorithms. Tech. Rep. Preprint ANL/ 2010-01, College of William and Mary Department of Computer MCS-P1580-0209, Mathematics and Computer Science Science Division Liuzzi G, Lucidi S (2009) A derivative-free algorithm for inequal- Wild S, Regis R, Shoemaker C (2008) ORBIT: optimization by radial ity constrained nonlinear programming via smoothing of an l∞ basis function interpolation in trust-regions. SIAM J Sci Comput penalty function. SIAM J Optim 20(1):1–29 30(6):3197–3219