Abstract

GABLONSKY, JORG¨ MAXIMILIAN XAVER. Modifications of the DIRECT Algo- rithm. (Under the direction of Carl Timothy Kelley.) This work presents improvements of a global optimization method for bound con- straint problems along with theoretical results. These improvements are strongly biased towards local search. The global optimization method known as DIRECT was modified specifically for small-dimensional problems with few global minima. The motivation for our work comes from our theoretical results regarding the behavior of DIRECT. Specifically, we explain how DIRECT clusters its search near a global minimizer. An additional influence is our explanation of DIRECT’s behavior for both constant and linear functions. We further improved the effectiveness of both DIRECT, and our modification, by combining them with another global optimization method known as Implicit Filtering. In addition to these improvements the methods were also extended to handle problems where the objective function is defined solely on an unknown subset of the bounding box. We demonstrate the increased effectiveness and robustness of our modifications using optimization problems from the natural gas transmission industry, as well as commonly used test problems from the literature. MODIFICATIONS OF THE DIRECT ALGORITHM

by Jorg¨ M. Gablonsky

a thesis submitted to the graduate faculty of north carolina state university in partial fulfillment of the requirements for the degree of doctor of philosophy

department of mathematics

raleigh, north carolina 2001

approved by:

C. T. Kelley J. C. Dunn chair of advisory committee

P. A. Gremaud M. Shearer Dedication

I want to dedicate this work to my parents Helga and Hans-Georg Gablonsky, my uncle Paul Gablonsky, my sisters Sigrid M¨uller and Doris Gablonsky, and my brother Peter Gablonsky.

ii Biography

J¨org Gablonsky was born in Duisburg, Germany. He received his “Diplom” in “Wirtschaftsmathematik” (Master of Science in Mathematics and Business) in 1999 while already attending a PhD Program in Applied Mathematics at North Carolina State University.

iii Acknowledgments

I would like to express my appreciation for the support and guidance I received from my advisor Dr. C.T. Kelley. Furthermore I want to thank the members of my committee, Dr. J.C. Dunn, Dr. P.A. Gremaud, and Dr. M. Shearer for their support and advice. This work was supported by National Science Foundation grants #DMS-9700569, #DMS-9714811, and #DMS-0070641, and an allocation from the North Carolina Supercomputing Center. I would also like to thank Annie Winfield for her help with my writing and Dr. Susan Katz for her support. Furthermore I want to thank Dr. R.G. Carter for providing us with the problems from the gas pipeline industry, for answering all our questions in that context, and for pointing out ways to improve DIRECT. Finally, I want to thank the other students who helped me, either in the completion of this thesis, or by keeping me focused: B. Adams, Dr. A. Batterman, D.M. Bortz, I. Bredehoeft, C. Camalier, M. Campolattaro, Dr. T.D. Choi, K.A. and T. Coffey, Dr. R. DelRosario, J. Duran, O.J. Esslinger, D. Finkel, A. Gruber, K. Hans, J.C. Haws, S. Jansen, Dr. E.W. Jenkins, K.R. Kavanagh, J. Lefeaux, B.M. Lewis, J.E. Massad, Dr. J.V. Matthews, M. Minimair, A. Patrick, S. and C. Quincke, D. Schurich, S. Shonek, K.A. Yokley, and M. Zager.

iv Table of Contents

List of Tables viii

List of Figures xi

1 Introduction 1 1.1 Overview ...... 6 1.2 Optimization of natural gas transmission networks ...... 7

2 The DIRECT Algorithm 11 2.1 Lipschitz Optimization Problem ...... 11 2.2 Classical one-dimensional Lipschitz Optimization ...... 17 2.3 Classical N-dimensional Lipschitz Optimization ...... 22 2.4 The one-dimensional DIRECT Algorithm ...... 22 2.4.1 Area-Dividing Strategies ...... 23 2.4.2 Potentially optimal intervals ...... 26 2.4.3 The one-dimensional DIRECT algorithm ...... 31 2.5 The general/multidimensional DIRECT algorithm ...... 33 2.5.1 Dividing in higher ...... 34 2.5.2 Potentially optimal hyperrectangles ...... 37

3 Extensions to the DIRECT algorithm 39 3.1 Changes to the original DIRECT algorithm ...... 39 3.1.1 Changes to DIRECT by other authors ...... 45 3.2 Stopping criteria ...... 46 3.3 Choice of ² ...... 48 3.4 Extensions to problems with hidden constraints ...... 49 3.4.1 Replace the function value at an infeasible by a high value 49 3.4.2 Create an extra list of infeasible points ...... 50 3.4.3 Replace the function value at an infeasible point by the value of a nearby feasible point ...... 51 3.5 Use of DIRECT as a starting point generator ...... 54

v 4 Analysis of DIRECT 57 4.1 Convergence Analysis ...... 57 4.2 Local clustering ...... 59 4.3 Iteration estimates ...... 61 4.3.1 Worst case analysis ...... 62 4.3.2 Constant functions ...... 70 4.3.3 Linear functions ...... 73 4.3.4 Hidden constraints ...... 81

5 Numerical Results 83 5.1 Numerical Results in the Optimization of Gas Pipeline Networks . . . 84 5.1.1 The optimization Infrastructure ...... 84 5.1.2 Example problems ...... 85 5.1.3 Accuracy of the model ...... 90 5.1.4 Setup of the numerical Experiments ...... 90 5.1.5 Results of the optimization ...... 91 5.2 Numerical Results on Test Problems used by Jones et al...... 99 5.2.1 Description of the Test Problems ...... 100 5.2.2 Numerical results ...... 105

6 Conclusions 110

List of References 112

A Implementation of the DIRECT and DIRECT-l algorithm 122 A.1 The general algorithm ...... 122 A.1.1 The data structure ...... 122 A.1.2 Details ...... 125 A.2 Parallel Implementation ...... 127 A.3 Extensions to problems with hidden constraints ...... 129

B Implicit Filtering 131

C DIRECT Version 2.0 User Guide 135 C.1 Introduction to DIRECT ...... 136 C.1.1 Problem description ...... 136 C.2 Using DIRECT ...... 137 C.2.1 What is included in the package ...... 137 C.2.2 Calling DIRECT ...... 139 C.2.3 Sample main program ...... 144 C.3 A short overview of the DIRECT algorithm and our modifications . . . 149 C.3.1 Dividing the domain ...... 150 C.3.2 Potentially optimal hyperrectangles ...... 151

vi C.3.3 The DIRECT algorithm ...... 153 C.3.4 Our modification to the DIRECT algorithm ...... 154 C.3.5 Extensions to the DIRECT algorithm ...... 155 C.4 The Test Problems ...... 158 C.4.1 Elementary functions ...... 159 C.4.2 Example for hidden constraints ...... 159 C.4.3 Test functions described in Jones et.al. [48] ...... 160 C.4.4 Main features of the Test Functions ...... 165 C.4.5 Numerical results ...... 166

vii List of Tables

3.1 Summary of stopping criteria ...... 48

4.1 The resulting hyperrectangles after the division of an N-dimensional with side length 1/3k...... 63 4.2 Number of (stage 0) of side length 1/3k needed to cover the n-dimensional hypercube of length 1...... 64 4.3 Maximum number of iterations needed to branch all hypercubes of level k in N...... 69

5.1 Results for Problem A. The actual minimum value is 3233...... 92 5.2 Results for Problem B. The actual minimum value is 3204...... 93 5.3 Results for Problem C. The actual minimum value is 4041...... 95 5.4 Success rates for IFFCO with 500 randomly selected starting points. . 99 5.5 Parameters for the Shekel’s family of functions ...... 100 5.6 Parameters for the Hartman’s family of functions ...... 101 5.7 Summary of the important features of the test functions...... 105 5.8 Numerical results with percentage termination criteria...... 106 5.9 Numerical results with a budget of 60N function evaluations ..... 107 5.10 Numerical results for DIRECT-IFFCO with a combined budget of 60N function evaluations. The budget is divided into 10N function evalua- tions for DIRECT and 50N function evaluations for IFFCO...... 108 5.11 Numerical results for DIRECT-l-IFFCO with a combined budget of 60N function evaluations. The budget is divided into 10N function evalua- tions for DIRECT-l and 50N function evaluations for IFFCO...... 108

A.1 Routines for the parallel implementation of DIRECT...... 128

C.1 Parameters for the Shekel’s family of functions ...... 161 C.2 Parameters for the Hartman’s family of functions ...... 162 C.3 Summary of the important features of the test functions...... 166 C.4 Results for the test functions...... 167

viii List of Figures

1.1 View of a large compressor station...... 7 1.2 A schematic diagram of a simplified gas transmission network. .... 9

2.1 Example of a Lipschitz continuous function f and the lower bounding function fˆ...... 19 2.2 Example of the iteration of the Piyavskii algorithm...... 21 2.3 Example of a Lipschitz continuous function f and the area in which it must be contained...... 24 2.4 Dividing strategies...... 25 2.5 The graph of f, its lower bounds, and the related graph after the first division...... 26 2.6 The graph of f, its lower bounds, and the related graph after the second division...... 27 2.7 The graph of f, its lower bounds, and the related graph after the third division...... 28 2.8 The graph of f, the points at which the function was evaluated, and the related graph...... 29 2.9 The graph of f, the points at which the function was evaluated, and the related graph...... 29 2.10 Dividing of a hypercube...... 36

3.1 Example of a with level l = 1 and stage p =1...... 40 3.2 Example of a three-dimensional rectangle with level l = 2 and stage p =2...... 40 3.3 Example of an infeasible point and how its surrogate value is calculated. 53 3.4 Example of an infeasible point, whose value was replaced by the value of a feasible point nearby, becoming completely infeasible...... 54

4.1 Points at which the function is evaluated when DIRECT divides a three- dimensional cube...... 62 4.2 Hyperrectangles created when DIRECT divides a three-dimensional cube. 62 4.3 Examples of squares of level 1 and 2 needed to cover the unit square. 64 4.4 First four divisions of the [0, 1]...... 65

ix 4.5 First three divisions of the square [0, 1]2...... 66 4.6 The points at which DIRECT samples the function when dividing a square...... 74 4.7 The division DIRECT creates when 0 a2 >0...... 75 4.11 The first three iterations of DIRECT for a1

5.1 The feasible area of Problem A...... 86 5.2 The optimization landscape of Problem A...... 87 5.3 The feasible area of Problem B...... 87 5.4 The optimization landscape of Problem B...... 88 5.5 The feasible area of Problem C...... 89 5.6 The optimization landscape of Problem C...... 89 5.7 500 randomly selected starting points for IFFCO used in Problem A. . 92 5.8 500 randomly selected starting points for IFFCO used in Problem B. . 94 5.9 The division DIRECT creates for Problem C...... 96 5.10 The division DIRECT-l creates for Problem C...... 97 5.11 500 randomly selected starting points for IFFCO used in Problem C. . 98 5.12 Plot of the Branin function...... 102 5.13 Plot of the Goldstein and Price function...... 103 5.14 Plot of the Goldstein and Price function around the global minimum. 103 5.15 Plot of the Six-hump camelback function...... 104 5.16 Plot of the Six-hump camelback function around the global minima. . 104 5.17 Plot of the two-dimensional Shubert function...... 104 5.18 Plot of the two-dimensional Shubert function around a global minimum.104

A.1 Example for the structure to store the lists (Part a)...... 124 A.2 Example for the structure to store the lists (Part b)...... 124 A.3 Flowchart of the algorithm ...... 126 A.4 Diagram of the Master-Slave paradigm ...... 127

B.1 Example of a function that is composed of a simple function and noise. 132

C.1 Dividing of a hypercube...... 152 C.2 Example of an infeasible point and how its surrogate value is calculated.157 C.3 Example of an infeasible point, whose value was replaced by the value of a feasible point nearby, becoming completely infeasible...... 158 C.4 Contour plot of the Gomez # 3 function...... 160 C.5 Plot of the Gomez # 3 function...... 160 C.6 Plot of the Branin function...... 163 C.7 Plot of the Goldstein and Price function...... 164

x C.8 Plot of the Goldstein and Price function around the global minimum. 164 C.9 Plot of the Six-hump camelback function...... 165 C.10 Plot of the Six-hump camelback function around the global minima. . 165 C.11 Plot of the two-dimensional Shubert function...... 165 C.12 Plot of the two-dimensional Shubert function around a global minimum.165

xi List of Algorithms

2.1 Piyavskii(a, b, γ, ξ) ...... 20 2.2 DIRECT-1D(a, b, f, ², numit, numfunc)...... 31 2.3 Divide ...... 35 2.4 DIRECT(a, b, f, ², numit, numfunc)...... 38 3.1 DIRECT(a, b, f, ², numit, numfunc)...... 51

3.2 ReplaceInf({ci}, {li}, {fi}) ...... 52 B.1 BriefIFFCO ...... 132 C.1 DIRECT(a, b, f, ², numit, numfunc)...... 153 C.2 DIRECT(a, b, f, ², numit, numfunc)...... 155

C.3 ReplaceInf({ci}, {li}, {fi}) ...... 156

xii Chapter 1

Introduction

This work is concerned with two improvements of the global optimization method known as DIRECT. In addition we seek to provide a better understanding of the be- havior of this method. DIRECT was developed to solve difficult global optimization problems with bound constraints. This method is able to identify the area near a global minimum with few function evaluations. However, DIRECT often needs many more function evaluations to find a good approximation of a global minimum. Im- provement of DIRECT was achieved in two ways: first, by modifying the original method and second, by combining it with another method known as Implicit Filtering. In order to evaluate the effectiveness and robustness of various global optimization methods, we utilized optimization problems found in industrial applications. The ob- jective function of this kind of industrial optimization problem is given by a computer program which is often referred to as a “black-box”. For the purpose of this discus- sion, we define the robustness of a method as the degree by which the performance of the method depends on its starting point. We measure the effectiveness of various global optimization methods by comparing the number of function evaluations that each method requires to find a solution. Our focus on this measure is due to the considerable amount of time function evaluations may take. In addition to the substantial time expense, there are a number of characteristic properties associated with this type of problem. Some characteristics are shared by all

1 Chapter 1. Introduction 2

problems, while others occur only in specific cases. For instance, all of these problems have box constraints. Also, for all problems, derivative information is not available and therefore only function values can be used. For most problems the objective function has local as well as global minima. Often the objective function contains noise, and may even be discontinuous. Problems may be further complicated by the existence of hidden constraints. Hidden constraints exist both when the function is not defined everywhere within the given bounding box and the feasible area is unknown. To determine if the function is defined at a certain point, an optimization method must evaluate the function at that specific point. Now that we have given a brief overview of the type of problem used, we will introduce the methods we utilized in this work. These methods, described in detail in Chapters 2 and 3, are known as sampling methods or derivative-free methods in that they only use function values and no derivative information. In other words, they only “sample” the function. These methods were designed for exactly the type of problem described earlier since they use all the available information and do not require more information from the objective function. More conventional methods are unable to solve these problems because they require additional information such as first derivative information. This document is especially concerned with one such method, known as DIRECT, which was developed by Jones, Perttunen and Stuckmann [48]. We improved the performance of DIRECT for small dimensional problems that have few global minima by introducing a modified version, which we call DIRECT-l. We further improved the performance of DIRECT and DIRECT-l by combining them with a second optimization method known as Implicit Filtering. Implicit Filtering is also a sampling method and its implementation, IFFCO, has been shown to be very efficient in many applications [7, 16, 23, 36]. As is the case for DIRECT and DIRECT-l, these combined methods are designed to solve difficult, small dimensional optimization problems. The original work of Jones et al. [48] provides a short convergence proof for DIRECT. Chapter 1. Introduction 3

We further this work by explaining observations made about the behavior of DIRECT and DIRECT-l. The primary observation we explain is that DIRECT creates clusters around local minima, and eventually around all global minima. This means that DIRECT will sample the function more in the area around local minima than in the rest of the search space. Furthermore, we look at the behavior of both DIRECT and DIRECT-l for constant and linear functions. These theoretical observations are the first to be published apart from the theoretical observations done by Jones et al. in their original description of DIRECT [48]. Finally, in addition to the above, we implemented DIRECT and DIRECT-l for both single- and multiple-processor computers. Our implementation of DIRECT was one of the first, and is still one of only a few, freely available implementations [28, 29].

Classes of Optimization Problems

We used this implementation on a concentrated set of examples which represent cer- tain classes of problems. The first set of examples comes from a class of optimization problems found in the gas transmission pipeline industry. These problems were pro- vided to us by R. G. Carter of Stoner and Associates, Inc. Stoner provides software solutions for the simulation and optimization of energy (including gas) and water delivery networks [11]. The second set of examples consists of a suite of commonly used test problems pulled directly from the literature.

Classes of Optimization Methods

One class of methods used to solve these problems are the so called stochastic methods, see for example [9, 42, 53, 56, 66, 69, 71, 76]. A common characteristic of all stochastic methods is that they iteratively create randomly distributed points. They differ in the stochastic processes they use to distribute the points and in what they do with these randomly created points. The two stochastic methods proposed by Tikhomirov [69] Chapter 1. Introduction 4 and Hart [42] are similar to our combined methods in that they also use a combination, however, while they switch back and forth between two methods, we only switch once. A second class of methods is known as Lipschitz optimization methods. These methods, including DIRECT [48], construct lower bounding functions for the objective function by using the Lipschitz constant or an estimation of it. Additional examples of Lipschitz optimization methods are the Piyavskii/Shubert algorithm [61, 65], the Cubic [32] and Beta algorithms [33], multilevel coordinate search [45], and the two- level method of Hansen et al. [41]. For an in-depth overview of many of these methods see [39, 43, 60]. A third class of methods includes Nelder-Mead [57], multidirectional search or pattern search methods [1, 2, 25, 24, 26, 49, 70] and Implicit Filtering [36]. The primary characteristic of these methods is that they examine a of points and then change the simplex according to the function value at these points [10].

Specific Optimization Methods

DIRECT, developed by Jones et al. [48], is a Lipschitz optimization method which is based on classical methods for Lipschitz optimization. Lipschitz optimization meth- ods usually require few function evaluations to find the area near an optimal point. However, a big disadvantage of Lipschitz optimization methods is that they often need many more function evaluations to actually find a good approximation of that optimal point [39, 40]. In contrast to DIRECT, Implicit Filtering [36] uses difference gradients, where the difference increments are reduced as the optimization progresses. These difference gradients are then used in projected Quasi-Newton iterations. This method was designed for objective functions that are small perturbations of smooth functions, see [17] and [35]. However, like most sampling methods, this method is also used on problems where the objective function has discontinuities. In order to perform well, Implicit Filtering needs a good starting point without which it can terminate in a Chapter 1. Introduction 5

local minima far away from the solution. In order to overcome the problems of slow convergence for DIRECT/DIRECT-l, and the strong dependence of Implicit Filtering on its starting point, we propose to combine these two methods. DIRECT, and especially DIRECT-l, are able to create good starting points with few function evaluations. By this we create a method that retains DIRECT’s independence on a starting point and its fast identification of the area near an optimal point. We then overcome DIRECT’s problem of finding a good approximation to the solution with few function evaluations by using Implicit Filtering. Cramer [22] describes another way to facilitate DIRECT’s fast identification of the area near the global minima. They used DIRECT to build a model of the objective function using the points selected by DIRECT. This model was then used to look at main effects and interactions between the variables. In one of their problems, they determined that one variable was not important for the optimization and could be eliminated, therefore reducing the dimension of the problem. The actual optimization was then done using multi-objective optimization and reduced models.

Hidden Constraints

For problems with hidden constraints, the combination of DIRECT and DIRECT-l is much more robust than Implicit Filtering alone, which may not be able to find a feasible point, much less a solution for many starting points. Using our theoretical observations we show that if the feasible area is small, most of the time DIRECT-l requires significantly fewer function evaluations than DIRECT to find a feasible point. In the worst case, DIRECT and DIRECT-l need the same number of function evalua- tions. One of the examples from the gas pipeline industry has such a small feasible area that it allows us to clearly observe these differences in the number of function evaluations needed. Chapter 1. Introduction 6

1.1 Overview

In the rest of this chapter we give preliminary background information on the gas pipeline industry and the associated optimization problems in Section 1.2. Chapter 2 describes general Lipschitz optimization methods, in particular the DIRECT method. We especially provide the motivation behind the main ideas of DIRECT. Chapter 3 describes our changes to the original DIRECT method which led to the development of DIRECT-l. In addition, we describe changes to DIRECT done by other authors, including Jones. We end this chapter by explaining how we extended DIRECT and DIRECT-l to handle hidden constraints, and how to use the two methods as a starting point generator for Implicit Filtering. Chapter 4 contains our theoretical observations for DIRECT and DIRECT-l.In this chapter we first show that DIRECT and DIRECT-l converge and then explain the clustering created by these methods around local and ultimately global minima. We conclude this chapter by giving estimates for the number of iterations DIRECT and DIRECT-l need in extreme cases, where some of these estimates are sharp. Chapter 5 contains our numerical results. We start by looking at problems from the gas transmission pipeline industry application. First we describe in more detail where these problems come from. We compare performance of the algorithms on three sets of real data, which test the algorithms in different situations. We finish the chapter with numerical results for a group of test problems frequently used to test global optimization methods. Chapter 1. Introduction 7

Figure 1.1: View of a large compressor station.

1.2 Optimization of natural gas transmission net- works

This section describes both the natural gas transmission industry in general as well as the optimization problems that arise therein. These descriptions are based primarily on technical reports provided by Stoner Associates, Inc. [12, 13, 14, 51, 52, 62]. Most of the natural gas consumed by industrial and private customers is delivered by means of sophisticated systems of pipelines [51]. These networks often cover hundreds, or even thousands, of miles and may include hundreds of supply and/or delivery points. As of 1995, over four hundred independent companies collectively operating over 380 thousand miles of pipeline [62]. Chapter 1. Introduction 8

Some of the most important components of gas transmission networks are com- pressor stations which are used to keep the gas flowing. The number of compressor stations in a given network can vary significantly, depending on the size of the net- work. In addition, compressor stations can vary dramatically in size. Figure 1.1 shows an aerial photograph of a medium-large compressor station, courtesy of R. G. Carter of Stoner Associates [11]. In this photograph, the compressors and driving engines are inside the long building on the left. Behind the long building are gas coolers. To the right of the long building are devices which extract water and impurities from the gas. However, not all compressor stations are of this size. Carter points out that there are smaller compressor stations that consist of just one compressor in a small building with no human attendants. Compressor stations comprise the largest expense of running a natural gas trans- mission network. In fact, [51] points out that the cost of running the compressors may represent between 25% and 50% of the company’s total operating budget. While a few turbines use alternate power sources such as electricity, most are gas powered [12]. According to [52], gas turbines within compressor stations use between 3% and 5% of the transported gas as fuel. This amount of gas translates into millions of dollars per year creating a strong pressure in the industry to lower operational costs. This pres- sure was further increased by deregulation of the natural gas pipeline industry in the United States. Due to these internal and external pressures, it has become necessary to run networks at optimal configuration with respect to cost, while remaining within the limits given by physical, legal and business considerations. [68] describes that in recent years while the demand for natural gas has soared, supply has declined. Figure 1.2 [52] shows the schematic of a small pipeline network. The arrows rep- resent the flow of gas, the circles stand for supply or delivery points, and the squares represent the regulators. Finally, the triangles show the location of the compressors, which, as mentioned, are the main cost generators. The compressor station shown in Figure 1.1 would be represented by one of these triangles. Chapter 1. Introduction 9

Supply or Delivery Point

Compressor

Regulator 11 4

10 3

159 12 131415165

17

782

Figure 1.2: A schematic diagram of a simplified gas transmission network.

The natural gas industry is primarily driven by customer demand [62]. This demand puts certain requirements on both the delivery system and the production system. According to [62], one of the defining factors of the industry, from an eco- nomic standpoint, is that natural gas is used mainly as a heating fuel. This means that demand is nearly inversely proportional to ambient air temperature, which re- sults in wide fluctuations in demand over the course of a year. The cost of operating a network is mostly governed by peak delivery amounts, but there are significant variable costs such as the cost of running the compressors. Finally, reliability is of critical importance. Many old appliances are run by pilot flames that burn constantly and therefore any interruption in service would be dangerous to the customers. If the appliances are not shut down before the service is restored, serious accidents could occur. Keeping these industry requirements in mind, [52, 51] describe the following four general optimization problems in the gas transmission industry:

1. Fuel Minimization: Given the flows in the network, determine the settings for compressors and regulators that minimize the amount of fuel used by compres- sors.

2. Cost of Fuel Minimization: Given the flows in the network, determine the settings for compressors and regulators that minimize the cost of fuel used by compressors. Chapter 1. Introduction 10

3. Throughput Maximization: Given a set of supply/delivery/transport contracts, determine the maximum amount of gas that can be transported by the network while satisfying contractual obligations and physical constraints.

4. Profit Maximization: Given a set of supply/delivery/transport contracts, deter- mine the optimum allocation among these contracts in order to maximize profit while satisfying physical constraints.

It is important to note that in order to solve the second problem, solutions for the first problem are required. The second problem is more complex than the first one because the cost of fuel usage may not be linearly related to the amount of fuel. For example, it may be the case that an increase in fuel usage may result in a reduction in unit cost. Furthermore, there may be start-up costs which have to be taken into account in the second problem. Finding solutions for these problems require different techniques which range from standard methods for linear programming (LP methods) to nonlinear optimization. [52] points out that the above problems were historically solved by financial de- partments. They used crude models of the networks within LP methods to optimize the long-term operation of gas networks. In these crude models the flow of gas is described in very simplified ways. In contrast, engineering departments developed fairly accurate models for gas-flow in networks using nonlinear models. [52] describes a first attempt to use these more accurate models to solve the economic problems. Since these models are nonlinear, standard LP methods cannot be used. Further- more engineering department models were not designed with optimization in mind. This means that an optimization method can only use function values during an op- timization. This describes the setting of the “black box” optimization problems we concentrate on in this document. In Section 5.1 we describe in more detail one such problem, and report our numerical results for three examples arising in this context. Chapter 2

The DIRECT Algorithm

In this chapter we describe the DIRECT algorithm, which was developed by D. R. Jones, C. D. Perttunen and B. E. Stuckman [48] in 1993. DIRECT derived its name from dividing rect angles, one of its main features. Before describing the DIRECT algorithm, we define the abstract problems we try to solve. Furthermore we will describe some properties of Lipschitz continuous functions, which were used in the development of some classical methods for Lipschitz optimization problems. We then point out the weaknesses of these methods. The chapter ends with the introduction of the one-dimensional DIRECT algorithm, and an explanation of how to extend it to several dimensions.

2.1 Lipschitz Optimization Problem

All of the following observations are based on the definition of Lipschitz continuous functions. We denote by N ∈ N the dimension of the problem. Definition 2.1. Let M ⊂ RN and f : M → R. The function f is called Lipschitz continuous on M with Lipschitz constant γ if

|f(x) − f(x0)|≤γ||x − x0|| ∀x, x0 ∈ M. (2.1)

11 Chapter 2. The DIRECT Algorithm 12

Here, k·kis any norm on RN . The definition is independent of the used norm as shown in the following lemma. This lemma follows directly from the norm equivalence on RN . N N Lemma 2.1. Let k·ka and k·kb be two norms on R .LetM⊂R and

f : M → R. Then f Lipschitz continuous with respect to k·ka implies f is Lipschitz

continuous with respect to k·kb. Proof. Since on RN all norms are equivalent (see Kreyszig [50]), there exist q, Q > 0 such that N qkxka ≤kxkb ≤Qkxka ∀x∈R . (2.2)

Let f be Lipschitz continuous with respect to k·ka, that is

|f(x) − f(y)|≤γkx−yka, ∀x, y ∈ M. (2.3)

Let x, y ∈ M. Then

(2.3) |f(x) − f(y)| ≤ γkx − yka (2.4) (2.2) ≤ k − k |{z}qγ x y b, (2.5) =γ0

which concludes the proof. The following theorem shows that the class of Lipschitz continuous functions is very general. Theorem 2.1. Let M ⊂ RN be a bounded, closed set, and let f be a continuously

differentiable function on an open convex set M0 ⊇ M. Then f is Lipschitz continuous on M. Proof. Let x0 ∈ M be an arbitrary point. Then the first order Taylor-expansion Chapter 2. The DIRECT Algorithm 13 of f around x0 is given by

f(x)=f(x0)+g(xr)T(x−x0),x∈M, (2.6)

where g denotes the gradient of f, and xr is a suitably chosen point of the interval [x0,x]. Now, applying the Cauchy-inequality to equation 2.6, we obtain

|f(x) − f(x0)| = |g(xr)T (x − x0)|≤kg(xr)kkx − x0k. (2.7)

Denote by co(M) the closure of the convex hull of M. Then co(M)=co(M) ⊆ M0, see [59], and therefore co(M) is compact and the norm of the gradient g is bounded k k on co(M). Then maxx∈co(M) g(x) is an appropriate Lipschitz constant, and f is Lipschitz continuous. See Pint´er [60] for a similar assertion. We now define the three different problem classes we will look at in the rest of this document. The first two problem definitions were obtained from Hansen, et al. [39]. N Problem 1 (P). Let −∞

  • f(x∗)=f∗ = min f(x). x∈Ω

    The question is whether or not there exists an algorithm that can solve this prob- lem for all Lipschitz continuous functions with a finite number of function evaluations. Each method to solve problems P can only use the function values and the Lipschitz constant. Other information, like first or second derivatives, may not exist. For ex- ample, the absolute value function over a finite interval [a, b],a < 0 1 steps. Let xi,i=1,...,k be the set of points where the function was evaluated. Since x∗ = 1 is the upper bound of the domain of f, we know that all points xi =6 x∗ are to the left of x∗.

    1 k Denote Xk =(x,...,x ) and let f(Xk) be the set of corresponding function values. Let xj ∈ Xk \{x∗}be the evaluation point closest to x∗ to the left. We now define a new Lipschitz continuous function f 1 as   ∈j 1 f(x),x[0,x ], f (x)= max{f(xj) − γ(x − xj),f(x∗)−γ(x∗ −x)},x∈[xj,x∗].

    Clearly f 1 is Lipschitz continuous with Lipschitz constant γ and the global minimum of f 1 is attained at xj + x∗ f(xj) − f(x∗) xˆ = + 2 2γ Chapter 2. The DIRECT Algorithm 15 with

    1 f 1(ˆx)= (f(xj)+f(x∗)−γ(x∗ − xj)) 2 |{z } µ =0 ¶ 1 1 = (xj − 1)2 − (1 − xj) 2 µ2 ¶ 1 1 1 = (xj)2 − xj + − 1+xj 2 µ2 ¶2 1 1¡ ¢ = (xj)2 −1 2µ2 ¶ 1 1 = (xj −1)(xj +1) 2 2 xj<1 < 0=f(x∗).

    The strategy for algorithm A only depends on the Lipschitz constant γ, the points 1 Xk and the function values f(Xk). These are the same for f and f , therefore we get

    f(x∗) = min f(xi) = min f 1(xi). i=1,...,k i=1,...,k

    That means that algorithm A concludes that x∗ is a global minimizer of f 1, which is clearly not the case. The construction of f 1 will come up again in the next section, where we use it to construct an algorithm to solve problems of the kind described next. Because of the above observation, we will consider problems of the following form. These problems can be solved with finitely many function evaluations as we show after the definition. Problem 2 (P0). Let f :Ω→Rbe Lipschitz continuous with constant γ. Find xopt ∈ Ω such that ∗ fopt = f(xopt) ≤ f + ξ, (2.9)

    where ξ is a given small positive constant. Hansen, et al. [39] call an algorithm ²-convergent if it solves Problem P0 in a finite Chapter 2. The DIRECT Algorithm 16

    number of iteration. That means that it can find a point xopt with finitely many function evaluations. The so-called passive algorithm is an ²-convergent algorithm 0 that always solves Problem P . In the one-dimensionall casem the passive algorithm ξ 3ξ 5ξ γ(u1−l1) evaluates f at l1 + γ ,l1+ γ ,l1+ γ ,.... After at most 2ξ function evaluations, a point satisfying (2.9) is found for any Lipschitz continuous function f. This means a solution for problem P0 can always be found with finitely many function evaluations. In higher dimensions the passive algorithm evaluates the function on a regular or- √ thogonal grid of points, which are 2ξ/γ N apart in each direction, see Meewella and Mayne [54]. The disadvantage of the passive algorithm is the high number of function evaluations that are needed to find a solution. This algorithm does not adapt to the function, the points at which the function is evaluated are the same for any function. We also look at a third kind of problem. As for the other two problem classes, Ω is a given hyperrectangle. The difference to the first two problems is that the function f is only defined over a subset B ⊂ Ω, but is undefined over RN \ B. Problem 3 (P00). Let B ⊂ Ω and f : B → R be Lipschitz continuous with constant γ.Letf∗ be f ∗ = min f(x). x∈B

    Find xopt ∈ B such that ∗ fopt = f(xopt) ≤ f + ξ, (2.10)

    where ξ is a given small positive constant. If B is not given analytically, we say that the problem has hidden constraints. Problems with hidden constraints often occur in black box optimization, see Section 5.1. Note if Ω = B, problems of kind P00 are also of kind P0. Chapter 2. The DIRECT Algorithm 17

    2.2 Classical one-dimensional Lipschitz Optimiza- tion

    We start our description of classical Lipschitz optimization methods with the one- dimensional, or univariate, case. All these classical methods require knowledge of the Lipschitz constant. This is a serious disadvantage of these methods, since the problem of finding the Lipschitz constant is as hard as solving Problem P0, see [67, 74].In the following let f :[a, b] → R be a Lipschitz continuous function with Lipschitz constant γ. The first algorithm suggested for solving univariate Lipschitz optimization prob- lems (that is Problem P0 with N = 1) was independently given by Piyavskii [61] and Shubert [65]. In the following we will refer to it as the Piyavskii algorithm. The idea is to set x0 = a and x0 = b in inequality (2.1) to get the following inequalities :

    f(x) ≥ f(a) − γ(x − a), ∀x ∈ [a, b],

    f(x) ≥ f(b)+γ(x−b), ∀x∈[a, b].

    Using these two inequalities, we can construct a piecewise linear function fˆ such that fˆ(x) ≤ f(x), ∀x ∈ [a, b].

    fˆ is given by   f(a)−γ(x−a),x∈[a, x(a, b)], ˆ f(x)= f(b)+γ(x−b),x∈[x(a, b),b],

    where x(a, b)=[f(a)−f(b)]/(2γ)+[a+b]/2, (2.11) Chapter 2. The DIRECT Algorithm 18 and the minimum value of fˆ is given by

    B(a, b)=[f(a)+f(b)]/2 − γ(b − a)/2.

    Note that this is the same construction as was used above in the example. The following lemma shows that the function fˆ is always defined. To ensure this, we only need to make sure that x(a, b) ∈ [a, b]. Lemma 2.2. Let f be Lipschitz continuous on [a, b] with constant γ and x(a, b) be given by Equation (2.11). Then x(a, b) ∈ [a, b]. Proof. We will prove that a ≤ x(a, b).

    f(a)−f(b)a+b x(a, b)= + 2γ 2 f(b)−f(a)γ(b−a) =− + +a 2γ 2γ γ(b−a)−(f(b)−f(a)) = + a 2γ 1 = (γ(b − a) − (f(b) − f(a))) +a 2γ | {z } ≥0 f Lipschitz ≥ a.

    Similarly we can show that x(a, b) ≤ b, which concludes the proof. Figure 2.1 shows an example of a function f and its lower bounding function fˆ, where f is given by

    µµ ¶ ¶ 1 f(x) = sin x − 4π +6x2 +2, ∀x∈[0, 1]. (2.12) 2

    Then µµ ¶ ¶ 1 f 0(x)=4πcos x − 4π +12x, ∀x ∈ [0, 1]. 2 Chapter 2. The DIRECT Algorithm 19

    f(b)

    f(a)

    B(a,b)

    a x(a,b) b

    Figure 2.1: Example of a Lipschitz continuous function f and the lower bounding function fˆ.

    Using the proof of Theorem 2.1, we get a Lipschitz constant γ of

    γ = sup |f 0(x)| =4π+12. x∈[0,1]

    Figure 2.1 also shows x(a, b) and B(a, b). The idea of the Piyavskii algorithm is to divide the original interval into two

    intervals I1 =[a, x(a, b)] and I2 =[x(a, b),b]. We then evaluate f at x(a, b) and

    calculate new values x1 = x(a, x(a, b)),x2 =x(x(a, b),b) and B1 = B(a, x(a, b)),B2 =

    B(x(a, b),b) for each of these two intervals. Using the lower bounds B1 and B2 for f in these two intervals, we then divide the interval with smallest B in the next step. Algorithm 2.1 shows Piyavskii’s algorithm in detail. The inputs to this algorithm are the lower and upper bounds of the interval [a, b], the Lipschitz constant γ, and 0 ξ>0. The algorithm returns xopt and f(xopt), which are solutions to Problem P ; Chapter 2. The DIRECT Algorithm 20

    Algorithm 2.1 Piyavskii(a, b, γ, ξ)

    1: n =1, sample =1,lsample = a, usample = b 2: Calculate B1 = B(a, b),x1 =x(a, b) 3: Let fopt = min{f(a),f(b)},xopt = arg min{f(a),f(b)},Bopt = B1. 4: while fopt − Bopt >ξdo 5: Choose new interval to sample, this interval has index sample. 6: ln+1 = xn,un+1 = usample,usample = xn 7: fn = f(xn) 8: Calculate Bn+1 = B(ln+1,un+1),xn+1 = x(ln+1,un+1), Bsample = B(lsample,usmaple),xsample = x(lsample,usample) 9: Update fopt = mini=1,...,n+1{f(a),f(ui)},xopt = arg mini=1,...,n+1{f(a),f(ui)} and Bopt = mini=1,...,n+1 Bi 10: n=n+1 11: end while that is ∗ f(xopt) ≤ f + ξ,

    ∗ where f = minx∈[a,b]f(x). The first three steps initialize the algorithm. We use n as a counter, sample is an index to the interval which is divided in this iteration and lsample and usample are the lower and upper bounds of this interval. We store the coordinates and the function value of the best point found so far in xopt and fopt.InBopt we store the lowest bound found so far. The main loop is shown in steps 4 through 11. In step 5 we choose which interval to divide. The division is done by storing the lower and upper coordinates of one of the two new intervals in ln+1,un+1. The second interval is created by shrinking the original interval. We do this by replacing the upper bound of the original interval with xn. After evaluating f at xn we calculate B and x for these two new intervals. We repeat this until the difference between the best function value found so far and the value of the smallest lower bound is smaller than ξ, that is, until

    f(xopt) − min Bi ≤ ξ. (2.13) i=1,...,n+1 Chapter 2. The DIRECT Algorithm 21

    f(b) f(b)

    f(a) f(a)

    B(a,b)

    a x(a,b) b a b f(b) f(b)

    f(a) f(a)

    a b a b

    Figure 2.2: Example of the iteration of the Piyavskii algorithm.

    This stopping criteria ensures that the algorithm has found a solution to Problem P0, since it follows directly from inequality (2.13) that

    ∗ f(xopt) ≤ min Bi + ξ ≤ min f(x)+ξ=f +ξ. i=1,...,n+1 x∈[a,b]

    Figure 2.2 shows an example of the first three iteration of the Piyavskii algorithm. The last plot shows the result of 15 iteration of it. The piecewise linear function fˆ constructed by this algorithm becomes a better approximation of the real function in every iteration. Because of its look, fˆ is sometimes called the saw-tooth cover [43]. Nevertheless, the Piyavskii algorithm has two main problems.

    • In higher dimensions this algorithm needs to store 2N corners for each Chapter 2. The DIRECT Algorithm 22

    hyperrectangle, where N is the dimension of the problem. Moreover, it has to evaluate the function at all these points. • In applications the Lipschitz constant is normally unknown if the function itself is a complicated problem or a simulation.

    The first disadvantage is not so much a problem of storage, but of the number of function evaluations needed. This is especially the case if function evaluations are expensive. The second disadvantage is very serious, since the problem of finding the Lipschitz constant is as hard as solving Problem P0, see [67, 74].

    2.3 Classical N-dimensional Lipschitz Optimization

    Many attempts have been made to extend the Piyavskii algorithm to higher dimen- sions, see [54, 55, 56, 61]. All these algorithms have in common that they need about 2N function evaluations in each iteration and knowledge of the Lipschitz constant. Wood and Zhang [74] describe a statistical method to estimate the Lipschitz con- stant for the univariate case. This estimation is done independently of the optimiza- tion and requires substantial work.

    2.4 The one-dimensional DIRECT Algorithm

    The problems of the Piyavskii algorithm are addressed by the DIRECT algorithm, where DIRECT stands for dividing rect angles. This algorithm was developed by D. R. Jones, C. D. Perttunen and B. E. Stuckman [48]. To explain this algorithm, we first assume that we know a Lipschitz constant, and drop this requirement later. We again start with inequality (2.1) in its one- dimensional formulation:

    |f(x) − f(x0)|≤γ|x−x0|∀x, x0 ∈ [a, b]. Chapter 2. The DIRECT Algorithm 23

    Let c =(a+b)/2 and set x0 = c in (2.1). Then ∀x ∈ [a, b]

    x ∈ [a, c]:f(c)+γ(x−c)≤f(x)≤f(c)−γ(x−c)

    x∈[c, b]:f(c)−γ(x−c)≤f(x)≤f(c)+γ(x−c).

    These two inequalities define a region in which the graph of the function is con- tained. Furthermore we get a lower bound D(a, b) for f in [a, b] by letting x = a or x = b in these inequalities. This means we evaluate the bounding function at the two endpoints of the interval.

    D(a, b)=f(c)−γ(b−a)/2.

    Figure 2.3 shows this area (shaded) for the same function as before, that is

    µµ ¶ ¶ 1 f(x) = sin x − 4π +6x2 +2, ∀x∈[0, 1]. 2 The bounding area is large, and therefore needs to be refined. To do this, we need to sample the function at more points and divide the search area. The next section explains how DIRECT does this.

    2.4.1 Area-Dividing Strategies

    The DIRECT algorithm divides the original interval into three intervals of equal length and evaluates the function at the midpoints of each new interval. Since the midpoint of one of the new intervals is the same as the midpoint of the original interval, we only need to evaluate the function at two new points. Therefore one of the main differences between the DIRECT and the Piyavskii algo- rithm is that DIRECT divides the original interval into three intervals of equal length instead of two of non-equal length as in the Piyavskii algorithm. Galperin [33, 34] and Shen, et al.[64] also suggest to use the midpoint of the interval as the point to Chapter 2. The DIRECT Algorithm 24

    f(c) − γ(x − c) f(c) + γ(x − c)

    a b

    Figure 2.3: Example of a Lipschitz continuous function f and the area in which it must be contained. Chapter 2. The DIRECT Algorithm 25

    Piyavskii Algorithm DIRECT Algorithm

    a b a b

    ax(a,b) b a c b

    a1b1 = a2 b2 a c2c1 c3 b

    Figure 2.4: Dividing strategies.

    evaluate the function, but then they divide the interval there. Gourdin, et al. [38] suggest the same strategy as DIRECT for the division. They also give the same exten- sion for the N-dimensional case, which we will discuss in Section 2.5. Their method differs from DIRECT in the way they decide which hyperrectangle to divide next. This means that we give up some information when we divide the interval. We gain that we evaluate the function only in the middle of the interval in the DIRECT algorithm. This has the big advantage that this idea can easily be extended to higher dimensions without increasing the number of function evaluations needed per hyperrectangle. This is due to the fact that hyperrectangles in higher dimensions also have only one center. If we extend the idea of evaluating the function at the endpoints of intervals to higher dimensions, we have to evaluate the function at all corners. This increases the number of function calls exponentially when the dimension of the problem is increased. Furthermore the DIRECT strategy does not depend on the Lipschitz constant, since the points at which the function is evaluated are always midpoints. This will become important in the following. We show the two different dividing strategies of DIRECT and the Piyavskii algo- rithm in Figure 2.4. Here × mark the points where the function is evaluated, and | mark the endpoints of the intervals. Chapter 2. The DIRECT Algorithm 26

    10 10

    8 8

    6 6

    f 4 4

    2 2 function value at midpoint of the intervall

    0 0

    −2 −2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 x (lenght of intervall)/2

    Figure 2.5: The graph of f, its lower bounds, and the related graph after the first division.

    2.4.2 Potentially optimal intervals

    It remains to describe how DIRECT decides which intervals to divide. An important feature of DIRECT is that DIRECT makes this decision without the need to know the Lipschitz constant γ. To explain this method, we first look at an example. Let f be the same as before. We first assume that we know γ, and then show what we do to drop the requirement to know γ. The left of Figure 2.5 shows the graph of f. In it we marked the three midpoints of the intervals where we evaluated f, and show the lower bounding function for these intervals. Note that the union of the three intervals is the original interval. The right of Figure 2.5 shows a related graph. The construction of this related graph is due to Jones et al. [48]. In this related graph the x-axis represents half of the length of each interval, that is, (bi −ai)/2. In the first step all intervals have the same length. The circles represent the three intervals. The x-coordinates of the circles are given by half of the length of the interval and the y-coordinates are given by the function value at the center of the interval. The y-axis represents the lower bounds given in the first graph. The lines with slope γ correspond to each lower bound from the first graph. Note that Chapter 2. The DIRECT Algorithm 27

    10 10

    8 8

    6 6

    f 4 B 4 B

    A 2 2 A function value at midpoint of the intervall

    0 0

    −2 −2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 x (lenght of intervall)/2

    Figure 2.6: The graph of f, its lower bounds, and the related graph after the second division.

    the intersections of these lines with the y-axis are (0,D(ai,bi)), where D(ai,bi)isthe lower bound for the function in each interval.

    We now choose the interval with the lowest bound D(ai,bi) and divide this interval in the next iteration. We do this since the possible improvement in the function value is largest for this interval. Figure 2.6 shows the new graphs after we have divided the first interval. Now there are five intervals, two with a length of 1/3 and three with a length of 1/9. Note that the center of the interval which was divided is now the center of one of the smaller intervals. Therefore there are five points and five corresponding lines in the graph on the right. We again choose the interval with smallest lower bound D(ai,bi) to divide in the next iteration. Note that we did not choose the interval with lowest function value found so far (i.e. the interval marked A), since its lower bound is larger than the smallest lower bound (i.e. the interval marked B). The possible improvement in the function value in the interval marked B is greater than the possible improvement in interval A. Figure 2.7 shows the two graphs after we divide interval B. As remarked earlier, in many applications the Lipschitz constant γ is not known. Chapter 2. The DIRECT Algorithm 28

    10 10

    8 8

    6 6

    f 4 B 4 B

    A 2 2 A function value at midpoint of the intervall

    0 0

    −2 −2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 x (lenght of intervall)/2

    Figure 2.7: The graph of f, its lower bounds, and the related graph after the third division.

    Therefore we try to get an estimate for the Lipschitz constant based on the known data in a particular step. To do this we look at the data points (shown as circles) we created in the right graphs. In particular we look at the points for which there exists ˜ some constant Ki > 0 such that the corresponding interval would be the choice if − ˜ ˜ bi ai γ = Ki. This means we could lay a line with slope Ki through the point ( 2 ,f(ci)) such that all other created data points lie above this line. Suppose several intervals have the same length. Then we only need to consider the points which represent one of these intervals with the lowest function value at their midpoint. This situation corresponds to several circles on a vertical line, as in Figure 2.7. Now all the points

    for which such a K˜i exists would promise the best improvement for the function if

    the Lipschitz constant of the function would be K˜i.

    We call the intervals for which such a K˜i exists potentially optimal. Figure 2.8 shows an example of this. On the left we show the graph of the function and the points at which we have evaluated the function. On the right we show the corresponding data points. Only the three points connected by a line are potentially optimal. Figure 2.9 shows the result after we have divided all potentially optimal intervals from Figure 2.8. Note that we changed the horizontal scale on the right graph since Chapter 2. The DIRECT Algorithm 29

    10 10

    8 8

    6 6

    f 4 4

    2 2 function value at midpoint of the intervall

    0 0

    −2 −2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 x (lenght of intervall)/2

    Figure 2.8: The graph of f, the points at which the function was evaluated, and the related graph.

    10 10

    8 8

    6 6

    4 4

    2 2 function value at midpoint of the intervall

    0 0

    −2 −2 0 0.01 0.02 0.03 0.04 0.05 0.06 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 (lenght of intervall)/2

    Figure 2.9: The graph of f, the points at which the function was evaluated, and the related graph. Chapter 2. The DIRECT Algorithm 30 there are no more intervals with a length of 1/3. Only the two points connected by the line are potentially optimal. This time all intervals with shortest length do not satisfy the condition for potentially optimality. We give a formal definition when a interval is potentially optimal in Definition 2.2.

    Definition 2.2. Let ²>0be a positive constant and let fmin be the current best function value. Interval j is said to be potentially optimal if there exists some rate of change constant K>˜ 0such that

    f(cj) − K˜ (bj − aj)/2 ≤ f(ci) − K˜ (bi − ai)/2, ∀i (2.14)

    f(cj) − K˜ (bj − aj)/2 ≤ fmin − ²|fmin|. (2.15)

    In the definition, inequality (2.14) expresses the decision only to choose inter- vals which would promise the best improvement in the function value if f would be

    Lipschitz continuous with Lipschitz constant K˜i. Inequality (2.15) ensures that we have the possibility of a sufficient decrease in the interval. Jones et al. [48] describe that choices of ² between 10−3 and 10−7 provide good results. We will come back to the choice of ² later when we describe modifications to the original DIRECT algorithm in Section 3.3, in our theoretical observations in Chapter 4, and in the numerical results in Chapter 5. Sergeyev [63] describes a similar estimation for the Lipschitz constant, but uses it in a different context. In each iteration Jones et al.[48] divide all potentially optimal intervals. We will come back to this in Chapter 3.1 when we compare the original DIRECT algorithm to our modification. Chapter 2. The DIRECT Algorithm 31

    2.4.3 The one-dimensional DIRECT algorithm

    We take all of the above together to get the one-dimensional DIRECT algorithm 2.2.

    Algorithm 2.2 DIRECT-1D(a, b, f, ², numit, numfunc)

    1: m =1,c1 =(a+b)/2 2: evaluate f(c1),fmin = f(c1),t=0 3: while t < numit and m < numfunc do 4: Identify the set S of potentially optimal intervals 5: while S =6 ∅ do 6: Take j ∈ S 7: Sample new points (cm+1,cm+2), update borders 8: Evaluate f(cm+1),f(cm+2), update fmin 9: Set m = m +2,S =S\{j} 10: end while 11: t = t +1 12: end while

    In the algorithm the first two steps are the initialization. The variable m is a counter for the number of function evaluations the DIRECT algorithm has done, and t is a counter for the number of iteration. Jones et al. [48] stop the algorithm after numit iteration. They also remark that if the Lipschitz constant is known, the termination criteria of the Piyavskii algorithm can be used. Another obvious way to stop the algorithm is to stop once the number of function evaluations reaches a prescribed limit numfunc. Note that this number is only an approximate upper bound, since DIRECT finishes the division of all potentially optimal hyperrectangles before stopping. We will come back to this in Section 3.2. Note that there are two possibilities of parallelism here. These are the inner loop (steps 5 to 10) and the function evaluations inside the inner loop (step 8). We come back to this after we describe the N-dimensional DIRECT algorithm. It is not clear from the definition how to actually identify potentially optimal intervals. Therefore we give the following lemma, which shows an easy way to identify potentially optimal intervals.

Lemma 2.3. Let ε > 0 be a positive constant and let f_min be the current best function value. Let I be the set of indices of all intervals, let d_i = (b_i − a_i)/2, and let j ∈ I be given. Let I_1 = {i ∈ I : d_i < d_j}, I_2 = {i ∈ I : d_i > d_j}, and I_3 = {i ∈ I : d_i = d_j}. Interval j is potentially optimal if

f(c_j) ≤ f(c_i), ∀i ∈ I_3,    (2.16)

there exists K̃ > 0 such that

max_{i ∈ I_1} (f(c_j) − f(c_i))/(d_j − d_i) ≤ K̃ ≤ min_{i ∈ I_2} (f(c_i) − f(c_j))/(d_i − d_j),    (2.17)

and

ε ≤ (f_min − f(c_j))/|f_min| + (d_j/|f_min|) min_{i ∈ I_2} (f(c_i) − f(c_j))/(d_i − d_j), if f_min ≠ 0,    (2.18)

or

f(c_j) ≤ d_j min_{i ∈ I_2} (f(c_i) − f(c_j))/(d_i − d_j), if f_min = 0.    (2.19)

Before we prove this lemma, we show how to use it. We first identify all intervals satisfying (2.16). In the following calculations we only need to look at these intervals. For each of these intervals we calculate the maximum and the minimum given in inequality (2.17). If the maximum over all intervals in I_1 is greater than the minimum over all intervals in I_2, the current interval cannot be potentially optimal. If the maximum is smaller than the minimum, we use inequalities (2.18) and (2.19) to finally decide whether the interval is potentially optimal, as in the sketch below.
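A sketch of this procedure in Python, for the interval representation used in the sketch of Algorithm 2.2 above; again the names are ours and the code is an illustration of Lemma 2.3, not our actual implementation.

    import math

    def potentially_optimal(intervals, fmin, eps):
        # intervals: list of (a_i, b_i, f(c_i)) triples as in the sketch above
        d  = [(b - a) / 2.0 for a, b, _ in intervals]
        fc = [v for _, _, v in intervals]
        n, S = len(intervals), []
        for j in range(n):
            I1 = [i for i in range(n) if d[i] < d[j]]
            I2 = [i for i in range(n) if d[i] > d[j]]
            I3 = [i for i in range(n) if d[i] == d[j]]
            # (2.16): j must be best among intervals of the same size
            if any(fc[j] > fc[i] for i in I3):
                continue
            # (2.17): some K > 0 must fit between the two one-sided bounds
            lo = max(((fc[j] - fc[i]) / (d[j] - d[i]) for i in I1), default=0.0)
            hi = min(((fc[i] - fc[j]) / (d[i] - d[j]) for i in I2), default=math.inf)
            if lo > hi or hi <= 0.0:
                continue
            # (2.18)/(2.19): possibility of a sufficient decrease
            if fmin != 0.0:
                ok = eps <= (fmin - fc[j]) / abs(fmin) + d[j] / abs(fmin) * hi
            else:
                ok = fc[j] <= d[j] * hi
            if ok:
                S.append(j)
        return S

Note that when I_2 is empty the upper bound hi is infinite, so the largest intervals always pass the test; this matches Lemma 4.1 below.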

We now prove Lemma 2.3.

Proof. We first look at inequality (2.14). Using the definitions of I_1, I_2 and I_3, inequality (2.14) becomes

K̃ ≥ (f(c_j) − f(c_i))/(d_j − d_i), ∀i ∈ I_1,    (2.20)

K̃ ≤ (f(c_i) − f(c_j))/(d_i − d_j), ∀i ∈ I_2,    (2.21)

and

f(c_j) ≤ f(c_i), ∀i ∈ I_3.    (2.22)

From the last inequality it follows directly that interval j can only be potentially optimal if f(c_j) = min_{i ∈ I_3} f(c_i). If j satisfies this condition, then inequalities (2.20) and (2.21) give

max_{i ∈ I_1} (f(c_j) − f(c_i))/(d_j − d_i) ≤ K̃ ≤ min_{i ∈ I_2} (f(c_i) − f(c_j))/(d_i − d_j).    (2.23)

We now look at inequality (2.15). Interval j can only be potentially optimal if a K̃ > 0 exists that satisfies inequality (2.17). We want to choose K̃ as large as possible. Therefore the interval with index j is potentially optimal if

ε ≤ (f_min − f(c_j))/|f_min| + (d_j/|f_min|) min_{i ∈ I_2} (f(c_i) − f(c_j))/(d_i − d_j),  f_min ≠ 0,    (2.24)

or

f(c_j) ≤ d_j min_{i ∈ I_2} (f(c_i) − f(c_j))/(d_i − d_j),  f_min = 0.    (2.25)

This concludes the proof.

    2.5 The general/multidimensional DIRECT algorithm

In the following we describe how to extend the DIRECT algorithm to several dimensions. We first make the assumption that Ω is the N-dimensional unit hypercube. We make this assumption for two reasons. The first is that we want to have a scaled domain. This ensures that all variables are weighted equally and that differences in scale do not affect the optimization. For example, if the problem consists of two variables x_1, x_2 and the bounds for them are

0 ≤ x_1 ≤ 10^{−6} and 0 ≤ x_2 ≤ 10^{10},

a change of 10^{−7} in x_1 is a large change, whereas the same change in x_2 is not at all significant for most functions.

The second reason is that with this assumption the hyperrectangles DIRECT creates only have side lengths that are powers of 1/3. Furthermore the distance from the center of a hyperrectangle to its corners is given by an easy formula. These values can be calculated a priori. We will give this formula in Section 3.1, after we have described how the general DIRECT algorithm works. The main difference in higher dimensions compared to the one-dimensional case is the problem of how to divide the search space. For a better understanding we start with a hypercube and then describe the more general case of a hyperrectangle.
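As a small illustration of this scaling, the following sketch (assuming NumPy; the function names are ours) gives the affine maps between the original box and the unit hypercube; DIRECT works internally on the unit cube and evaluates f at the mapped-back points.

    import numpy as np

    def to_unit(x, lower, upper):
        # map a point of the original box [lower, upper] to the unit hypercube
        lower, upper = np.asarray(lower, float), np.asarray(upper, float)
        return (np.asarray(x, float) - lower) / (upper - lower)

    def from_unit(u, lower, upper):
        # map a point of the unit hypercube back to the original box
        lower, upper = np.asarray(lower, float), np.asarray(upper, float)
        return lower + np.asarray(u, float) * (upper - lower)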

    2.5.1 Dividing in higher dimensions

    Dividing of a hypercube

Let c be the center point of a hypercube. Evaluate the function at the points c ± δe_i, where δ equals 1/3 of the side length of the cube and e_i is the i-th Euclidean base vector. Define w_i as

w_i = min{f(c + δe_i), f(c − δe_i)}.

Divide the hypercube in the order given by the w_i, starting with the lowest w_i. We divide the hypercube first perpendicular to the direction with the lowest w_i. Then we divide the remaining volume perpendicular to the direction of the second lowest w_i, and so on until the hypercube is divided in all directions. The division we create with this strategy puts c in the center of a hypercube with side length δ. Let b = arg min_{i=1,…,N} {f(c + δe_i), f(c − δe_i)}. Then b will be the center of a hyperrectangle with one side of length δ and N − 1 sides of length 3δ. This means that of the new hyperrectangles, the one with the smallest function value at its center has the largest volume of all new hyperrectangles.

Figure 2.10a shows an example of the division of a hypercube. Here

w_1 = min{5, 8} = 5,
w_2 = min{6, 2} = 2.

Therefore we first divide perpendicular to the x_2-axis, and then in the second step we divide the remaining rectangle perpendicular to the x_1-axis. The rectangle with a function value of 2 at its center is one of the largest created.

    Dividing of a hyperrectangle

If we need to subdivide one of the hyperrectangles, we divide it only along its longest sides. This ensures that we get a decrease in the maximal side length of the hyperrectangle. Furthermore we know that the sides of the hyperrectangles DIRECT creates have at most two different lengths. We come back to this in Section 3.1. Figure 2.10b represents the next step in the algorithm. DIRECT will divide the shaded area. The second box in Figure 2.10b shows where DIRECT samples the function, and the third box shows how the rectangle is only divided once. Figure 2.10c shows the third step in the algorithm for this example. In this step DIRECT will divide the two shaded rectangles. One of them is a square; therefore it is divided twice as described before. The larger area is again a rectangle and gets divided once. This gives algorithm Divide.

Algorithm 2.3 Divide
1: Identify the set I of dimensions with maximum side length d, set δ = d/3
2: Sample f at c ± δe_i, i ∈ I
3: Calculate the w_i and divide according to the values of the w_i
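The following Python sketch performs one application of Divide; representing a hyperrectangle by its center and a vector of side lengths is an illustrative choice of ours.

    import numpy as np

    def divide(center, sides, fc, f):
        # One application of Divide to the box (center, sides); fc = f(center).
        # Returns the (center, sides, value) triples that replace the box.
        d = sides.max()
        delta = d / 3.0
        longest = np.flatnonzero(sides == d)       # step 1: all longest directions
        pts, w = {}, {}
        for i in longest:                          # step 2: sample c +/- delta*e_i
            e = np.zeros_like(center)
            e[i] = delta
            pts[i] = [(center + e, f(center + e)), (center - e, f(center - e))]
            w[i] = min(val for _, val in pts[i])
        new_boxes = []
        shrink = sides.copy()                      # sides of the shrinking middle box
        for i in sorted(longest, key=w.get):       # step 3: smallest w_i first
            shrink = shrink.copy()
            shrink[i] = delta                      # cut direction i into thirds
            for p, val in pts[i]:
                new_boxes.append((p, shrink.copy(), val))
        new_boxes.append((center, shrink, fc))     # the final middle box keeps c
        return new_boxes

Note how the boxes created later in the loop are smaller, so the box containing the smallest sampled value ends up among the largest new boxes, as observed above.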

Figure 2.10: Dividing of a hypercube.

    2.5.2 Potentially optimal hyperrectangles

The only remaining thing needed for the N-dimensional DIRECT algorithm is the definition of potentially optimal hyperrectangles, which we give in the following. In this definition σ_i is a characterization of the size of hyperrectangle i.

Definition 2.3. Let ε > 0 be a positive constant and let f_min be the current best function value. A hyperrectangle j is said to be potentially optimal if there exists some K̃ > 0 such that

f(c_j) − K̃σ_j ≤ f(c_i) − K̃σ_i, ∀i,    (2.26)

f(c_j) − K̃σ_j ≤ f_min − ε|f_min|.    (2.27)

The only difference between this definition and Definition 2.2 is that instead of the interval length of the one-dimensional case we need another measure for the size σ_i of a hyperrectangle.

Jones et al. [48] choose σ_i to be the distance from the center of hyperrectangle i to its vertices. In Section 3.1 we discuss the problems with this choice and what we do instead. In this chapter we only describe the DIRECT algorithm as given by Jones et al. [48]. With Definition 2.3 we have everything we need for the general DIRECT algorithm, Algorithm 2.4. Here again the first two steps are the initialization. We identify the set of potentially optimal hyperrectangles using Lemma 2.4, which is the N-dimensional version of Lemma 2.3. Note that there are again two opportunities for parallelism, the inner loop and the function evaluations inside the inner loop.

In our parallel implementation, which we describe in Appendix A.2, we only parallelized the function evaluations in the inner loop. Baker et al. [5, 6, 4], Baker [3] and Watson and Baker [73] describe a fully parallelized implementation of DIRECT. They use both stages of parallelism in their implementation.

Algorithm 2.4 DIRECT(a, b, f, ε, numit, numfunc)

1: Normalize the search space to be the unit hypercube with center point c_1.
2: Evaluate f(c_1), f_min = f(c_1), t = 0, m = 1.
3: while t < numit and m < numfunc do
4:   Identify the set S of potentially optimal hyperrectangles
5:   while S ≠ ∅ do
6:     Take j ∈ S.
7:     Sample new points, evaluate f at the new points, and divide the hyperrectangle with Divide.
8:     Update f_min, m = m + Δm.
9:     Set S = S \ {j}.
10:   end while
11:   t = t + 1.
12: end while

Lemma 2.4. Let ε > 0 be a positive constant and let f_min be the current best function value. Let I be the set of indices of all hyperrectangles. Let I_1 = {i ∈ I : σ_i < σ_j}, I_2 = {i ∈ I : σ_i > σ_j}, and I_3 = {i ∈ I : σ_i = σ_j}. Hyperrectangle j ∈ I is potentially optimal if

f(c_j) ≤ f(c_i), ∀i ∈ I_3,    (2.28)

there exists K̃ > 0 such that

max_{i ∈ I_1} (f(c_j) − f(c_i))/(σ_j − σ_i) ≤ K̃ ≤ min_{i ∈ I_2} (f(c_i) − f(c_j))/(σ_i − σ_j),    (2.29)

and

ε ≤ (f_min − f(c_j))/|f_min| + (σ_j/|f_min|) min_{i ∈ I_2} (f(c_i) − f(c_j))/(σ_i − σ_j),  f_min ≠ 0,    (2.30)

or

f(c_j) ≤ σ_j min_{i ∈ I_2} (f(c_i) − f(c_j))/(σ_i − σ_j),  f_min = 0.    (2.31)

We omit the proof of this lemma, since it is identical to the proof of Lemma 2.3.

Chapter 3

    Extensions to the DIRECT algorithm

This chapter first discusses some of the problems of the original DIRECT algorithm and how we improved the algorithm. We also describe changes suggested by other authors. We then discuss stopping criteria and how to choose ε in the definition of potentially optimal hyperrectangles. Following this is a description of how we extended DIRECT to handle hidden constraints, and we end the chapter by describing how DIRECT can be used as a starting point generator for Implicit Filtering [15].

    3.1 Changes to the original DIRECT algorithm

DIRECT divides the original domain into hyperrectangles, which can only be of certain sizes. All sides of a hyperrectangle DIRECT creates have length 3^{−k}, where k ∈ N ∪ {0}. If the longest side has a length of 3^{−l}, the shortest side has a length of at least 3^{−l−1}. This follows directly from the way DIRECT divides hyperrectangles and the assumption that the domain is the N-dimensional unit hypercube. Following [31], we use this observation to define the level and the stage of a hyperrectangle.

Definition 3.1. Let M be an N-dimensional hyperrectangle created by DIRECT. Assume that M has p sides with a length of 3^{−l−1} and N − p sides with a length of 3^{−l}. Then we say that M is of level l and stage p, where 0 ≤ p ≤ N − 1.


Figure 3.1: Example of a rectangle with level l = 1 and stage p = 1. Figure 3.2: Example of a three-dimensional rectangle with level l = 2 and stage p = 2.

This means that a hyperrectangle of level l and stage p is the result of p subdivisions of a hypercube with side length 3^{−l}. To understand this definition a little better, we give some examples of hyperrectangles and their levels and stages.

Example
• Let M_0 be the unit hypercube. Since all sides of M_0 have a length of 1 = (1/3)^0, M_0 has level l = 0. Since M_0 is a hypercube, it has stage p = 0.
• Let M_1 be a two-dimensional rectangle with side lengths 1/3 = (1/3)^1 and 1/9 = (1/3)^2, see Figure 3.1. Then M_1 has level l = 1 and stage p = 1.
• Let M_2 be a three-dimensional rectangle with side lengths 1/9 = (1/3)^2, 1/27 = (1/3)^3 and 1/27 = (1/3)^3, see Figure 3.2. Then M_2 has level l = 2 and stage p = 2.

This means that the longer the longest side of a hyperrectangle, the smaller the level. Note also that the level and stage of a hyperrectangle are always greater than or equal to 0. The original DIRECT algorithm as described by Jones et al. [48] characterizes hyperrectangles by their diameter. The l_2 diameter of a hyperrectangle with level l and stage p can easily be calculated using the following lemma, see also Jones [47, 48].

Lemma 3.1. Let M be a hyperrectangle of level l and stage p. Then the l_2 diameter d(l, p) of M is given by

d(l, p) = 3^{−l} √(N − 8p/9).

Proof. Let a and b be opposite corners of M. Then

d(l, p) = ‖b − a‖_2 = √(Σ_{i=1}^{N} (b_i − a_i)^2).

Since p sides of M have a length of 3^{−l−1}, there exists I ⊂ {1, …, N}, |I| = p, such that b_i − a_i = 3^{−l−1}, ∀i ∈ I, and b_i − a_i = 3^{−l}, ∀i ∈ {1, …, N} \ I. Therefore

d(l, p) = ‖b − a‖_2 = √(p(3^{−l−1})^2 + (N − p)(3^{−l})^2)
        = 3^{−l} √(p/9 + (N − p))
        = 3^{−l} √(N − p + p/9)
        = 3^{−l} √(N − 8p/9).

To characterize a hyperrectangle of level l and stage p, Jones et al. [48] use

σ_2 = d(l, p)/2,

which is a characterization based on the l_2 diameter. In the one-dimensional case this characterization becomes half of the interval length, the same as before. Note that this characterization of hyperrectangles depends on the dimension of the problem: a cube with side length 1 has σ_2 = √(1/2) in two dimensions, but σ_2 = √(3/4) in three dimensions. We instead use a characterization based on the infinity norm. We characterize the same hyperrectangle by

σ_∞ = 3^{−l}/2.

This measure is based on the l_∞ diameter, that is, the length of the longest side of the hyperrectangle. Note that the two characterizations are the same in the one-dimensional case. Our choice does not depend on the dimension of the problem. Furthermore this characterization groups the hyperrectangles into fewer groups, and thereby biases the search more toward local search. With this characterization only the level of the hyperrectangle is used; we treat hyperrectangles of the same level, but different stages, as having the same size.

Another important property of both of these characterizations is that they can be represented by an integer. Clearly our characterization is uniquely represented by the level l. The l_2 characterization can be represented uniquely by lN + p. We use this in our implementation, see Appendix A.1.1. By using these integer characterizations of hyperrectangles we can reduce the work needed to identify potentially optimal hyperrectangles. In Lemma 3.2 we show that of all hyperrectangles with the same measure, only those with the lowest function value at their centers can be potentially optimal. This suggests keeping a sorted list of all hyperrectangles of a given measure. We can index these lists by the integer representation given above. By doing this we can significantly reduce the number of comparisons needed, as opposed to comparing all hyperrectangles with each other. Jones et al. [48] already point out these savings for the one-dimensional case. Furthermore the integer representations of the measures of the hyperrectangles DIRECT creates when it divides a hyperrectangle can be calculated easily. For the l_2 characterization we only need to add and multiply integers, and for the l_∞ measure we need to add integers and take the minimum over N integers. This index then points to the precalculated measures. Therefore we do not need any floating point operations to calculate the measures.
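Both characterizations and their integer representations are easy to tabulate; the following sketch (our names, illustrative table sizes) shows the idea of precalculating the measures once and indexing them by integers.

    import math

    def diameter(l, p, N):
        # l2 diameter of a level-l, stage-p hyperrectangle (Lemma 3.1)
        return 3.0 ** (-l) * math.sqrt(N - 8.0 * p / 9.0)

    def sigma_2(l, p, N):
        # characterization of Jones et al.: half the l2 diameter
        return diameter(l, p, N) / 2.0

    def sigma_inf(l):
        # our characterization: half the longest side length
        return 3.0 ** (-l) / 2.0

    # precomputed tables, indexed by the integer representations l*N + p and l
    N, LMAX = 3, 10
    table_2   = {l * N + p: sigma_2(l, p, N) for l in range(LMAX) for p in range(N)}
    table_inf = {l: sigma_inf(l) for l in range(LMAX)}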

We use the following notation in the next discussions. Let 𝒮 denote the set of hyperrectangles DIRECT has created in an iteration; 𝒮 is the state of DIRECT. For all S ∈ 𝒮 we denote the center of S by c(S) and its measure by σ(S). The next lemma gives a necessary condition for potentially optimal hyperrectangles.

Lemma 3.2. Of all hyperrectangles with the same measure w, only one with the lowest function value at its center can be potentially optimal.

Proof. Assume that the hyperrectangle T ∈ 𝒮 with σ(T) = w is potentially optimal. Let S ∈ 𝒮, c = c(S), σ(S) = w, be such that

f(c) = min_{Ŝ ∈ 𝒮, σ(Ŝ) = w} f(c(Ŝ)).

Assume that f(c(T)) > f(c). Since T is potentially optimal, there exists K̃ > 0 such that

f(c(T)) − K̃σ(T) ≤ f(c(Ŝ)) − K̃σ(Ŝ), ∀Ŝ ∈ 𝒮.

Letting Ŝ = S, we get, with σ(T) = σ(S) = w, that

f(c(T)) ≤ f(c),

a contradiction to f(c(T)) > f(c).

Note that Lemma 3.2 is valid for both characterizations. In our implementation we decided to divide at most one hyperrectangle of every measure. When more than one hyperrectangle with the same measure is potentially optimal, we arbitrarily choose one of them. From the last lemma we know that two hyperrectangles with the same measure can only both be potentially optimal if they have the same function value at their respective centers. The decision to choose only one hyperrectangle of each size influences the iteration estimates we derive in Chapter 4. This strategy is in contrast to Jones et al. [48], who divide all potentially optimal hyperrectangles in each iteration. Their strategy can increase the number of function evaluations DIRECT does in one iteration. We, on the other hand, increase the number of iterations. The idea is that with our strategy we may improve on the best point found so far earlier, without doing too many function evaluations. As soon as we find a better point, the focus changes, as we show in Lemma 3.3. The same lemma also shows that DIRECT will always choose one of the largest hyperrectangles if there are several hyperrectangles with the minimum value at the center.

Lemma 3.3. Let S ∈ 𝒮 be such that f(c(S)) = f_min and σ(S) is the largest measure among all hyperrectangles having the value f_min at their center, i.e.

f(c(S)) = min_{Ŝ ∈ 𝒮} f(c(Ŝ)) and σ(S) = max_{Ŝ ∈ 𝒮, f(c(Ŝ)) = f_min} σ(Ŝ).

If f_min ≠ 0, then S is potentially optimal if ε satisfies

ε ≤ (σ/|f_min|) min_{Ŝ ∈ Q} (f(c(Ŝ)) − f_min)/(σ(Ŝ) − σ),    (3.1)

where σ = σ(S) and Q = {Ŝ ∈ 𝒮 : f(c(Ŝ)) > f_min, σ(Ŝ) > σ}. If f_min = 0, then S is potentially optimal.

This means that if several hyperrectangles with function value f_min at their center exist, only the largest of them can be potentially optimal. Only these may be divided by DIRECT. We now prove Lemma 3.3.

Proof. Let f_min be attained at the center of the hyperrectangle S with measure σ = σ(S) and center c = c(S). Since f(c) = f_min, the second condition for S to be potentially optimal, that is, inequality (2.27), f(c) − K̃σ ≤ f_min − ε|f_min|, holds if

ε ≤ K̃σ/|f_min|, if |f_min| ≠ 0,    (3.2)

and is automatically satisfied if f_min = 0. Let Ŝ ∈ 𝒮, σ̂ = σ(Ŝ) and ĉ = c(Ŝ). If σ̂ ≤ σ, then inequality (2.26) holds for all K̃ > 0, since

f(c) − K̃σ ≤ f(ĉ) − K̃σ̂  ⇔  0 ≤ (f(ĉ) − f(c)) + K̃(σ − σ̂),

where both terms on the right-hand side are greater than or equal to 0.

If f(ĉ) = f_min, inequality (2.26) is clearly satisfied for all K̃ ≥ 0. The remaining case is σ̂ > σ and f(ĉ) > f(c) = f_min. Then inequality (2.26) holds for

K̃ ≤ min (f(ĉ) − f_min)/(σ̂ − σ),    (3.3)

where the minimum is taken over all Ŝ ∈ Q = {Ŝ ∈ 𝒮 : σ̂ > σ and f(c) < f(ĉ)}. Combining inequalities (3.2) and (3.3), S is potentially optimal if

ε ≤ (σ/|f_min|) min_{Ŝ ∈ Q} (f(ĉ) − f_min)/(σ̂ − σ),    (3.4)

which concludes the proof.

Lemma 3.3 shows that the strategy of DIRECT ensures that as soon as DIRECT finds a large hyperrectangle with a lower function value at its center than all function values known so far, DIRECT does not further divide any smaller hyperrectangles. This means that DIRECT switches the focus of the search to a more promising area. Note that this lemma again does not depend on the characterization used.

    3.1.1 Changes to DIRECT by other authors

This section briefly describes some other changes to DIRECT suggested by other authors. Jones [47] suggests changing the way a hyperrectangle is divided. Instead of trisecting a hyperrectangle along all longest sides, he suggests trisecting only one of them. This means that, independent of the problem dimension and the number

of sides tied for the longest side length, only two new points will be sampled whenever a hyperrectangle is divided. As a tie-breaking mechanism, Jones keeps track of how often hyperrectangles have been divided along a given dimension i using a counter t_i, i = 1, …, N. If a hyperrectangle has several sides tied for being longest, the one with the lowest t_i gets divided. If several longest sides have the same t_i, the tie is broken arbitrarily using the lowest index.

Jones furthermore extends DIRECT to handle problems with nonlinear constraints and integer variables. The extension to handle nonlinear constraints is done by modifying the definition of potentially optimal hyperrectangles in a nontrivial way. To handle integer variables, Jones ensures that the points where the function is sampled have integer components where necessary. We did not implement these suggestions, since our problems did not contain known nonlinear constraints or integer components.

Baker et al. [5, 4, 6, 73] suggest another modification to DIRECT, which they call aggressive DIRECT. Aggressive DIRECT divides the hyperrectangle with the smallest function value for each measure, therefore discarding the idea of potentially optimal hyperrectangles. This increases the number of hyperrectangles to be divided, therefore making the search more global. Since their focus is on achieving good parallel performance, this can be effective, especially for problems with many local and global minima.

    3.2 Stopping criteria

As we described in Section 2.4.3, Jones et al. [48] stop DIRECT once it has finished a given number of iterations. In the same article they report results of DIRECT for test problems. Since the global minimum of these test problems is known, they terminate DIRECT once the percent error in the function value is below a given tolerance. For this let f_global be the known global minimal function value and denote by f_min the best function value found by DIRECT. We define the percent error p as

p = 100 (f_min − f_global)/|f_global|  if f_global ≠ 0,
p = 100 f_min                          if f_global = 0.

As mentioned in Section 2.4.3, another termination criterion is to stop after a given number of function evaluations. This kind of termination is typical for many sampling algorithms. Jones [47] also describes this termination criterion. Cox et al. [19, 20] stop the original DIRECT once the size of the smallest hyperrectangle reaches a certain percentage of the original hyperrectangle size. A similar stopping criterion is used by Evan Cramer [21]. To explain this termination criterion, let

𝒮^n_min = {S ∈ 𝒮^n : f(c(S)) = f^n_min}.

For any hyperrectangle S let m_i(S) denote the length of its i-th side, i = 1, …, N. The termination criterion used by Cramer stops DIRECT once

∃S ∈ 𝒮^n_min such that m_i(S) ≤ a_i, ∀i = 1, …, N,

where a ∈ R^N is given by the user. Note that the termination criterion of Cox et al. is a special case of this criterion. Modifications of these criteria are to stop once

∀S ∈ 𝒮^n_min : m_i(S) ≤ a_i, ∀i = 1, …, N,

or

∀S ∈ 𝒮^n_min : σ(S) ≤ σ̃.

These two criteria stop DIRECT once all hyperrectangles with f^n_min at their center are small enough. Two additional termination criteria shift the focus away from the hyperrectangles in 𝒮^n_min and toward the largest hyperrectangles. We denote the set of all largest hyperrectangles by

ℬ^n = {S ∈ 𝒮^n : σ(S) = max_{B ∈ 𝒮^n} σ(B)}.

The idea is to stop DIRECT once either ∀B ∈ ℬ^n : σ(B) ≤ σ̃, where σ̃ ≤ 1 is given, or ∀B ∈ ℬ^n : m_i(B) ≤ a_i, ∀i = 1, …, N, where a ∈ R^N is given. This means that these two stopping criteria terminate DIRECT once the unexplored area is small enough. In Table 3.1 we summarize the different stopping criteria.

Table 3.1: Summary of stopping criteria
Number  Termination criterion
1       Number of iterations
2       Number of function evaluations
3       Percent error p ≤ p̃, p̃ ≤ 1 given.
4       ∃S ∈ 𝒮^n_min : σ(S) ≤ σ̃, σ̃ ≤ 1 given.
5       ∃S ∈ 𝒮^n_min : m_i(S) ≤ a_i, ∀i = 1, …, N, where a ∈ R^N is given.
6       ∀S ∈ 𝒮^n_min : σ(S) ≤ σ̃, σ̃ ≤ 1 given.
7       ∀S ∈ 𝒮^n_min : m_i(S) ≤ a_i, ∀i = 1, …, N, where a ∈ R^N is given.
8       ∀B ∈ ℬ^n : σ(B) ≤ σ̃, where σ̃ ≤ 1 is given.
9       ∀B ∈ ℬ^n : m_i(B) ≤ a_i, ∀i = 1, …, N, where a ∈ R^N is given.

Criteria 4 through 7 terminate DIRECT once one or all hyperrectangles with function value f_min at their center are small. The last two (8 and 9) stop DIRECT once the largest unexplored area is small enough. We come back to the different stopping criteria once more in Chapter 5, when we describe our numerical results.
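Some of these criteria are easy to state in code. The sketch below (our names) assumes the current state is available as (function value at center, measure) pairs and shows criteria 3, 4 and 8.

    def percent_error(fmin, fglobal):
        # criterion 3: the percent error p defined in Section 3.2
        if fglobal != 0.0:
            return 100.0 * (fmin - fglobal) / abs(fglobal)
        return 100.0 * fmin

    def stop_some_best_small(boxes, fmin, sigma_tol):
        # criterion 4: some box attaining fmin has measure <= sigma_tol
        return any(s <= sigma_tol for v, s in boxes if v == fmin)

    def stop_unexplored_small(boxes, sigma_tol):
        # criterion 8: even the largest (least explored) boxes are small
        return max(s for _, s in boxes) <= sigma_tol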

3.3 Choice of ε

In the definition of potentially optimal hyperrectangles, the value of ε is used to ensure that there is the potential for a sufficient decrease within a hyperrectangle. In the original description of DIRECT, Jones et al. [48] report that values of ε ∈ [10^{−7}, 10^{−3}] work well. In a newer paper, Jones [47] advises using the following strategy to choose ε:

ε = max(10^{−4} |f_min|, 10^{−8}).    (3.5)

We report results for fixed ε in Chapter 5. We did not observe significant improvement through the use of this newer strategy.
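In code, the adaptive choice (3.5) is a one-line function:

    def choose_eps(fmin):
        # adaptive choice of epsilon, equation (3.5)
        return max(1.0e-4 * abs(fmin), 1.0e-8)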

    3.4 Extensions to problems with hidden constraints

In this section we describe how we extended DIRECT to handle problems with hidden constraints. That means we look at problems P″, where the subset B of Ω over which the function is defined is not given analytically. Jones [47] describes an extension of DIRECT where the subset B is given by known inequalities. To do this he modifies the definition of potentially optimal hyperrectangles. He furthermore allows some of the variables to be integer. We did not use these ideas, since in our application B is not given analytically. Instead we implemented three different ways to handle hidden constraints, which we describe below. We also ensure that DIRECT does not stop without finding at least one feasible point. If no feasible point has been found when the termination criterion is satisfied, we allow DIRECT to continue until it finds a feasible point and then reassign the original budget. Note that this only works if the set B has a non-empty interior.

    3.4.1 Replace the function value at an infeasible point by a high value

The first and easiest method to handle infeasible points is to assign a high value (we used 10^6) to these points. This ensures that no hyperrectangle with an infeasible midpoint can be potentially optimal as long as hyperrectangles of the same level and stage (but with a feasible midpoint) exist, see Lemma 3.2. Through this choice DIRECT will reexamine areas where it has already found a feasible point before considering hyperrectangles with infeasible centers for further division. Note that when we use this strategy, we have to make sure that the high value we use to mark infeasible points is larger than the maximum value attained by the function. This may not be possible in a real application. One idea to avoid this problem was suggested to us by Richard Carter [11]. He suggested using the maximum function value found so far instead of a predetermined high value. This strategy alone has two problems. The first one is that a hyperrectangle with an infeasible midpoint could then be divided before one with a feasible midpoint. This can be avoided by adding a positive constant (for example 1) to the maximum value before assigning it to an infeasible hyperrectangle. The second, more serious problem is that the maximum function value found can change from iteration to iteration. If this happens, we would need to replace the value assigned to infeasible points earlier by this new value. This can be done without extra work when we use this strategy as part of the strategy described in Section 3.4.3.
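A minimal sketch of this first strategy, under the assumption that feasibility can be tested pointwise by a predicate is_feasible (our name; how infeasibility is signaled depends on the application):

    HIGH_VALUE = 1.0e6   # must exceed the maximum of f over the feasible region

    def mark_infeasible(f, is_feasible):
        # wrap the objective so infeasible points receive the high value
        def g(x):
            return f(x) if is_feasible(x) else HIGH_VALUE
        return g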

    3.4.2 Create an extra list of infeasible points

The second strategy we looked at consists of keeping an extra list of hyperrectangles with infeasible centers. In every iteration DIRECT divides one of these hyperrectangles. This strategy proved very ineffective in our tests, so we do not recommend it.

    3.4.3 Replace the function value at an infeasible point by the value of a nearby feasible point

The last strategy we looked at was also suggested by R. Carter [11]. We describe the general idea before going into details. For any infeasible midpoint, we expand its hyperrectangle by a factor of 2. If this larger hyperrectangle contains one or more feasible midpoints of other hyperrectangles, we find the smallest function value F among these and use F + ε|F| as a surrogate value. If no feasible midpoint is contained in the larger hyperrectangle, we mark the current point as really infeasible.

Algorithm 3.1 DIRECT(a, b, f, ε, numit, numfunc)

1: Normalize the search space to be the unit hypercube with center point c_1
2: Evaluate f(c_1), f_min = f(c_1), t = 0, m = 1
3: while t < numit and m < numfunc do
4:   Identify the set S of potentially optimal hyperrectangles
5:   while S ≠ ∅ do
6:     Take j ∈ S
7:     Sample new points, evaluate f at the new points, and divide the hyperrectangle with Divide
8:     Update f_min, m = m + Δm
9:     Set S = S \ {j}
10:   end while
11:   Use ReplaceInf to check for infeasible points that are near feasible points, and replace the values at these points by the value of a nearby point.
12:   t = t + 1
13: end while

We now describe this strategy in more detail. We extend DIRECT as shown in Algorithm 3.1 by adding a call to ReplaceInf in line 11. In this method the actual replacement takes place. In the method ReplaceInf, shown in Algorithm 3.2, we iterate over all hyperrectangles with infeasible midpoints. For each of these midpoints, we create a new surrounding box by doubling the length of each side while keeping the same center. Then we find F, the minimum value over all feasible points found by DIRECT inside this expanded hyperrectangle.

Algorithm 3.2 ReplaceInf({c_i}, {l_i}, {f_i})
Input:
• {c_i} - centers of the hyperrectangles created by DIRECT, c_i ∈ R^N.
• {l_i} - side lengths of the hyperrectangles created by DIRECT, l_i ∈ R^N.
• {f_i} - function values at the centers of the hyperrectangles created by DIRECT, f_i ∈ R.
Output:
• {f_i} - updated function values.

1: for all c_i infeasible do
2:   Create the larger hyperrectangle D around c_i.
3:   F = min{min_{c_j ∈ D} f_j, ∞}
4:   if F < ∞ then
5:     f_i = F + 10^{−6}|F|
6:   else
7:     Mark f_i really infeasible.
8:   end if
9: end for
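A Python sketch of ReplaceInf follows; the array layout is an illustrative choice of ours, and the really infeasible points already receive here the maximum feasible value increased by 1, as described below.

    import numpy as np

    EPS = 1.0e-6   # the epsilon used in the surrogate value F + EPS*|F|

    def replace_inf(centers, lengths, values, infeasible):
        # centers, lengths: (n, N) arrays; values: length-n array;
        # infeasible: length-n boolean mask marking infeasible midpoints
        fmax = values[~infeasible].max()          # used for really infeasible points
        for i in np.flatnonzero(infeasible):
            # the enlarged box: same center, every side length doubled (closed box)
            near = np.all(np.abs(centers - centers[i]) <= lengths[i], axis=1)
            near &= ~infeasible                   # keep only feasible midpoints
            if near.any():
                F = values[near].min()
                values[i] = F + EPS * abs(F)      # surrogate value
            else:
                values[i] = fmax + 1.0            # mark as really infeasible
        return values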

If this minimum exists (that is, there is at least one feasible point in the larger hyperrectangle), we assign F + ε|F| to the current infeasible point. We used a value of ε = 10^{−6} in our computations. Otherwise the infeasible point is marked really infeasible. For these really infeasible points we use the modified strategy from Section 3.4.1. That means we assign the maximum value found so far, increased by 1, to these points. Since we have to check each infeasible point in ReplaceInf, there is no extra cost if the maximum increases. We increase the replacement value to make sure that if there is another hyperrectangle of the same level and stage with a feasible midpoint and the same function value (this could be the one we used to calculate the minimum), the one with the feasible midpoint is divided first. Figure 3.3 shows an example of this strategy. We show a close-up of the area around an infeasible point P and its corresponding hyperrectangle. The dotted rectangle is the enlarged rectangle around P. There are nine midpoints of rectangles DIRECT has created contained in this enlarged rectangle; three of them are infeasible.

Figure 3.3: Example of an infeasible point and how its surrogate value is calculated.

Note that we look at the closed rectangle; therefore the points on the boundary are also considered nearby. We then take the minimum over the six feasible midpoints. This value is 10; therefore we take 10 + ε|10| as the new value at P. If we did not amplify the surrogate value, the rectangle with center P would look (to DIRECT) the same as the rectangle with a function value of 10 at its midpoint. We would then have to ensure that DIRECT chooses the rectangle with the feasible midpoint and not the one with midpoint P. We avoid this problem by using the amplified value. Note that the surrogate value at a midpoint can change in each outer iteration. There are two reasons why this can happen.

• A new point inside the box, with a lower function value than the one assigned so far, has been found.

• The hyperrectangle corresponding to the infeasible point was divided by DIRECT. Through this division DIRECT has made the hyperrectangle smaller, and no feasible point is nearby anymore, that is, in the area looked at.

Figure 3.4: Example of an infeasible point, whose value was replaced by the value of a feasible point nearby, becoming completely infeasible.

Figure 3.4 shows an example of this. On the left, we show a close-up of the hyperrectangles and midpoints DIRECT had created before the call to ReplaceInf. We show the enlarged rectangle around P. There are eight points inside this enlarged rectangle (including the boundary). Five of these are feasible points; therefore, we assign the surrogate value 10 + ε|10| to the point P. On the right of Figure 3.4 we show the same area after the division of the rectangle with midpoint P by DIRECT. The two new rectangles have infeasible midpoints. This time the enlarged rectangle around P does not contain any feasible points, at least none that DIRECT has found. Therefore P is now marked as really infeasible.

    3.5 Use of DIRECT as a starting point generator

Because of its good global behavior, the DIRECT algorithm offers itself as a starting point generator for other, more conventional, algorithms. The algorithm is slow in the final phase of an optimization. This is a common behavior of Lipschitz optimization algorithms, as was pointed out by Hansen et al. [41], who do calculations with a known Lipschitz constant. Their remedy for the problem of slow local/final convergence, as described in [39] and [41], was a two-phase algorithm composed of two different kinds of Lipschitz optimization algorithms. In many of our problems [7, 15, 16, 23] we have seen that the function behaves nicely near the optimum. By this we mean that, at least near the optimum, the function is composed of a smooth function and a small noise function. The smooth part near the optimum could behave, for example, like a quadratic function or some other function of a simple form. This is the setting of the Implicit Filtering method, as described in Appendix B and [17, 35], from which we took the following description. We used our implementation of Implicit Filtering, IFFCO [17], for the numerical results. Near the optimum, the function f̂ to be minimized is assumed to be of the form

f̂(x) = f(x) + φ(x).    (3.6)

In Equation (3.6), f is of simple form and φ is a low-amplitude, high-frequency perturbation, referred to as "noise" in [17, 35]. Since Implicit Filtering is designed for exactly this kind of function, especially when the amplitude of the noise decays near local minima, we want to find the areas near global minima with DIRECT and then start Implicit Filtering there. In our new strategy DIRECT-IFFCO we therefore allow DIRECT a small number of function evaluations and then use the best point found by DIRECT as the starting point for IFFCO.
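Schematically the two-phase strategy looks as follows; direct stands for any DIRECT implementation that returns its best point after a given budget of function evaluations, and a generic bound-constrained minimizer from SciPy is used purely as a stand-in for IFFCO.

    from scipy.optimize import minimize

    def direct_iffco(f, lower, upper, direct, budget=100):
        # Phase 1: a small DIRECT budget to locate the basin of a global minimizer.
        x0, _ = direct(f, lower, upper, numfunc=budget)
        # Phase 2: a local, bound-constrained minimizer started from the best
        # DIRECT point (IFFCO in the thesis; a generic method is used here).
        res = minimize(f, x0, method="L-BFGS-B", bounds=list(zip(lower, upper)))
        return res.x, res.fun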

    DIRECT once one of the hyperrectangles with f(c(S)) = fmin has a given small size, as we already described in Section 3.2. They use a commercial program named Design Optimization Tool [72] for the local search. This software uses a Sequential Quadratic Programming approach. Jones [47] describes another way to use DIRECT as a starting point generator. He Chapter 3. Extensions to the DIRECT algorithm 56

    also suggests to use a small initial budget of function evaluations for DIRECT and then use a local optimizer. After convergence of the local optimizer he switches back to DIRECT. By using the best function value found so far (by either DIRECT or the local optimizer), DIRECT will search more globally. This is so since the value of fmin affects which hyperrectangles are potentially optimal. If DIRECT finds a better point, he again switches to the local optimizer, and so on. Nelson et al. [58] go one step further by suggesting to combine DIRECT with a traditional Quasi-Newton Trust Region algorithm in the following way. In every iter- ation, the potentially optimal hyperrectangle with smallest function value is identified.

    The midpoint of this hyperrectangle xbst is then updated with a single quasi-Newton + + step s resulting in a new point x = xbst + s. The function is evaluated at x . Then the hyperrectangle containing x+ is identified and divided into two hyperrectangles such that x+ is as close to the midpoint of one of these hyperrectangles as possible. All other potentially optimal hyperrectangles are divided by DIRECT. This method combines a local search algorithm with DIRECT without switching between the two methods. However, it complicates the implementation of DIRECT substantially, since the hyperrectangles created will not have side lengths that are an integer power of 1/3. Chapter 4

    Analysis of DIRECT

In this chapter we analyze the DIRECT algorithm. We do this by first showing that DIRECT eventually solves problem P′. In the next section we describe the clustering DIRECT creates. Finally we look at iteration estimates for both the original DIRECT algorithm and our modification DIRECT-l.

    4.1 Convergence Analysis

In this section we show, following Jones et al. [48], that eventually DIRECT will sample arbitrarily near every point in Ω, and therefore solve problem P′. First observe that Definition 2.3 of potentially optimal hyperrectangles ensures that DIRECT cannot terminate prematurely, as we show in the following lemma: it cannot happen that no potentially optimal hyperrectangle exists, so DIRECT will always divide at least one hyperrectangle in every iteration.

Lemma 4.1. There is at least one potentially optimal hyperrectangle in every iteration. This hyperrectangle has the largest measure of all hyperrectangles in this iteration.

Proof. Let 𝒮 be the set of hyperrectangles DIRECT has created in a given iteration. Let S ∈ 𝒮 be such that σ(S) = max_{Ŝ ∈ 𝒮} σ(Ŝ) and f(c(S)) ≤ f(c(Ŝ)), ∀Ŝ ∈ 𝒞 = {Ŝ ∈ 𝒮 : σ(Ŝ) = σ(S)}.


Then the index sets I_1, I_2 and I_3 in inequality (2.29) in Lemma 2.4 are given by the following relations: I_1 is the index set of the hyperrectangles in 𝒟 = {Ŝ ∈ 𝒮 : σ(Ŝ) < σ(S)}, I_2 is empty, and I_3 is the index set of the hyperrectangles in 𝒞. Therefore, using inequality (2.29) in Lemma 2.4, we get

max_{Ŝ ∈ 𝒟} (f(c(S)) − f(c(Ŝ)))/(σ(S) − σ(Ŝ)) ≤ K̃.    (4.1)

Inequality (4.1) means that there is only a lower bound on K̃, but no upper bound. Therefore we can choose K̃ as large as we want, which means that condition (2.30) in Lemma 2.4 can always be satisfied. Therefore the hyperrectangle S is potentially optimal.

This lemma shows that, independent of the characterization used, DIRECT will not terminate prematurely. Instead it will continue until it satisfies a convergence criterion. Using this lemma, we show in Corollary 4.1 that eventually every hyperrectangle gets divided.

Corollary 4.1. Every hyperrectangle gets divided after a finite number of iterations.

Proof. Let 𝒮^n be the set of hyperrectangles DIRECT has created in iteration n. Fix any S ∈ 𝒮^n, let σ = σ(S), and let 𝒟^m = {S̃ ∈ 𝒮^m : σ(S̃) ≥ σ}, m ≥ n. From Lemma 4.1 it follows that in every iteration at least one of the hyperrectangles in 𝒟^m gets divided. Since there is only a finite number of possible hyperrectangles with measure greater than or equal to σ, after a finite number of iterations 𝒟^m = ∅. This shows that the hyperrectangle S gets divided after a finite number of iterations.

Jones et al. [48] show that eventually the search will sample arbitrarily near every point in Ω. They use the fact that whenever DIRECT divides a hyperrectangle, the volume of this hyperrectangle decreases. Then, using the facts that DIRECT always divides at least one hyperrectangle in every iteration (see Lemma 4.1) and that every hyperrectangle gets divided after a finite number of iterations (see Corollary 4.1), DIRECT will eventually solve problem P′.

    4.2 Local clustering

In this section we come back to Lemma 3.3. We show that DIRECT's subdivisions become refined near global minimizers. To see this, let 𝒮^n denote the state of DIRECT after n iterations. We can describe the progress of the optimization in terms of the minimum value of f in the n-th iteration,

f^n_min = min_{S ∈ 𝒮^n} f(c(S)).

This minimum is attained on the non-empty set

𝒮^n_min = {S ∈ 𝒮^n : f(c(S)) = f^n_min}.

Let

v_n = max_{S ∈ 𝒮^n_min} σ(S),    (4.2)

and

𝒯^n = {S ∈ 𝒮^n_min : σ(S) = v_n}.    (4.3)

Note that there is a direct relation between the volume of a hyperrectangle and its level and stage. Let S be a hyperrectangle with level l and stage p. Then

volume(S) = (3^{−l})^{N−p} (3^{−l−1})^p = 3^{−(lN+p)}.

Note that the negative exponent is the integer we used for the representation of the l_2 characterization in Section 3.1. Using Lemma 3.3, we know that all hyperrectangles in 𝒯^n are potentially optimal

if ε is small enough. We can now make the following observation: in every iteration of DIRECT at least one of three things happens. Either the best function value decreases, the measure of the largest hyperrectangles with the best function value at their center decreases, or the number of hyperrectangles with the best function value at their center increases.

Theorem 4.1. Assume that for all n

ε ≤ (σ(S)/|f_min|) min_{Ŝ ∈ 𝒬^n} (f(c(Ŝ)) − f_min)/(σ(Ŝ) − σ(S)), ∀S ∈ 𝒯^n,

where 𝒬^n = {Ŝ ∈ 𝒮^n : f(c(Ŝ)) > f_min, σ(Ŝ) > σ(S)}. Then for all n every S ∈ 𝒯^n is potentially optimal and at least one of

f^{n+1}_min < f^n_min,    (4.4)

v_{n+1} ≤ v_n/3,    (4.5)

or

|𝒮^{n+1}_min| > |𝒮^n_min|    (4.6)

holds.

Proof. Let S ∈ 𝒯^n. Lemma 3.3 implies that S is potentially optimal. v_{n+1} > v_n/3 implies that f^{n+1}_min is attained in a hyperrectangle that is not the result of a subdivision of any member of 𝒯^n. Therefore either f^{n+1}_min < f^n_min or |𝒮^{n+1}_min| > |𝒮^n_min| holds.

We can make a similar observation for our modified version of DIRECT, DIRECT-l. Since we only divide one potentially optimal hyperrectangle of every level, we need to add one more statement. Note that we can still make the same observation for DIRECT-l as above in the case of a unique global minimum.

Corollary 4.2. Let equation (3.1) hold for every S ∈ 𝒯^n. Then in every iteration of DIRECT-l either

f^{n+1}_min < f^n_min,    (4.7)

v_{n+1} ≤ v_n/3,    (4.8)

|𝒮^{n+1}_min| > |𝒮^n_min|,    (4.9)

or

𝒯^{n+1} ⊂ 𝒯^n and |𝒯^{n+1}| = |𝒯^n| − 1    (4.10)

holds.

Proof. DIRECT-l selects only one hyperrectangle S ∈ 𝒯^n for subdivision. If 𝒯^n has more than one element, only one will be divided. Equation (4.10) describes the case where, after the division of S, the other elements of 𝒯^n are still potentially optimal. The rest of the proof is the same as for Theorem 4.1.

    4.3 Iteration estimates

    In this section we give estimates on the number of iterations and the number of function evaluations needed to reach certain results. We start by looking at some worst cases, and then look at special types of functions. Chapter 4. Analysis of DIRECT 62

    Figure 4.1: Points at which the function is evaluated when DIRECT divides a three-dimensional cube.

    Figure 4.2: Hyperrectangles created when DIRECT divides a three-dimensional cube.

    4.3.1 Worst case analysis

We first want to take a look at the division of an N-dimensional hypercube of level k. In this division DIRECT does 2N function evaluations, two along each coordinate axis. Figure 4.1 shows an example of this for a three-dimensional cube. Here we represent the points at which the function is evaluated by dots. Figure 4.2 shows a three-dimensional cube and how DIRECT divides it. On the left we show the undivided cube. In the middle and on the right we show the cubes and hyperrectangles DIRECT creates when it divides the original cube. Without loss of generality, we look at the unit cube. DIRECT creates three cubes with a side length of 1/3, and four hyperrectangles. Two of these hyperrectangles have one side with a

Table 4.1: The resulting hyperrectangles after the division of an N-dimensional hypercube with side length 1/3^k.

Type            Quantity   Sides of length 1/3^{k+1}   Sides of length 1/3^k   Level   Stage
hypercube       3          N                           0                       k + 1   0
hyperrectangle  2          N − 1                       1                       k       N − 1
hyperrectangle  2          N − 2                       2                       k       N − 2
…               …          …                           …                       …       …
hyperrectangle  2          2                           N − 2                   k       2
hyperrectangle  2          1                           N − 1                   k       1
Total           2(N − 1)

length of 1 and two sides with a length of 1/3. The remaining two hyperrectangles have two sides with a length of 1 and one side with a length of 1/3 (these are the larger hyperrectangles in the picture). Table 4.1 shows how many hyperrectangles of what size DIRECT creates when it divides a hypercube with a side length of 1/3^k. We will use this table for our iteration estimates below. In the following we want to find an upper bound on the number of iterations DIRECT needs to divide the unit hypercube into hyperrectangles of level greater than or equal to k + 1. That is, we want to know after how many iterations we can be sure that no hyperrectangles with level smaller than or equal to k are left undivided. To maximize the number of iterations, we assume that DIRECT only divides one hyperrectangle in each iteration; that is, we assume that only one hyperrectangle is potentially optimal in each iteration. Another piece of information we need in the following is the number of hypercubes of a certain level and stage 0 needed to cover the unit hypercube. Figure 4.3 shows two examples for the two-dimensional case. On the left we show the nine squares of level 1 needed to cover the unit square. On the right we show the 81 squares of level 2 needed to do the same thing.

    Figure 4.3: Examples of squares of level 1 and 2 needed to cover the unit square.

Table 4.2: Number of hypercubes (stage 0) of side length 1/3^k needed to cover the N-dimensional hypercube of side length 1.

level   side length   Number of cubes
k = 0   1             1
k = 1   1/3           3^N
k = 2   1/9           3^{2N}
k = 3   1/27          3^{3N}
…       …             …
k       1/3^k         3^{kN}

Figure 4.4: First four divisions of the interval [0, 1].

Table 4.2 gives the number of hypercubes needed to cover the unit hypercube for some side lengths, together with the general formula. We now come back to the problem of finding the upper bound on the number of iterations. We first take a look at the one- and two-dimensional cases. Using the insights we gain there, we can then give a formula for the N-dimensional case. Figure 4.4 shows an example of the first four iterations of DIRECT when DIRECT divides exactly one hyperrectangle in every iteration. The one-dimensional unit hypercube is the interval [0, 1]. After the first iteration no intervals of level 0 are left. After the fourth iteration no intervals of level 0 or 1 are left. In general the number of intervals of level k is equal to the maximum number of iterations needed to divide (or branch) all intervals of level k. This is clear since, by assumption, we only divide one interval in each iteration. The total number p_1(k) of iterations needed from initialization to the first time that no interval of level k is left undivided is therefore given by the following formula:

p_1(k) = Σ_{j=0}^{k} 3^j = (3^{k+1} − 1)/2.    (4.11)

Next we look at the two-dimensional case. Figure 4.5 shows the first 3 iterations of DIRECT in the two-dimensional case when we assume that DIRECT divides exactly one hyperrectangle in every iteration. It takes 3 iterations until no more rectangles

Figure 4.5: First three divisions of the square [0, 1]^2.

with a level of 0 are left undivided. After these three iterations there are 9 = 3^N squares of level 1 left. To divide one of these squares, again 3 iterations are needed. Therefore, to divide all of them, 3 times 9 iterations are needed. This means we need

3 + 3 · 9 = 3(1 + 3^2) = 30

iterations until all rectangles with level 0 or 1 are divided. As shown in Table 4.2, the original square is now divided into 3^{2N} = 81 squares of level 2. To divide these 81 squares, again 3 iterations per square are needed, totaling 243 extra iterations. So after

3 + 3 · 9 + 3 · 81 = 3(1 + 3^2 + (3^2)^2) = 273

iterations no rectangles with level smaller than or equal to 2 are left. Using this observation, we get the following formula for the number of iterations DIRECT needs until no rectangle with a level smaller than or equal to k is left:

p_2(k) = 3 Σ_{j=0}^{k} (3^2)^j = (3/8)(9^{k+1} − 1).    (4.12)

We can extend this formula to the N-dimensional case.

Theorem 4.2. Let C be the N-dimensional unit hypercube. Assume DIRECT divides exactly one hyperrectangle in every iteration. Then after exactly p^c_N(k) iterations of DIRECT no hyperrectangles of level k are left undivided; that is, in iteration p^c_N(k) − 1 the last hyperrectangle of level k is divided. p^c_N(k) is given by

p^c_N(k) = 3^{N−1} (3^{N(k+1)} − 1)/(3^N − 1).    (4.13)

To prove Theorem 4.2, we need the following lemma.

Lemma 4.2. Let C be an N-dimensional hypercube of level k. Assume DIRECT divides exactly one hyperrectangle in every iteration. Then DIRECT needs 3^{N−1} iterations to branch C into 3^N hypercubes of level k + 1.

Proof (of Lemma 4.2). Without loss of generality let k = 0; that is, C is the unit hypercube. Let q(N) be the number of iterations needed to branch the N-dimensional unit hypercube C into N-dimensional hypercubes of level 1 under this assumption. Let K be an N-dimensional hyperrectangle with d sides of length 1 and N − d sides with a length of 1/3. When DIRECT divides K, it does the division without affecting the N − d sides which already have a side length of 1/3. Therefore DIRECT needs the same number of divisions to divide K into hypercubes of level 1 as if K were a d-dimensional hypercube, that is, q(d) iterations. Using this observation, we can prove the lemma by induction. Let N = 1. Then, as was pointed out earlier in this section, after the first iteration no intervals of level 0 are left. This is consistent with the number given by the lemma; that is, q(1) = 3^{1−1} = 3^0 = 1.

Let q(k) = 3^{k−1} for all k = 1, …, N. We now look at the division of the (N + 1)-dimensional hypercube. As described earlier (see Table 4.1), after the first division there exist 3 hypercubes which do not need to be divided further. Furthermore, there exist pairs of hyperrectangles with 1, 2, …, N sides of length 1. By the induction assumption we know that to divide the two hyperrectangles with only one side of length 1, we need 2q(1) iterations. For the two hyperrectangles with 2 sides of length

1, we need 2q(2) iterations, and so on. Adding all these iterations, we get the following formula (counting the first iteration to divide the original hypercube):

q(N + 1) = 1 + Σ_{l=1}^{N} 2q(l)
         = 1 + 2 Σ_{l=1}^{N} 3^{l−1}
         = 1 + 2 Σ_{l=0}^{N−1} 3^l
         = 1 + 2 (3^N − 1)/(3 − 1)
         = 1 + 2 (3^N − 1)/2
         = 3^N.

This finishes the induction.

We can now prove Theorem 4.2.

Proof (of Theorem 4.2). We prove this theorem by induction over k. Let k = 0. This is the situation of Lemma 4.2. Therefore we get

p^c_N(0) = q(N) = 3^{N−1} = 3^{N−1} (3^N − 1)/(3^N − 1) = 3^{N−1} (3^{N(0+1)} − 1)/(3^N − 1).

Let the formula be valid for s = 1, …, k. Assume that the unit hypercube has been divided such that no hypercubes with a side length of (1/3)^k are left. This means that all hypercubes have a side length of (1/3)^{k+1}. By the induction assumption, DIRECT needed p^c_N(k) iterations to create this division. As shown in Table 4.2, there are 3^{(k+1)N} hypercubes with side length (1/3)^{k+1}. Using Lemma 4.2, we know that DIRECT needs 3^{N−1} iterations to divide each of these hypercubes. Therefore, the

Table 4.3: Maximum number of iterations needed to branch all hypercubes of level k in dimension N.

k \ N      1        2          3            4            5
0          1        3          9           27           81
1          4       30        252        2,214       19,764
2         13      273      6,813      179,361    4,802,733
3         40    2,460    183,960   14,528,268
4        121   22,143  4,966,929

number of iterations needed is

p^c_N(k + 1) = p^c_N(k) + 3^{(k+1)N} · 3^{N−1}
            = 3^{N−1} (3^{N(k+1)} − 1)/(3^N − 1) + 3^{N−1} 3^{(k+1)N}
            = 3^{N−1} (3^{N(k+1)} − 1 + 3^{(k+1)N} 3^N − 3^{(k+1)N})/(3^N − 1)
            = 3^{N−1} (3^{N(k+2)} − 1)/(3^N − 1).

    This completes the induction.

Table 4.3 shows the values of p^c_N(k) for some dimensions and levels. In the following theorem we look at the other extreme case for DIRECT-l. Here we assume that DIRECT-l divides one hyperrectangle of each level, therefore dividing as many hyperrectangles as possible in each iteration.

Theorem 4.3. Let C be the N-dimensional unit hypercube and assume that DIRECT-l divides one hyperrectangle of each level in every iteration. Then after p^a_N(k) iterations no hyperrectangles of level k are left undivided, where p^a_N(k) is given by

p^a_N(k) = k + 3^{N−1} 3^{kN}.    (4.14)

Proof. When DIRECT-l divides a hyperrectangle of level k − 1, it creates 3 hypercubes of level k. Therefore, from the first iteration in which DIRECT-l creates a hyperrectangle of level k until the iteration in which DIRECT-l divides the last remaining hyperrectangle of level k, there is always at least one hyperrectangle of level k. Thus it cannot happen that, for example, in iteration i hyperrectangles of level 1 and 3 exist, but no hyperrectangle of level 2. If in iteration j the lowest level of the hyperrectangles is l and the largest is m, there exist hyperrectangles of every level i = l, …, m. Using Lemma 4.2, we know that 3^{N−1} 3^{kN} iterations after the first hypercube of level k is created, DIRECT-l has divided all 3^{kN} hypercubes of level k needed to cover C. From the assumption and the discussion above it follows that DIRECT-l creates the first hyperrectangle of level k in iteration k. Therefore after exactly

p^a_N(k) = k + 3^{N−1} 3^{kN}

iterations no hyperrectangles of level k are left undivided.

Taking the estimates from Theorems 4.2 and 4.3 together, we can bound the number of iterations DIRECT-l does from below and above.

Corollary 4.3. The number of iterations p_N(k) of DIRECT-l after which no hyperrectangle of level k is left undivided is bounded from above and below by

k + 3^{N−1} 3^{kN} = p^a_N(k) ≤ p_N(k) ≤ p^c_N(k) = 3^{N−1} (3^{N(k+1)} − 1)/(3^N − 1).

    4.3.2 Constant functions

There actually is a type of function that forces our implementation of DIRECT, DIRECT-l, to divide exactly one hyperrectangle in every iteration, as we show in Lemma 4.3.

Lemma 4.3. If f is a constant function, DIRECT-l divides exactly one hyperrectangle in every iteration. Therefore, after p^c_N(k) iterations no hyperrectangles of level smaller than or equal to k are left undivided.

Proof. Let f(x) = Q ∈ R, ∀x ∈ R^N. Then inequality (2.26) in Definition 2.3 can be written as

Q − K̃σ_j ≤ Q − K̃σ_i,    (4.15)

that is,

0 ≤ K̃(σ_j − σ_i).    (4.16)

Therefore only a hyperrectangle with the longest maximal side length can be potentially optimal. From Lemma 4.1 we know that there exists such a hyperrectangle that is potentially optimal. Since our implementation divides at most one hyperrectangle of each level, DIRECT-l divides only one hyperrectangle in every iteration.

This means we satisfy all conditions of Theorem 4.2. So after p^c_N(k) iterations no hyperrectangles of level smaller than or equal to k are left undivided.

Now we look at the iterations DIRECT does when f is constant. This case is more complicated than with DIRECT-l, since in every iteration many hyperrectangles will be potentially optimal.

Lemma 4.4. If f is constant, no hyperrectangles of level k and stage p are left undivided by DIRECT after q(k, p) iterations, where

q(k, p) = Nk + p + 1.

Proof. The main idea of the proof is that DIRECT divides in every iteration all hyperrectangles with maximum measure, and no others. Therefore the size of the largest hyperrectangles is reduced in every iteration. Let 𝒮^n be the set of all hyperrectangles in iteration n.

    Since f is constant, all hyperrectangles have the same function value at their cen- ters. Lemma 3.3 then implies that only hyperrectangles with maximum measure are ˆ potentially optimal, that is all hyperrectangles S with measure σ2(S) = maxSˆ∈Sn σ(S) are potentially optimal. Note that for every n ∈ N ∪{0} there exist unique k ≥ 0, 0 ≤ p ≤ N − 1 such that n = kN + p. We now show by induction over n that in iteration n +1=kN + p +1 all hyperrectangles of level k and stage p are potentially optimal and therefore will be divided. Therefore after n +1=q(k, p) iterations no hyperrectangles of level k and stage p are left undivided. Let n =0. This implies that k = 0 and p = 0. By definition the only hyperrectangle of level 0 and stage 0 is the original hypercube. The original hypercube is potentially optimal in the first iteration and therefore gets divided. Therefore after the first iteration no hyperrectangles of level 0 and stage 0 are left undivided, that is q(0, 0) = 1. Let the formula be valid for n ≥ 0. This means that in iteration q(k, p) no hyperrectangles of level k and stage p are left undivided, where n = kN + p, k ≥ 0 and 0 ≤ p ≤ N − 1. We need to distinguish two cases, 0 ≤ p

and therefore the formula is valid for n + 1 if $0 \le p < N - 1$. If $p = N - 1$, the largest remaining hyperrectangles are the hypercubes of level k + 1 and stage 0, which are all divided in iteration $q(k, N - 1) + 1 = q(k + 1, 0)$. Therefore the formula is valid for n + 1 in both cases, which completes the induction.
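The contrast between the two methods on constant functions can be read off these formulas. A minimal sketch (again only an illustration of the closed forms above, not of the actual codes):

    # Sketch: iterations each method needs on a constant function before
    # every hyperrectangle of level <= k is divided. q(k, N-1) = N*(k+1)
    # is from lemma 4.4 (DIRECT); p_N^c(k) is from lemma 4.3 (DIRECT-l).

    def q(k, p, N):
        """Iterations DIRECT needs: q(k, p) = N*k + p + 1."""
        return N * k + p + 1

    def p_c(N, k):
        """Iterations DIRECT-l needs: p_N^c(k)."""
        return 3 ** (N - 1) * (3 ** (N * (k + 1)) - 1) // (3 ** N - 1)

    N = 2
    for k in range(4):
        # DIRECT needs far fewer iterations, but may divide many
        # hyperrectangles per iteration; DIRECT-l divides exactly one.
        print(k, q(k, N - 1, N), p_c(N, k))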

    4.3.3 Linear functions

    In the following we want to look at linear functions.

Let $\Omega = [0, 1]^N$, $0 < a_1 < a_2 < \cdots < a_N$, and

$$f(x) = \sum_{i=1}^{N} a_i x_i, \quad \forall x \in \Omega. \qquad (4.17)$$

We make this assumption on the coefficients to ensure that the function has a unique minimum at the origin. This ordering also tells us how DIRECT will divide a hypercube. For example, in the two-dimensional case, as shown in Figure 4.6, we know that $f(x^1) < f(x^0) < f(x^3)$ since $a_2 > 0$. Since $a_1 > 0$ we also know that $f(x^2) < f(x^0) < f(x^4)$, and since $a_1 < a_2$,

Figure 4.6: The points at which DIRECT samples the function when dividing a square. (The sample points are $x^1, \dots, x^4$ around the center $x^0$; the origin is the lower left corner.)

Figure 4.7: The division DIRECT creates when $0 < a_1 < a_2$.

Figure 4.8: The division DIRECT creates when $0 < a_2 < a_1$.

Figure 4.9: The first three iterations of DIRECT for $0 < a_1 < a_2$.

Figure 4.10: The first three iterations of DIRECT for $a_1 > a_2 > 0$.

$f(x^1) = \min_{i=0,1,2,3,4} f(x^i)$. Therefore DIRECT will divide the hypercube as shown in Figure 4.7. If, on the other side, $0 < a_2 < a_1$, DIRECT will divide the hypercube as shown in Figure 4.8. Figures 4.9 through 4.11 show the first three iterations of DIRECT for three different sets of $a_1$ and $a_2$. Note that these divisions are similar to each other. The number of function evaluations and the difference between the function value of the

Figure 4.11: The first three iterations of DIRECT for $a_1$ …

best point found and the minimal function value in each iteration are the same. Therefore we will only look at the case where the coefficients of the linear function satisfy

$$0 < a_1 < a_2 < \cdots < a_N.$$

Lemma 4.5. Let x be the midpoint of any hyperrectangle of level k created by DIRECT. Then

$$x_i \ge \frac{1}{2 \cdot 3^{k+1}}, \quad i = 1, \dots, N,$$

    and there exists at least one coordinate with

$$x_j \ge \frac{1}{2 \cdot 3^{k}}.$$

Proof. The proof follows directly from the way DIRECT divides hyperrectangles. A hyperrectangle of level k has at least one side with a length of $1/3^k$. All other sides must have a length of $1/3^k$ or $1/3^{k+1}$. Since each coordinate of the midpoint is at least half the corresponding side length away from the boundary of $\Omega$, the bounds follow. Using this lemma, we can look at the special case when we set $\varepsilon = 0$ in the definition of potentially optimal hyperrectangles.

Theorem 4.4. Let $\Omega = [0, 1]^N$, $0 < a_1 < a_2 < \cdots < a_N$, let f be given by (4.17), and set $\varepsilon = 0$. Then in iteration $s = kN + p$, $k \ge 0$, $1 \le p \le N$,

the value of $f_{\min}$ is given by

$$f_{\min}^{(s)} = \frac{1}{2 \cdot 3^{k+1}} \left( 3 \sum_{i=1}^{N-p} a_i + \sum_{i=N-p+1}^{N} a_i \right). \qquad (4.19)$$

The corresponding midpoint $c^{(s)}$ is

$$c^{(s)} = \frac{1}{2 \cdot 3^{k+1}} \, (\underbrace{3, \dots, 3}_{N-p}, \underbrace{1, \dots, 1}_{p})^T, \qquad (4.20)$$

where the first N − p entries are 3, and the remaining p entries are 1. The unique hyperrectangle with midpoint $c^{(s)}$ has level k and stage p for $1 \le p \le N - 1$, and level k + 1 and stage 0 for p = N. Specifically, for $1 \le p \le N - 1$, the N − p sides parallel to the first N − p coordinate axes have a length of $1/3^k$. The other p sides have a length of $1/3^{k+1}$. For p = N, the hyperrectangle with center $c^{(s)}$ is an N-dimensional hypercube with side length $1/3^{k+1}$.

Proof. We will do this proof as two inductions, one over k (the outer induction), and one over p (the inner induction).

Let k = 0 and p = 1. This is the first iteration. By definition DIRECT samples the function at the points

$$x^i = \frac{1}{6} \, (3, \dots, 3, 1, 3, \dots, 3)^T, \quad y^i = \frac{1}{6} \, (3, \dots, 3, 5, 3, \dots, 3)^T, \quad i = 1, \dots, N, \quad \text{and} \quad x^0 = \frac{1}{2} \, (1, \dots, 1)^T.$$

Here 1 and 5 are the i-th entries. Since $a_i > 0$, $\forall i$, we have that $f(x^i) < f(x^0) < f(y^i)$. Since $a_1 < a_2 < \cdots < a_N$, we also have

$$f(x^i) = \frac{1}{6} \left( 3 \sum_{l=1}^{N} a_l - 2 a_i \right) > \frac{1}{6} \left( 3 \sum_{l=1}^{N} a_l - 2 a_j \right) = f(x^j), \quad 1 \le i < j \le N.$$

This implies that $f(x^1) > f(x^2) > \cdots > f(x^N)$. Therefore $f_{\min}^{(1)} = f(x^N) = \frac{1}{6} \left( 3 \sum_{l=1}^{N-1} a_l + a_N \right)$ and $c^{(1)} = x^N$. From the way DIRECT divides a hypercube, we know that the hyperrectangle with center $c^{(1)}$ has N − 1 sides with a length of 1, and one side with a length of 1/3. The side with a length of 1/3 is parallel to the N-th coordinate axis. Therefore the level of this hyperrectangle is 0, the largest level possible. Its stage is 1, since it has only one short side.

Let the assumption be true for p, $1 \le p < N$.

By assumption $f(c^{(p)}) = f_{\min}^{(p)}$. Furthermore, the N − p sides parallel to the first N − p coordinate axes have a length of 1, and the other p sides have a length of 1/3. Therefore this hyperrectangle has level 0 and stage p, the smallest level and stage. Using lemma 3.3 we know that this hyperrectangle is potentially optimal and therefore DIRECT will divide it in iteration p + 1. Since DIRECT divides a hyperrectangle only along its longest sides, the last p coordinates of all new sample points will be exactly the same as the last p coordinates of $c^{(p)}$. Therefore, to DIRECT, the hyperrectangle looks like an (N − p)-dimensional

hypercube. DIRECT will sample the function at

$$x^i = \frac{1}{6} \, (3, \dots, 3, 1, 3, \dots, 3, \underbrace{1, \dots, 1}_{p})^T, \quad y^i = \frac{1}{6} \, (3, \dots, 3, 5, 3, \dots, 3, \underbrace{1, \dots, 1}_{p})^T, \quad i = 1, \dots, N - p.$$

Here the last p entries of $x^i$ and $y^i$ are 1. The i-th entry of $x^i$ is 1, while the i-th entry of $y^i$ is 5. As before we have $f(x^i) < f(y^i)$, $i = 1, \dots, N - p$, and $f(x^1) > f(x^2) > \cdots > f(x^{N-p})$. Therefore $f_{\min}^{(p+1)} = f(x^{N-p}) = \frac{1}{6} \left( 3 \sum_{l=1}^{N-(p+1)} a_l + \sum_{l=N-(p+1)+1}^{N} a_l \right)$ and $c^{(p+1)} = x^{N-p}$. Note that for p + 1 = N the hyperrectangle with center $c^{(p+1)}$ is a hypercube of level 1 and stage 0. For all other p, the hyperrectangle with center $c^{(p+1)}$ has level 0. The first N − (p + 1) sides parallel to the first N − (p + 1) axes have a length of 1, while the other p + 1 sides have a length of 1/3. Therefore the hyperrectangle with center $c^{(p+1)}$ has stage p + 1.

Let the assumption be true for $k \ge 0$. Let s = kN + N and p = 1. This means we look at iteration s + 1 = (k+1)N + 1. By assumption,

$f(c^{(s)}) = f_{\min}^{(s)}$ and the hyperrectangle with center $c^{(s)}$ is a hypercube of level k + 1 and stage 0. Using lemma 3.3, the hypercube with center $c^{(s)}$ is potentially optimal in iteration s + 1, since

$$0 = \varepsilon \le \frac{d_j}{|f_{\min}|} \, \min_{i \in I_2} \frac{f(c_i) - f_{\min}}{d_i - d_j},$$

where both $f(c_i) - f_{\min} > 0$ and $d_i - d_j > 0$.

Therefore DIRECT will divide this hypercube in iteration s + 1. During this division, DIRECT will sample the function at

$$\hat{x}^i = \frac{1}{2 \cdot 3^{k+2}} \, (3, \dots, 3, 1, 3, \dots, 3)^T, \quad \hat{y}^i = \frac{1}{2 \cdot 3^{k+2}} \, (3, \dots, 3, 5, 3, \dots, 3)^T, \quad i = 1, \dots, N,$$

where the i-th entries are 1 and 5. From $a_i > 0$, $\forall i$, it follows that $f(\hat{x}^i) < f(\hat{y}^i)$, $i = 1, \dots, N$, and $f(\hat{x}^1) > f(\hat{x}^2) > \cdots > f(\hat{x}^N)$. Therefore $f_{\min}^{(s+1)} = f(\hat{x}^N) = \frac{1}{2 \cdot 3^{k+2}} \left( 3 \sum_{l=1}^{N-1} a_l + a_N \right)$ and $c^{(s+1)} = \hat{x}^N$. Knowing the way DIRECT divides a hypercube, the hyperrectangle with center $c^{(s+1)}$ has N − 1 sides with a length of $1/3^{k+1}$, and one side with a length of $1/3^{k+2}$. The short side is parallel to the N-th coordinate axis, the level of the hyperrectangle is k + 1, and its stage is 1.

Let the assumption be true for p, $1 \le p < N$.

Let s = (k+1)N + p. By assumption $f(c^{(s)}) = f_{\min}^{(s)}$. Furthermore, the N − p sides parallel to the first N − p coordinate axes have a length of $1/3^{k+1}$, and the other p sides have a length of $1/3^{k+2}$. Therefore this hyperrectangle has level k + 1 and stage p. Using lemma 3.3 we know that this hyperrectangle is potentially optimal. Therefore DIRECT will divide it in iteration s + 1. Since DIRECT divides a hyperrectangle only along its longest sides, the last p coordinates of all new sample points will be exactly the same as the last p coordinates of $c^{(s)}$. Therefore, to DIRECT, the hyperrectangle looks like an (N − p)-dimensional

hypercube. DIRECT will sample the function at

$$x^i = \frac{1}{2 \cdot 3^{k+2}} \, (3, \dots, 3, 1, 3, \dots, 3, \underbrace{1, \dots, 1}_{p})^T, \quad y^i = \frac{1}{2 \cdot 3^{k+2}} \, (3, \dots, 3, 5, 3, \dots, 3, \underbrace{1, \dots, 1}_{p})^T, \quad i = 1, \dots, N - p.$$

Here the last p entries of $x^i$ and $y^i$ are 1. Furthermore, the i-th coordinate of $x^i$ is 1, while the i-th coordinate of $y^i$ is 5. As before we have $f(x^i) < f(y^i)$, $i = 1, \dots, N - p$, and $f(x^1) > f(x^2) > \cdots > f(x^{N-p})$. Therefore $f_{\min}^{(s+1)} = f(x^{N-p}) = \frac{1}{2 \cdot 3^{k+2}} \left( 3 \sum_{l=1}^{N-(p+1)} a_l + \sum_{l=N-(p+1)+1}^{N} a_l \right)$ and $c^{(s+1)} = x^{N-p}$. Note that for p + 1 = N the hyperrectangle with center $c^{(s+1)}$ is a hypercube with level k + 2 and stage 0. For all other p, the hyperrectangle with center $c^{(s+1)}$ has a level of k + 1. The N − (p + 1) sides parallel to the first N − (p + 1) axes have a length of $1/3^{k+1}$, the other p + 1 sides have a length of $1/3^{k+2}$. Therefore this hyperrectangle has stage p + 1. This completes the proof.
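The decrease predicted by theorem 4.4 can be tabulated directly. A minimal Python sketch (ours, purely illustrative; it evaluates the closed forms (4.19) and (4.20) rather than running DIRECT):

    # Sketch: tabulate f_min^(s) and c^(s) from (4.19) and (4.20) for a
    # linear objective f(x) = sum(a_i * x_i) with 0 < a_1 < ... < a_N
    # and epsilon = 0. Illustration only.

    def fmin_and_center(a, k, p):
        N = len(a)
        scale = 1.0 / (2 * 3 ** (k + 1))
        fmin = scale * (3 * sum(a[: N - p]) + sum(a[N - p:]))
        center = [3 * scale] * (N - p) + [scale] * p
        return fmin, center

    a = [1.0, 2.0, 3.0]               # assumed example coefficients
    for s in range(1, 10):
        k, p = divmod(s - 1, len(a))  # s = k*N + p with 1 <= p <= N
        p += 1
        fmin, c = fmin_and_center(a, k, p)
        print(s, round(fmin, 6), [round(ci, 6) for ci in c])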

    4.3.4 Hidden constraints

We now want to take a look at the behavior of DIRECT and DIRECT-l for problem P′′. In these problems hidden constraints exist; that is, the function is only defined over a subset $B \subset \Omega$, where B is not given analytically. In such problems even finding a feasible point can be difficult. Both DIRECT and DIRECT-l treat the function as a constant as long as they have not found a feasible point. Lemmas 4.3 and 4.4 explain

the behavior of the two methods for the constant function. When the function is constant, both methods divide only hyperrectangles with the largest measure in any iteration. Therefore they both create a grid with a mesh size of $1/3^k$ before they sample any point that is closer than $1/3^k$ to any other sampled point. As long as they do not find a feasible point, both methods will continue to create finer and finer grids. DIRECT-l divides only one hyperrectangle in every iteration for a constant function, whereas DIRECT divides increasingly more hyperrectangles in an iteration as the optimization progresses. This means that in many cases DIRECT-l will find a feasible point with fewer function evaluations than DIRECT, and at worst both methods will need the same number of function evaluations. Note that the methods can only find a feasible point if the set B has points in its interior. We will see an example of this in Section 5.1.2. This is especially important when we use DIRECT and DIRECT-l as starting point generators for Implicit Filtering.
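In practice, hidden constraints surface as a black box that fails to return a value. One simple convention, sketched below, is to wrap the black box so that infeasible points report a flag; the name black_box and the placeholder value are assumptions for illustration, not the scheme of our codes.

    # Sketch: wrap a black-box objective with hidden constraints.
    # 'black_box' is a hypothetical simulator that raises an exception or
    # returns NaN when asked to evaluate an infeasible point.
    import math

    HUGE = 1.0e20  # assumed placeholder value reported for infeasible points

    def wrapped_objective(black_box, x):
        """Return (value, feasible) for the point x."""
        try:
            fx = black_box(x)
        except Exception:        # simulator failed: hidden constraint violated
            return HUGE, False
        if fx is None or math.isnan(fx):
            return HUGE, False
        return fx, True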

Chapter 5

Numerical Results

This chapter looks at the performance of the global optimization methods described in Chapters 2 and 3 on optimization problems faced by the gas pipeline industry, and on a suite of test problems. We will show that for the gas pipeline problems, DIRECT-l-IFFCO is the most robust method, while at the same time using a low number of function evaluations. IFFCO, on the other hand, finds a solution with even fewer function evaluations when using the center as a starting point. For many other starting points IFFCO is unable to find a feasible point. This shows IFFCO's strong dependence on a good starting point. When compared with DIRECT, DIRECT-l finds solutions to these problems with fewer function evaluations. Finally, DIRECT-IFFCO needs more function evaluations than DIRECT-l-IFFCO, but fewer than DIRECT alone. So out of our methods, DIRECT-l-IFFCO seems to be best suited for these problems. This chapter concludes with a comparison of the performance of all methods for the test problems used in the original description of the DIRECT algorithm [48]. Here performance was measured using two different termination criteria. The first criterion stops the algorithms once they are within a certain percentage of the global minimum; with it, DIRECT-l needs fewer function evaluations than DIRECT, sometimes considerably fewer. For the second criterion we used a fixed budget. In this case DIRECT and DIRECT-l performed equally well, but are again outperformed by


    DIRECT-IFFCO and DIRECT-l-IFFCO. Finally, we reemphasize IFFCO’s dependence on a good starting point. For some problems, IFFCO finds a solution with fewer function evaluations than the other methods, while for other problems it terminates with a local solution.

    5.1 Numerical Results in the Optimization of Gas Pipeline Networks

    The purpose of this section is to show the performance of the numerical methods on problems arising in an industrial environment. Before we report the results of the numerical methods introduced in chapters 2 and 3 at the end of this section, we describe the general optimization infrastructure used in the gas pipeline industry. Furthermore we introduce three example problems arising in the gas pipeline industry which we used for the numerical comparisons.

5.1.1 The optimization infrastructure

In the gas pipeline industry, there are two kinds of models that are used for optimization, each arising in a different department. First there are the crude models which come from the financial departments, and second there are the, in comparison, high accuracy models developed by the engineering departments. The responsibility for optimization in the gas pipeline industry lies mainly with the financial departments. Historically, they used their crude, large scale models and standard linear programming (LP) methods to do economic optimizations. The last few years have seen great improvements in available computing power and simulation techniques. These improvements made it possible to use the high precision models of the engineering departments for economic optimizations. Optimization problems created by these new models are much more complex and can therefore no longer be solved with LP methods. Furthermore, these models were not written

with optimization in mind; see Luongo et al. [52]. This means that, for example, no derivative information is available. Substantial investments in these engineering models over the last decades required that the optimization methods reuse the original simulator models. Therefore a hierarchical architecture for the optimization methods was needed, where the objective function is given by the existing software, used as a "black box". Implicit Filtering and DIRECT were designed for such problems. We now take a closer look at these models. According to [52], the three main types of independent variables for these problems are:

• Physical flow at supply or delivery points.
• Discharge and suction pressures in compressors and regulators.
• Flows assigned to specific contracts.

Luongo et al. [51] show that compressor stations and gas transmission networks are highly nonlinear. Even the pipelines, the simplest parts of the network, are already modeled using fractions and square roots.

    5.1.2 Example problems

The three problems presented here look at minimizing the cost of fuel and/or electrical power for the compressor stations in a gas pipeline network. In these problems cost may be reduced in two ways: by changing the flow patterns through the system and/or by changing the pressure settings system-wide. In each of these examples, the two design variables are flow variables. The flow variables are characterized as either unknown 'inlet/outlet flows', or 'Kirchhoff law' representations of flow splits between possible alternative paths [15]. The simulation used to evaluate f requires an internal optimization of system-wide pressure settings. Since we needed to protect proprietary data, we could not use the original software. Instead, Carter [11] provided us with function values over an equidistant grid. To evaluate the function, we used piecewise linear interpolation. The format provided by Carter allows for easy distribution of the problems to the wider optimization community.
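A sketch of the evaluation we describe: given values tabulated on an equidistant grid, piecewise (bi)linear interpolation reconstructs f. The grid spacing, origin, and the convention of marking infeasible nodes with NaN are assumptions for illustration; the actual data format is due to Carter [11].

    # Sketch: evaluate a function tabulated on an equidistant 2-D grid by
    # bilinear interpolation. Cells touching a NaN (infeasible) node are
    # treated as infeasible. Illustration only; not the distributed format.
    import numpy as np

    def interpolate(values, x0, y0, h, x, y):
        """values[i, j] ~ f(x0 + i*h, y0 + j*h); returns f(x, y) or NaN."""
        s, t = (x - x0) / h, (y - y0) / h
        i, j = int(np.floor(s)), int(np.floor(t))
        if not (0 <= i < values.shape[0] - 1 and 0 <= j < values.shape[1] - 1):
            return np.nan                 # outside the tabulated box
        u, v = s - i, t - j               # local coordinates in [0, 1)
        corners = values[i:i + 2, j:j + 2]
        if np.isnan(corners).any():
            return np.nan                 # hidden constraint: no value here
        return ((1 - u) * (1 - v) * corners[0, 0] + u * (1 - v) * corners[1, 0]
                + (1 - u) * v * corners[0, 1] + u * v * corners[1, 1])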

Figure 5.1: The feasible area of Problem A.

Linear interpolation also speeds up computations significantly, allowing intensive numerical experiments. In the original software a single function evaluation takes about one CPU second [11]; this is significantly more than our piecewise linear interpolation. Figures 5.1, 5.3, and 5.5 show the domains and feasible areas of the three examples we used, and Figures 5.2, 5.4, and 5.6 show the optimization landscapes of these examples. In the landscapes the vertical axis represents the amount of fuel (in thousand standard cubic feet/day) used by the compressor stations. The horizontal axes, which correspond to the axes in the feasible area plots, represent the flow variables (in thousand standard cubic feet/day). In order to show more details in the optimization landscapes, Figures 5.4 and 5.6 display only parts of the original domains. The internal pressure optimization in the function evaluation involves solving a large combinatoric problem using nonsequential dynamic programming [13]. Furthermore, the dynamic programming formulation attempts to find the best settings for other discrete variables. An example of such a variable would be the decision whether or not to shut down entire compressor stations.

Figure 5.2: The optimization landscape of Problem A. (Vertical axis: gas usage in thousand cubic feet/day; horizontal axes: flow variables x(1) and x(2).)

Figure 5.3: The feasible area of Problem B.

Figure 5.4: The optimization landscape of Problem B.

One of the reasons for the existence of infeasible regions is that solutions may or may not exist, depending on flow conditions. The roughness seen in the landscapes can also be attributed to the startup and shutdown of individual engines in compressor stations. Roughness may also be caused by discontinuous controls for certain types of engines, or, finally, by the replacement of continuous pressure variables with discrete ones. For Problem C the feasible region consists of only a small band in which the function is feasible. This problem contains several large discontinuities, some of which may be due to the discrete nature of internal variables [14]. Other possible reasons for discontinuities may be traced to properties of the compressor stations themselves. Carter et al. [14] point out that centrifugal compressors can use nearly as much fuel at low flows as they do at high flows. In Problem C there are, in addition to the large discontinuities, the smaller discontinuities which were described for Problems A and B.

Figure 5.5: The feasible area of Problem C.

Figure 5.6: The optimization landscape of Problem C.

    5.1.3 Accuracy of the model

    Before comparing the results of the different methods, we first take a brief look at the economic interpretation of the objective function. The values of the objective function are given in thousand cubic feet per day (MCF/day) in the optimization landscapes. Using a current wholesale value of gas of approximately $1.20/MCF, we can value a thousand cubic feet per day at roughly $400 per year. According to Carter [11], values within 50 MCF/day of the global minimum are acceptable as a solution to the problem. These 50 MCF/day translate into $20,000 per year. We will use these numbers in the comparison of the results of the three different methods.
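For reference, the arithmetic behind these numbers, using the assumed $1.20/MCF wholesale value and a 365-day year, is

$$1\ \text{MCF/day} \times \$1.20/\text{MCF} \times 365\ \text{days/year} \approx \$438/\text{year} \approx \$400/\text{year},$$

$$50\ \text{MCF/day} \times \$400/\text{year per MCF/day} = \$20{,}000/\text{year}.$$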

5.1.4 Setup of the numerical experiments

    Now that we have introduced the problems from the gas pipeline industry, let us turn to the results of the different methods discussed in chapters 2 and 3. These results were computed using our implementation of Implicit Filtering, called IFFCO, and our implementations of DIRECT, DIRECT-l, DIRECT-IFFCO, and DIRECT-l-IFFCO. Tables 5.1 through 5.3 show the lowest, that is the best, objective function value found at each stage of the algorithms. We report results for IFFCO with the center of the bounding box as its starting point. We furthermore report average results for IFFCO, where we used 500 different random starting points. The average was taken over all the starting points for which IFFCO finds a solution to the problem, and we report the results as Avg. Iffco.

In each of the computations we gave DIRECT and DIRECT-l a budget of $\nu_D$ function evaluations, and IFFCO a budget of $\nu_I$. Using this notation, DIR-IFF($\nu_D$, $\nu_I$) means that we gave DIRECT a budget of $\nu_D$. We then used the best point found by DIRECT as a starting point for IFFCO. We then ran IFFCO until either the budget $\nu_I$ was exhausted or IFFCO had converged for all scales. Note that if DIRECT does not find a feasible point within its budget, it is allowed to run until a feasible point is found. DIRECT then reallocates its budget as described in Section 3.4. This reallocation occurred in Problems B and C.

Independent of its budget, we ran IFFCO with scales of $\{2^{-k}\}_{k=1}^{k_{\max}}$, where we set $k_{\max} = 12$. IFFCO terminates when either all scales are exhausted or the function budget is reached. Note that the budget for IFFCO and DIRECT (DIRECT-l) is not strict. If their budget is exhausted, all three methods will finish an iteration before terminating.
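A schematic of the combined strategy just described. The callables direct_search and iffco stand in for our implementations; their names and calling sequences are assumptions for this sketch, not the actual interfaces.

    # Sketch: the DIRECT-IFFCO / DIRECT-l-IFFCO hand-off. Only the budget
    # logic and the use of DIRECT's best point as IFFCO's starting point
    # are modeled; the search codes themselves are passed in as callables.

    def combined(direct_search, iffco, f, bounds, nu_D, nu_I, kmax=12):
        # Phase 1: global search; if no feasible point is found within
        # nu_D evaluations, DIRECT keeps going until one is found.
        x_best, f_best = direct_search(f, bounds, budget=nu_D,
                                       run_until_feasible=True)
        # Phase 2: local search from DIRECT's best point with scales
        # 2^-1, ..., 2^-kmax.
        scales = [2.0 ** -k for k in range(1, kmax + 1)]
        return iffco(f, bounds, x0=x_best, budget=nu_I, scales=scales)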

    5.1.5 Results of the optimization

Using the three problems from the gas pipeline industry, we will show that combining DIRECT/DIRECT-l and IFFCO creates the most robust method. Because IFFCO depends strongly on a good starting point, it is not as robust on its own. We show that for a high percentage of starting points IFFCO does not even find a feasible point. While DIRECT and DIRECT-l alone always find a solution to the problem, they need more function evaluations, sometimes substantially more, than the other methods. Following are the results for each problem in greater detail. Tables 5.1 through 5.3 report the number of function evaluations done and the lowest function values found with each method used. The last two columns show the lowest function value found by DIRECT and IFFCO. For the DIRECT-IFFCO (DIRECT-l-IFFCO) methods, the last column indicates the improvement made by IFFCO over the value found by DIRECT (DIRECT-l).

    Problem A

Table 5.1 shows the results for Problem A. Each method was given a budget of 40 function evaluations. Given this budget only IFFCO and DIRECT-l-IFFCO found a solution. In contrast, DIRECT and DIRECT-l required budgets of 70 and 60, respectively, to find a solution. Note that, as previously mentioned in Section 5.1.3, values within 50 MCF/day of the global minimum are accepted as solutions.

Table 5.1: Results for Problem A. The actual minimum value is 3233.

    Method                # of f-evaluations      f-value
                          DIR   Iffco  Total      DIR    Iffco  Total
    DIRECT (40)            51     —      51       3320     —    3320
    DIRECT (70)            79     —      79       3267     —    3267
    DIRECT-l (40)          45     —      45       3291     —    3291
    DIRECT-l (60)          67     —      67       3267     —    3267
    Iffco (40)             —     41      41        —     3257   3257
    Avg. Iffco (40)        —     42      42        —     3266   3266
    DIR-Iff (10, 30)       21    30      51       3374   3312   3312
    DIR-Iff (10, 40)       21    42      63       3374   3251   3251
    DIR-l-Iff (10, 30)     11    32      43       3898   3280   3280

    Bounds: [−374.085, 374.085] × [−374.085, 575.142]

Figure 5.7: 500 randomly selected starting points for IFFCO used in Problem A. (Legend: '.' = feasible point found, '+' = no feasible point.)

Table 5.2: Results for Problem B. The actual minimum value is 3204.

    Method                # of f-evaluations      f-value
                          DIR   Iffco  Total      DIR    Iffco  Total
    DIRECT (40)            45     —      45       3477     —    3477
    DIRECT (110)          121     —     121       3228     —    3228
    DIRECT-l (40)          43     —      43       3228     —    3228
    Iffco (20)             —     22      22        —     3248   3248
    Iffco (40)             —     44      44        —     3211   3211
    Avg. Iffco (40)        —     42      42        —     3211   3211
    DIR-Iff (10, 30)       45    32      77       3477   3209   3209
    DIR-l-Iff (10, 30)     37    31      68       3277   3209   3209

    Bounds: [−329.075, 329.076] × [−329.076, 665.73]

Finally, DIRECT-IFFCO only finds a local minimum with a combined budget of (10, 30), and requires a budget of (10, 40) to find a solution.

To emphasize the dependence of IFFCO on its starting point, we started IFFCO from 500 randomly chosen starting points, shown in Figure 5.7. For 36% of the starting points, marked with a '+', IFFCO was unable to find a feasible point. For all other starting points, marked with a '.', IFFCO found a feasible point. With a budget of 40, IFFCO terminates with a local solution for 29% of the starting points, and finds a solution for the other 35% of the starting points. When we increased the budget for IFFCO to 200, IFFCO found a solution every time it found a feasible point.

    Problem B

Table 5.2 summarizes the results for Problem B. An important feature of this problem is that the midpoint of its domain Ω is infeasible. The feasible area is so small that DIRECT needs three iterations and 45 function evaluations to find a feasible point. Despite this, DIRECT does not find a solution within the budget, but needs 121 function evaluations. In order to explain why DIRECT-l-IFFCO requires a large number of function evaluations, we take a closer look at the first few iterations of DIRECT-l.

Figure 5.8: 500 randomly selected starting points for IFFCO used in Problem B. (Legend: '.' = feasible point found, '+' = no feasible point.)

DIRECT-l needs 21 function evaluations to find a feasible point. So when we use DIRECT-l-IFFCO with a budget of 10 for DIRECT-l, DIRECT-l adjusts the budget to 31 = 21 + 10. This explains why in DIRECT-l-IFFCO, DIRECT-l does 37 function evaluations before switching to IFFCO. With a larger budget, DIRECT-l does not need to readjust its budget and finds a solution with 43 function evaluations. DIRECT-l-IFFCO and DIRECT-IFFCO solve Problem B with the same combined budget of (10, 20) as for Problems A and C. In contrast, we can see in Table 5.2 that IFFCO with the center as its starting point solves this problem sufficiently with a budget of 20 function evaluations, and only improves slightly with a budget of 40. This fast convergence can be explained by the optimization landscape for this problem, shown earlier in Figure 5.4. Where the function is defined, it behaves like a convex function with an added low-amplitude term. This is exactly the kind of function IFFCO was designed for. Again, as for Problem A, we want to point out the dependence of IFFCO on its starting point. Figure 5.8 shows the 500 randomly generated starting points for IFFCO we examined. For 72.6% of the starting points IFFCO terminates without finding a feasible point, marked with '+' in the graph.

Table 5.3: Results for Problem C. The actual minimum value is 4041.

    Method                # of f-evaluations      f-value
                          DIR   Iffco  Total      DIR    Iffco  Total
    DIRECT (10)          3671     —    3671       4045     —    4045
    DIRECT (40)          3709     —    3709       4042     —    4042
    DIRECT-l (40)         871     —     871       4401     —    4401
    DIRECT-l (1000)      1005     —    1005       4042     —    4042
    Iffco (40)             —     40      40        —     4051   4051
    Avg. Iffco (40)        —     42      42        —     4056   4056
    DIR-Iff (10, 30)     3671    31    3702       4045   4045   4045
    DIR-l-Iff (10, 30)    845    33     878       4403   4068   4068

    Bounds: [−99.766, 274.201] × [−58.469, 58.469]

For 0.4% of the starting points, IFFCO finds a feasible point but terminates in a local minimum, even when we increase the budget. Only for 27% of the starting points, marked with a '.', does IFFCO find a feasible point and a solution.

    Problem C

The results for Problem C are shown in Table 5.3. This problem is very hard for two reasons. First, the feasible area is very small, as shown earlier in Figure 5.5. Second, the function has large discontinuities in the small band where it is feasible. Now, let us examine why these two properties provide great difficulty for these methods. In order to explain the large number of function evaluations done by DIRECT, DIRECT-l, DIRECT-IFFCO, and DIRECT-l-IFFCO, it is necessary to take a closer look at the first iteration of DIRECT and DIRECT-l. The small feasible area makes it hard to find a feasible point. DIRECT-l needs 829 function evaluations to find a feasible point, while DIRECT needs 3645 function evaluations. Since DIRECT (DIRECT-l) treats the function as a constant as long as it has not found a feasible point, we can explain the big difference in the number of function evaluations between DIRECT and DIRECT-l by using the observations made in Section 4.3.2 for the constant function.

Figure 5.9: The division DIRECT creates for Problem C.

Since DIRECT divides all potentially optimal hyperrectangles in a given iteration, the number of function evaluations in an iteration can be large. For Problem C, DIRECT does 2916 function evaluations in the iteration in which it finds the first feasible points. In contrast to DIRECT, DIRECT-l only divides one potentially optimal hyperrectangle in each iteration. Therefore in every iteration DIRECT-l uses substantially fewer function evaluations. Although this results in many more iterations done by DIRECT-l in comparison to DIRECT, DIRECT-l requires substantially fewer function evaluations overall. This difference in the strategies between DIRECT and DIRECT-l is reflected in Figures 5.9 and 5.10. Only the rectangles with a shaded background have a feasible midpoint. Using a budget of 30 function evaluations, DIRECT-l only finds a local minimum. The difference between this local minimum and the global minimum is about 360 MCF/day.

Figure 5.10: The division DIRECT-l creates for Problem C.

Figure 5.11: 500 randomly selected starting points for IFFCO used in Problem C. (Legend: '.' = feasible point found, '+' = no feasible point.)

When we use the wholesale value of $1.20/MCF as we did in Section 5.1.3, this difference equals roughly $160,000 per year, a substantial amount. DIRECT-l finds a solution, provided its budget is large enough, as we can see when we use a budget of 1000 function evaluations for DIRECT-l. DIRECT-IFFCO and DIRECT-l-IFFCO find an acceptable function value with the same combined budget of (10, 20) as for the first two problems. Finally, the small feasible area of Problem C may also provide difficulties for IFFCO. With the midpoint as the starting point, IFFCO finds a solution with 43 function evaluations; but when we started IFFCO from the 500 randomly generated starting points indicated in Figure 5.11, IFFCO found a feasible point for only 13.4% of the starting points, marked with a '.'. For 3.2% of the starting points IFFCO terminates in a local minimum, which is reduced to 2.6% with a budget of 200. For 86.6% of the starting points, marked with a '+', IFFCO is unable to find any feasible point, much less a solution.

Table 5.4: Success rates for IFFCO with 500 randomly selected starting points.

    Problem          No feasible point   Local minimum   Global minimum
    A (40)                36.0%              29.0%           35.0%
    A (200)               36.0%               0.0%           64.0%
    B (40), B (200)       72.6%               0.4%           27.0%
    C (40)                86.6%               3.2%           10.2%
    C (200)               86.6%               2.6%           10.8%

    Summary of the Numerical Results

    In Table 5.4 we summarize the results for the 500 different starting points for IFFCO. This again emphasizes the dependence of IFFCO on its starting point. All of these observations suggest that the combined algorithms are more robust than either algorithm alone. DIRECT-l-IFFCO performed better than DIRECT-IFFCO for these test problems from the gas pipeline industry, since it found solutions with fewer function evaluations.

    5.2 Numerical Results on Test Problems used by Jones et al.

This section begins with a description of the test problems used by Jones et al. [48] to show the performance of DIRECT. Next, we report the numerical results for our methods and compare them to the results of DIRECT. We will show that combining DIRECT (DIRECT-l) with IFFCO creates a more robust method than either method alone, which at the same time uses only a low number of function evaluations. For these test problems, it does not matter if we use DIRECT or DIRECT-l to create the starting point for IFFCO; the performance was the same. IFFCO again depends strongly on its starting point, whereas DIRECT and DIRECT-l need more function evaluations than in combination with IFFCO. Finally, we show that DIRECT-l alone needs fewer function evaluations than DIRECT to find a solution.

Table 5.5: Parameters for the Shekel's family of functions.

    i     a_i^T                  c_i
    1     4.0  4.0  4.0  4.0     0.1
    2     1.0  1.0  1.0  1.0     0.2
    3     8.0  8.0  8.0  8.0     0.2
    4     6.0  6.0  6.0  6.0     0.4
    5     3.0  7.0  3.0  7.0     0.4
    6     2.0  9.0  2.0  9.0     0.6
    7     5.0  5.0  3.0  3.0     0.3
    8     8.0  1.0  8.0  1.0     0.7
    9     6.0  2.0  6.0  2.0     0.5
    10    7.0  3.6  7.0  3.6     0.5


    5.2.1 Description of the Test Problems

Jones et al. [48] describe results for their original implementation of DIRECT on nine different test problems. The first seven problems were originally given by Dixon and Szegö [27]. These problems have been widely used to compare global optimization algorithms [27, 45, 46]. Problems eight and nine come from Yao [75]. We now describe these test problems in more detail.

Shekel's family (S5, S7, S10) [27]

$$f(x) = -\sum_{i=1}^{m} \frac{1}{(x - a_i)^T (x - a_i) + c_i}, \quad x, a_i \in \mathbb{R}^N, \; c_i > 0, \; \forall i = 1, \dots, m, \qquad \Omega = [0, 10]^N.$$

Three instances of the Shekel function are used in the comparisons. Here N = 4 and m = 5, 7, and 10. The values of $a_i$ and $c_i$ are given in Table 5.5.
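For reference, a direct transcription of the Shekel family is sketched below; the parameter rows are those of Table 5.5.

    # Sketch: Shekel's family S5/S7/S10 on [0, 10]^4, parameters from Table 5.5.
    import numpy as np

    A = np.array([[4.0, 4.0, 4.0, 4.0], [1.0, 1.0, 1.0, 1.0],
                  [8.0, 8.0, 8.0, 8.0], [6.0, 6.0, 6.0, 6.0],
                  [3.0, 7.0, 3.0, 7.0], [2.0, 9.0, 2.0, 9.0],
                  [5.0, 5.0, 3.0, 3.0], [8.0, 1.0, 8.0, 1.0],
                  [6.0, 2.0, 6.0, 2.0], [7.0, 3.6, 7.0, 3.6]])
    C = np.array([0.1, 0.2, 0.2, 0.4, 0.4, 0.6, 0.3, 0.7, 0.5, 0.5])

    def shekel(x, m):
        """Shekel function with m = 5, 7, or 10 terms."""
        x = np.asarray(x, dtype=float)
        d = ((x - A[:m]) ** 2).sum(axis=1) + C[:m]
        return -np.sum(1.0 / d)

    print(shekel([4.0, 4.0, 4.0, 4.0], 5))  # near the S5 minimum, ~ -10.15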

Table 5.6: Parameters for the Hartman's family of functions.

First case: N = 3, m = 4.

    i    a_i                 c_i    p_i
    1    3.0   10.0  30.0    1.0    0.3689   0.1170  0.2673
    2    0.1   10.0  35.0    1.2    0.4699   0.4387  0.7470
    3    3.0   10.0  30.0    1.0    0.1091   0.8732  0.5547
    4    0.1   10.0  35.0    3.2    0.03815  0.5743  0.8828

Second case: N = 6, m = 4.

    i    a_i                                    c_i
    1    10.0   3.0   17.0   3.5   1.7   8.0    1.0
    2     0.05  10.0  17.0   0.1   8.0  14.0    1.2
    3     3.0   3.5    1.7  10.0  17.0   8.0    3.0
    4    17.0   8.0    0.05 10.0   0.1  14.0    3.2

    i    p_i
    1    0.1312  0.1696  0.5569  0.0124  0.8283  0.5886
    2    0.2329  0.4135  0.8307  0.3736  0.1004  0.9991
    3    0.2348  0.1451  0.3522  0.2883  0.3047  0.6650
    4    0.4047  0.8828  0.8732  0.5743  0.1091  0.0381

Hartman's family (H3, H6) [27]

$$f(x) = -\sum_{i=1}^{m} c_i \exp\left( -\sum_{j=1}^{N} a_{ij} (x_j - p_{ij})^2 \right), \quad x, a_i, p_i \in \mathbb{R}^N, \; c_i > 0, \; \forall i = 1, \dots, m, \qquad \Omega = [0, 1]^N.$$

We will look at two instances of the Hartman function. The values of the parameters and the dimensions of the problems are given in Table 5.6.
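A sketch of the H3 case as code (the H6 case is analogous with the second parameter set of Table 5.6):

    # Sketch: Hartman's family on [0, 1]^N, parameters from Table 5.6 (N = 3 case).
    import numpy as np

    A3 = np.array([[3.0, 10.0, 30.0], [0.1, 10.0, 35.0],
                   [3.0, 10.0, 30.0], [0.1, 10.0, 35.0]])
    C3 = np.array([1.0, 1.2, 1.0, 3.2])
    P3 = np.array([[0.3689, 0.1170, 0.2673], [0.4699, 0.4387, 0.7470],
                   [0.1091, 0.8732, 0.5547], [0.03815, 0.5743, 0.8828]])

    def hartman(x, A, C, P):
        x = np.asarray(x, dtype=float)
        return -np.sum(C * np.exp(-np.sum(A * (x - P) ** 2, axis=1)))

    print(hartman([0.114, 0.556, 0.852], A3, C3, P3))  # ~ -3.86, the H3 minimum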

Branin function (BR) [27]

$$f(x_1, x_2) = \left( x_2 - \frac{5.1}{4\pi^2} x_1^2 + \frac{5}{\pi} x_1 - 6 \right)^2 + 10 \left( 1 - \frac{1}{8\pi} \right) \cos x_1 + 10, \qquad \Omega = [-5, 10] \times [0, 15].$$

Figure 5.12: Plot of the Branin function.

    This function has three global minima. Figure 5.12 shows a plot of this function.

Goldstein and Price function (GP) [27]

$$f(x_1, x_2) = \left[ 1 + (x_1 + x_2 + 1)^2 (19 - 14x_1 + 3x_1^2 - 14x_2 + 6x_1 x_2 + 3x_2^2) \right] \left[ 30 + (2x_1 - 3x_2)^2 (18 - 32x_1 + 12x_1^2 + 48x_2 - 36x_1 x_2 + 27x_2^2) \right],$$

$$\Omega = [-2, 2]^2.$$

The function has four local minima and one global minimum at $x^* = (0, -1)^T$ with $f(x^*) = 3$. In Figures 5.13 and 5.14 we show plots of this function. The first figure shows the whole domain, while the second figure shows only the area around the global minimum.

Figure 5.13: Plot of the Goldstein and Price function.

Figure 5.14: Plot of the Goldstein and Price function around the global minimum.

Six-hump camelback function (C6) [75]

$$f(x_1, x_2) = (4 - 2.1 x_1^2 + x_1^4/3) x_1^2 + x_1 x_2 + (-4 + 4 x_2^2) x_2^2, \qquad \Omega = [-3, 3] \times [-2, 2].$$

The function has six minima, two of which are global. The global minima are located at $x^* = (\pm 0.0898, \mp 0.7126)^T$ and $f(x^*) = -1.0316$. Figures 5.15 and 5.16 show plots of this function. Again, the first figure shows the whole domain, while the second shows an enlargement around the global minima.

Two-dimensional Shubert function (SH) [75]

$$f(x_1, x_2) = \left( \sum_{j=1}^{5} j \cos[(j+1) x_1 + j] \right) \left( \sum_{j=1}^{5} j \cos[(j+1) x_2 + j] \right), \qquad \Omega = [-10, 10]^2.$$

The function has 760 local minima, of which 18 are global. Figures 5.17 and 5.18 show two plots of this function. Again, the first figure shows the whole domain, while the second shows an enlargement around one of the global minima.

Figure 5.15: Plot of the Six-hump camelback function.

Figure 5.16: Plot of the Six-hump camelback function around the global minima.

Figure 5.17: Plot of the two-dimensional Shubert function.

Figure 5.18: Plot of the two-dimensional Shubert function around a global minimum.

Table 5.7: Summary of the important features of the test functions.

    #   Name             N   Ω                    Global minima   Minimal value
    1   Branin (BR)      2   [−5, 10] × [0, 15]         3              0.398
    2   Shekel-5 (S5)    4   [0, 10]^4                  1            −10.153
    3   Shekel-7 (S7)    4   [0, 10]^4                  1            −10.403
    4   Shekel-10 (S10)  4   [0, 10]^4                  1            −10.536
    5   Hartman-3 (H3)   3   [0, 1]^3                   1             −3.863
    6   Hartman-6 (H6)   6   [0, 1]^6                   1             −3.322
    7   Goldprice (GP)   2   [−2, 2]^2                  1              3.000
    8   Sixhump (C6)     2   [−3, 3] × [−2, 2]          2             −1.032
    9   Shubert (SH)     2   [−10, 10]^2               18           −186.831

    Table 5.7 summarizes the important features of the test problems used in our numerical experiments.
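For completeness, the four two-dimensional test functions of Table 5.7 can be transcribed directly; a short sketch:

    # Sketch: the four 2-D test functions (BR, GP, C6, SH) of Table 5.7.
    import numpy as np

    def branin(x1, x2):                  # 3 global minima, f* = 0.398
        return ((x2 - 5.1 / (4 * np.pi ** 2) * x1 ** 2
                 + 5 / np.pi * x1 - 6) ** 2
                + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10)

    def goldstein_price(x1, x2):         # global minimum f* = 3 at (0, -1)
        a = 1 + (x1 + x2 + 1) ** 2 * (19 - 14 * x1 + 3 * x1 ** 2
                                      - 14 * x2 + 6 * x1 * x2 + 3 * x2 ** 2)
        b = 30 + (2 * x1 - 3 * x2) ** 2 * (18 - 32 * x1 + 12 * x1 ** 2
                                           + 48 * x2 - 36 * x1 * x2
                                           + 27 * x2 ** 2)
        return a * b

    def camelback6(x1, x2):              # 2 global minima, f* = -1.0316
        return ((4 - 2.1 * x1 ** 2 + x1 ** 4 / 3) * x1 ** 2
                + x1 * x2 + (-4 + 4 * x2 ** 2) * x2 ** 2)

    def shubert(x1, x2):                 # 18 global minima, f* = -186.831
        j = np.arange(1, 6)
        return (np.sum(j * np.cos((j + 1) * x1 + j))
                * np.sum(j * np.cos((j + 1) * x2 + j)))

    print(goldstein_price(0.0, -1.0), camelback6(0.0898, -0.7126))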

    5.2.2 Numerical results

For each problem, we report two different sets of numerical results based on two different termination criteria. In the first set we follow Jones et al. [48] and terminate based on knowledge of the global minimum, as explained in detail below. Since in real applications the global minimum is not known, and therefore the first termination criterion cannot be used, we choose a given function budget as the second termination criterion. This allows a more realistic comparison of these methods.

    Termination Based on the Global Minimum

Table 5.8 reports the results for DIRECT and DIRECT-l using the termination criterion from Jones et al. [48]. This termination criterion uses knowledge of the global minimum value to terminate once the percentage error is small; see Section 3.2. We repeat the definition of the percentage error here for clarity:

Let $f_{\text{global}}$ be the known global minimal function value and denote by $f_{\min}$ the best function value found by DIRECT.

Table 5.8: Numerical results with the percentage termination criterion.

    Problem     N    DIRECT               DIRECT-l
                     f-eval.   p          f-eval.   p
    1 (BR)      2      195    0.98E−03      159    0.98E−03
    2 (S5)      4      155    0.84E−02      147    0.84E−02
    3 (S7)      4      145    0.93E−02      141    0.93E−02
    4 (S10)     4      145    0.97E−02      139    0.97E−02
    5 (H3)      3      199    0.85E−02      111    0.85E−02
    6 (H6)      6      571    0.89E−02      295    0.89E−02
    7 (GP)      2      191    0.30E−02      115    0.30E−02
    8 (C6)      2      285    0.48E−03      191    0.48E−03
    9 (SH)      2     2967    0.50E−02     2043    0.50E−02

We define the percentage error p as

$$p = \begin{cases} 100 \, \dfrac{f_{\min} - f_{\text{global}}}{|f_{\text{global}}|}, & f_{\text{global}} \ne 0, \\[2mm] 100 \, f_{\min}, & f_{\text{global}} = 0. \end{cases}$$

Following Jones, we terminate the iteration once p is lower than 0.01 or once more than 20000 function evaluations have been completed at the end of a sweep. In all runs we set $\varepsilon = 0.0001$ for both the original implementation and our implementation. Although both methods can solve all of these problems, our modification consistently requires fewer function evaluations, significantly fewer for problems 5 through 9. These results emphasize our earlier observations: DIRECT-l should be used for lower dimensional problems which do not have too many local and global minima.
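The termination test described above is simple enough to state as code; a minimal sketch:

    # Sketch: the percentage-error termination test of this section.

    def percentage_error(fmin, fglobal):
        if fglobal != 0:
            return 100.0 * (fmin - fglobal) / abs(fglobal)
        return 100.0 * fmin

    def terminate(fmin, fglobal, n_evals, tol=0.01, max_evals=20000):
        # Stop once p < tol or the evaluation budget is exhausted (the
        # actual runs check this at the end of a sweep).
        return percentage_error(fmin, fglobal) < tol or n_evals > max_evals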

    Termination on a Budget

Since in real applications the global minimal function value is unknown, the termination criterion used by Jones cannot be applied. Instead, we choose a more realistic termination criterion by reporting results on all test problems with given, fixed function budgets. As we pointed out earlier, all methods complete an iteration before terminating.

Table 5.9: Numerical results with a budget of 60N function evaluations.

    Problem   N   IFFCO                               DIRECT               DIRECT-l
                  Aver. f-eval.  Aver. p   Rate (%)   f-eval.  p           f-eval.  p
    1 (BR)    2        83       0.86E−03     98.6       141   0.21E−01       131   0.12E−01
    2 (S5)    4       234       0.14E−02     21.6       255   0.30E−04       243   0.30E−04
    3 (S7)    4       237       0.14E−02     17.8       247   0.11E−02       243   0.11E−02
    4 (S10)   4       237       0.17E−02     15.4       243   0.11E−02       243   0.11E−02
    5 (H3)    3       149       0.80E−05     77.8       199   0.85E−02       181   0.51E−02
    6 (H6)    6       336       0.57E−04     66.6       373   0.37E−01       375   0.68E−02
    7 (GP)    2        95       0.93E−03     52.6       123   0.25E+00       135   0.30E−02
    8 (C6)    2        96       0.10E−03     99.8       141   0.41E+00       135   0.79E+00
    9 (SH)    2        88       0.19E−02     43.8       127   0.82E+02       127   0.82E+02

This explains why the number of function evaluations done is larger than the budget. In Tables 5.9 through 5.11 we report the results for the different methods with an overall budget of 60N. It is important to note that the budget depends on the dimension of the problem. We did this to account for the increase in difficulty in higher dimensions. Even with a low budget of 10N, DIRECT and DIRECT-l are able to find a good starting point for IFFCO. So, with an overall budget of 60N for the combined method, we allowed DIRECT or DIRECT-l 10N function evaluations and IFFCO 50N function evaluations. Table 5.9 shows the results for IFFCO, DIRECT and DIRECT-l using this fixed budget. We accept a point as a solution if its percentage error p is lower than 0.01. For IFFCO, we used 500 randomly generated starting points and report the average number of function evaluations IFFCO needed to find a solution. In the column labeled "Rate (%)" we report how often (in percent) IFFCO was able to find a solution within the budget of 60N function evaluations, and did not terminate in a local minimum. Note that the average was calculated using solely the runs in which IFFCO found a solution.

Table 5.10: Numerical results for DIRECT-IFFCO with a combined budget of 60N function evaluations. The budget is divided into 10N function evaluations for DIRECT and 50N function evaluations for IFFCO.

    Problem   N   Function evaluations        Percentage error
                  DIRECT  IFFCO  Total        DIRECT     IFFCO
    1 (BR)    2     23      64     87         0.15E+02   0.38E−04
    2 (S5)    4     43     197    240         0.93E+02   0.32E−05
    3 (S7)    4     47     202    249         0.87E+02   0.32E−02
    4 (S10)   4     47     200    247         0.85E+02   0.31E−02
    5 (H3)    3     43     114    157         0.12E+01   0.57E−07
    6 (H6)    6     73     297    370         0.45E+02   0.82E−06
    7 (GP)    2     21      71     92         0.20E+03   0.68E−03
    8 (C6)    2     25     103    128         0.39E+02   0.45E−05
    9 (SH)    2     33      97    130         0.91E+02   0.38E−02

Table 5.11: Numerical results for DIRECT-l-IFFCO with a combined budget of 60N function evaluations. The budget is divided into 10N function evaluations for DIRECT-l and 50N function evaluations for IFFCO.

    Problem   N   Function evaluations          Percentage error
                  DIRECT-l  IFFCO  Total        DIRECT-l   IFFCO
    1 (BR)    2      25       64     89         0.15E+02   0.38E−04
    2 (S5)    4      41      197    238         0.93E+02   0.32E−05
    3 (S7)    4      41      202    243         0.87E+02   0.32E−02
    4 (S10)   4      41      200    241         0.85E+02   0.31E−02
    5 (H3)    3      33      114    147         0.12E+01   0.57E−07
    6 (H6)    6      67      287    354         0.13E+02   0.00E+00
    7 (GP)    2      21       71     92         0.20E+03   0.68E−03
    8 (C6)    2      25       76    101         0.12E+02   0.62E−05
    9 (SH)    2      25       97    122         0.91E+02   0.38E−02

Especially for problems 2 through 4, IFFCO finds a solution only for a low percentage of starting points. Only for problems 1 and 8 is IFFCO nearly always able to find a solution. DIRECT only finds acceptable solutions for four problems with this budget, while DIRECT-l can solve a total of six problems with this budget. In contrast, Tables 5.10 and 5.11 show the results for DIRECT-IFFCO and DIRECT-l-IFFCO. These two methods need about the same number of function evaluations to solve all problems, except for problem 8, where DIRECT-l-IFFCO needs significantly fewer function evaluations. Note that for some of the problems the reduction in function value by IFFCO is significant. DIRECT (DIRECT-l) has not found a solution, but created a starting point good enough for IFFCO to find a solution.

    Summary of the Numerical Results

In summary, as we also saw with the gas pipeline problems, combining DIRECT/DIRECT-l and IFFCO creates a more robust and more reliable optimization method than each of the methods alone. For these test problems, there is no significant difference between DIRECT-IFFCO and DIRECT-l-IFFCO. IFFCO by itself is too dependent on a good starting point, whereas DIRECT and DIRECT-l alone need many more function evaluations than in combination with IFFCO.

Chapter 6

    Conclusions

In this work we developed a modification of the DIRECT algorithm which we designated DIRECT-l. Both DIRECT and DIRECT-l solve bound constraint global optimization problems. DIRECT is a method which can find the area near global minima with few function evaluations, but needs many more function evaluations to find the global minima themselves. DIRECT-l is strongly biased towards local search, and performs better than DIRECT for small dimensional problems with few global minimizers and few local minimizers. We further improved the performance of both DIRECT and DIRECT-l by combining both methods with Implicit Filtering, another method for bound constraint optimization. Implicit Filtering uses difference gradients, where the difference increments are reduced as the optimization progresses. These difference gradients are then used in projected quasi-Newton iterations. For our numerical experiments we implemented both DIRECT and our modification. This implementation also includes a parallel version and was one of the first, and is still one of only a few, freely available implementations. We used our implementation of Implicit Filtering, called IFFCO, to combine both DIRECT and DIRECT-l with Implicit Filtering. We refer to these combinations as DIRECT-IFFCO and DIRECT-l-IFFCO. To compare these methods we used examples from the gas pipeline industry as well as test problems from the literature. IFFCO alone performs very well, but only


if the starting point is good. IFFCO's dependence on its starting point is especially detrimental when hidden constraints exist. We say hidden constraints exist when the objective function is only defined on an unknown subset of the bounding box. They commonly occur in industrial applications when the objective function is given by a "black box", that is, a computer program. All the examples from the gas pipeline industry contain such hidden constraints. Using these examples, we saw that for a large percentage of starting points, IFFCO does not even find a feasible point, much less a solution. However, the combinations DIRECT-IFFCO and DIRECT-l-IFFCO do not depend on a starting point, and are able to solve all problems with the same settings. In fact, DIRECT-l-IFFCO solves these problems with fewer, and sometimes significantly fewer, function evaluations than DIRECT-IFFCO. We observed the same behavior for a suite of test problems from the literature. Our development of DIRECT-l was driven by our theoretical results. The main driving force was our explanation of the way in which DIRECT clusters its search near a global optimizer. Furthermore, we provided estimates for the number of iterations DIRECT and DIRECT-l need for constant and linear functions. Additionally, we provided upper and lower bounds for the number of iterations required in extreme cases. Apart from the original paper describing DIRECT [48], these are the first theoretical results published.

List of References

[1] C. Audet and J.E. Dennis. Analysis of generalized pattern searches. Technical Report TR00-07, Rice University, 2000.

    [2] C. Audet and J.E. Dennis. A pattern search filter method for nonlinear program- ming without derivatives. Technical Report TR00-09, Rice University, 2000.

[3] C. A. Baker. Parallel global aircraft configuration design space exploration. Technical Report MAD 2000-06-28, Virginia Polytechnic Institute and State University, Blacksburg, VA, June 2000.

[4] C. A. Baker, L. T. Watson, B. Grossman, R. T. Haftka, and W. H. Mason. Parallel global aircraft configuration design space exploration. Internat. J. Comput. Res. To appear.

[5] C. A. Baker, L. T. Watson, B. Grossman, R. T. Haftka, and W. H. Mason. Parallel global aircraft configuration design space exploration. In Proc. 8th AIAA/USAF/NASA/ISSMO Symp. on Multidisciplinary Analysis and Optimization, AIAA Paper 2000–4763–CP, Long Beach, CA, 2000. CD-ROM.

    [6] C. A. Baker, L. T. Watson, B. Grossman, R. T. Haftka, and W. H. Mason. Parallel global aircraft configuration design space exploration. In A. Tentner, editor, Proc. High Performance Computing Symposium 2000, pages 101–106, San Diego, CA, 2000. Soc. for Computer Simulation Internat.


[7] A. Batterman, J. M. Gablonsky, A. Patrick, C.T. Kelley, K.R. Kavanagh, T. Coffey, and C.T. Miller. Solution of a groundwater control problem with implicit filtering. Technical Report CRSC-TR00-30, Center for Research in Scientific Computation, North Carolina State University, December 2000.

    [8] D. P. Bertsekas. On the Goldstein-Levitin-Polyak gradient projection method. IEEE Trans. Autom. Control, 21:174–184, 1976.

[9] Andrew J. Booker. Design and analysis of computer experiments. In Proceedings of the 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, pages 118–128, St. Louis, MO, September 2-4 1998. AIAA-98-4757.

[10] D. M. Bortz and C. T. Kelley. The simplex gradient and noisy optimization problems. In J. T. Borggaard, J. Burns, E. Cliff, and S. Schreck, editors, Computational Methods in Optimal Design and Control, volume 24 of Progress in Systems and Control Theory, pages 77–90. Birkhäuser, Boston, 1998.

    [11] R. G. Carter. Private communications.

    [12] R. G. Carter. Compressor station optimization: Computational accuracy and speed. Technical report, Stoner Associates, Inc., September 1996.

    [13] R. G. Carter. Pipeline optimization: Dynamic programming after 30 years. In Proceedings of the Pipeline Simulation Interest Group, 1998. Paper number PSIG-9803.

[14] R. G. Carter, D. W. Schroeder, and T. D. Harbick. Some causes and effects of discontinuities in modeling and optimizing gas transmission networks. Technical report, Stoner Associates, Inc., October 1993. Proceedings of the Twenty Fifth Annual Meeting of the Pipeline Simulation Interest Group.

[15] R.G. Carter, J.M. Gablonsky, A. Patrick, C.T. Kelley, and O.J. Esslinger. Algorithms for noisy problems in gas transmission pipeline optimization. Technical Report CRSC-TR00-10, Center for Research in Scientific Computation, North Carolina State University, May 2000.

[16] T. D. Choi, O. J. Eslinger, C. T. Kelley, J. W. David, and M. Etheridge. Optimization of automotive valve train components with implicit filtering. Technical Report CRSC-TR98-44, Center for Research in Scientific Computation, North Carolina State University, 1998. To appear in Optimization and Engineering.

[17] T. D. Choi, P. Gilmore, O. J. Eslinger, C. T. Kelley, A. Patrick, and J. M. Gablonsky. Iffco: Implicit Filtering for Constrained Optimization, Version 2. Technical Report CRSC-TR99-23, Center for Research in Scientific Computation, North Carolina State University, July 1999. Available by anonymous ftp from math.ncsu.edu in pub/kelley/iffco/ug.ps.

    [18] T.D. Choi and C. T. Kelley. Superlinear Convergence and Implicit Filtering. SIAM Journal of Optimization, 10(4):1149–1162, 2000.

    [19] S. E. Cox, R.T. Haftka, C. A. Baker, B. Grossman, W. H. Mason, and L. T. Watson. Global multidisciplinary optimization of a high speed civil transport. In Proc. Aerospace numerical Simulation Symposium ’99, pages 23–28, Tokyo, Japan, June 16-18 1999.

[20] S. E. Cox, R.T. Haftka, C. A. Baker, B. Grossman, W. H. Mason, and L. T. Watson. Global optimization of a high speed civil transport configuration. In Proc. 3rd World Congress of Structural and Multidisciplinary Optimization, Buffalo, NY, 1999.

    [21] Evin J. Cramer. Private communications.

[22] Evin J. Cramer. Using approximate models for engineering design. In Proceedings of the 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, pages 140–147, St. Louis, MO, September 2-4 1998. AIAA-98-4716.

[23] J. W. David, C. Y. Cheng, T. D. Choi, C. T. Kelley, and J. M. Gablonsky. Optimal Design of High Speed Mechanical Systems. Technical Report CRSC-TR97-18, Center for Research in Scientific Computation, North Carolina State University, 1997. To appear in Mathematical Modeling and Scientific Computing.

    [24] J. E. Dennis and V. Torczon. Direct search methods on parallel machines. SIAM Journal of Optimization, 1:448–474, 1991.

[25] J.E. Dennis and C. Audet. Pattern search algorithms for mixed variable programming. Technical Report TR99-02, Rice University, 1999. To appear in SIAM Journal on Optimization.

    [26] J.E. Dennis and Z. Wu. Parallel continuous optimization. Technical Report TR00-01, Rice University, 2000.

[27] L.C.W. Dixon and G.P. Szegö. The Global Optimisation Problem: An Introduction. In L.C.W. Dixon and G.P. Szegö, editors, Towards Global Optimization 2, volume 2, pages 1–15. North-Holland Publishing Company, 1978.

[28] J. M. Gablonsky. An implementation of the DIRECT Algorithm. Technical Report CRSC-TR98-29, Center for Research in Scientific Computation, North Carolina State University, August 1998.

[29] J. M. Gablonsky. DIRECT Version 2.0 User Guide. Technical Report CRSC-TR01-08, Center for Research in Scientific Computation, North Carolina State University, April 2001.

[30] J.M. Gablonsky. Modifications of the DIRECT Algorithm. PhD thesis, North Carolina State University, 2001. Pending.

[31] J.M. Gablonsky and C.T. Kelley. A locally-biased form of the DIRECT algorithm. Technical Report CRSC-TR00-31, Center for Research in Scientific Computation, North Carolina State University, December 2000. To appear in Journal of Global Optimization.

    [32] E. A. Galperin. The cubic algorithm. Journal of Mathematical Analysis and Applications, 112:635–640, 1985.

    [33] E. A. Galperin. The beta-algorithm. Journal of Mathematical Analysis and Applications, 126:455–468, 1987.

    [34] E. A. Galperin. Precision, complexity, and computational schemes of the cubic algorithm. Journal of Optimization Theory and Applications, 57(2):223–238, May 1988.

    [35] P. Gilmore. IFFCO: Implicit Filtering for Constrained Optimization. Technical Report CRSC-TR93-7, Center for Research in Scientific Computation, North Carolina State University, May 1993.

    [36] P. Gilmore and C. T. Kelley. An implicit filtering algorithm for optimization of functions with many local minima. SIAM Journal of Optimization, 5(2):269–285, May 1995.

[37] S. Gomez and A. Levy. The tunneling method for solving the constrained global optimization problem with several non-connected feasible regions. In A. Dold and B. Eckmann, editors, Lecture Notes in Mathematics 909, Nonconvex Optimization and Its Applications, pages 34–47. Springer-Verlag, 1982.

[38] E. Gourdin, B. Jaumard, and B. B. MacGibbon. Global Optimization of Multivariate Lipschitz Functions: Survey and Computational Comparison. Les Cahiers du GERAD, May 1994.

    [39] P. Hansen and B. Jaumard. Lipschitz optimization. In R. Horst and P. M. Pardalos, editors, Handbook of Global Optimization, volume 2 of Nonconvex Op- timization and Its Applications, pages 407–493. Kluwer Academic Publishers, Dordrecht, 1995.

    [40] P. Hansen, B. Jaumard, and Shi-Hui Lu. Global Optimization of univariate Lip- schitz functions: I. Survey and Properties. Mathematical Programming, 55:251– 272, 1992.

    [41] P. Hansen, B. Jaumard, and Shi-Hui Lu. Global Optimization of univariate Lipschitz functions: II. New Algorithms and Computational Comparison. Math- ematical Programming, 55:273–292, 1992.

    [42] W. E. Hart. Sequential stopping rules for random optimization methods with applications to multistart local search. SIAM Journal of Optimization, 9(1):270– 290 (electronic), 1999.

    [43] R. Horst, P. M. Pardalos, and N. V. Thoai. Introduction to global optimization. Kluwer Academic Publishers, Dordrecht, 1995.

    [44] R. Horst and H. Tuy. Global Optimization – Deterministic Approaches. Springer-Verlag, Berlin, 3rd edition, 1996.

    [45] W. Huyer and A. Neumaier. Global optimization by multilevel coordinate search. J. Global Optim., 14(4):331–355, 1999.

    [46] E. Janka. Vergleich stochastischer Verfahren zur globalen Optimierung (A comparison of stochastic methods for global optimization). Diplomarbeit, Universität Wien, 1999.

    [47] D. R. Jones. The DIRECT global optimization algorithm, 1999. To appear in The Encyclopedia of Optimization.

    [48] D. R. Jones, C. D. Perttunen, and B. E. Stuckman. Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Applications, 79:157, October 1993.

    [49] M. Kokkolaras, C. Audet, and J. E. Dennis. Mixed variable optimization of the number and composition of heat intercepts in a thermal insulation system. Technical Report TR99-21, Rice University, 2000.

    [50] E. Kreyszig. Introductory Functional Analysis with Applications. Wiley Classics Library. John Wiley & Sons, New York, 1978/1989.

    [51] C. A. Luongo, B. J. Gilmour, and D. W. Schroeder. Optimization in natural gas transmission networks: A tool to improve operational efficiency. Technical report, Stoner Associates, Inc., April 1989. Presented at the Third SIAM Conference on Optimization.

    [52] C. A. Luongo, B. J. Gilmour, and W. C. Yeung. Optimizing the operation of gas transmission networks. Technical report, Stoner Associates, Inc., August 1991. Presented at the ASME International Computers in Engineering Conference, Santa Clara, California.

    [53] D. Q. Mayne and C. C. Meewella. A nonclustering multistart algorithm for global optimization. In Analysis and optimization of systems (Antibes, 1988), pages 334–345. Springer, Berlin, 1988.

    [54] C. C. Meewella and D. Q. Mayne. An Algorithm for Global Optimization of Lipschitz Continuous Functions. Journal of Optimization Theory and Applications, 57(2):307–322, May 1988.

    [55] R. H. Mladineo. An Algorithm for Finding the Global Maximum of a Multimodal, Multivariate Function. Mathematical Programming, 34:188–200, 1986.

    [56] R. H. Mladineo. Stochastic Minimization of Lipschitz Functions. In Recent Advances in Global Optimization, Princeton Series in Computer Science. Princeton University Press, Princeton, 1991.

    [57] J. A. Nelder and R. Mead. A simplex method for function minimization. Comp. J., 7:308–313, 1965.

    [58] Sigurd A. Nelson II and Panos Y. Papalambros. A modification to Jones’ global optimization algorithm for fast local convergence. In Proceedings of the 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, pages 341–348, St. Louis, MO, September 2-4 1998. AIAA-98-4751.

    [59] Michael J. Panik. Fundamentals of convex analysis. Theory and decision library. Series B, Mathematical and statistical methods. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1993.

    [60] J. Pintér. Global Optimization in Action, volume 6 of Nonconvex Optimization and Its Applications. Kluwer Academic Publishers, Dordrecht, 1996.

    [61] S. A. Piyawskii. An algorithm for finding the absolute extremum of a function. USSR Computational Mathematics and Mathematical Physics, 12:57–67, 1972.

    [62] D. W. Schroeder. Hydraulic analysis in the natural gas industry. Technical report, Stoner Associates, Inc., October 1995. Presented at the International Federation of Operations Research Societies (IFORS) Fourth Specialized Conference: OR and Engineering Design.

    [63] Y. D. Sergeyev. An information global optimization algorithm with local tuning. SIAM Journal on Optimization, 5(4):858–870, 1995.

    [64] Z. Shen and Y. Zhu. An Interval Version of Shubert’s Iterative Method for the Localization of the Global Maximum. Computing, 23:275–280, 1987.

    [65] B. Shubert. A sequential method seeking the global maximum of a function. SIAM J. Numer. Anal., 9:379–388, 1972.

    [66] R. Storn and K. Price. Differential evolution – a simple and efficient adaptive scheme for global optimization over continuous spaces. Technical report TR-95-012, ICSI, 1995.

    [67] R. G. Strongin. On the convergence of an algorithm for finding a global ex- tremum. Engineering Cybernetics, 11:549–555, 1973.

    [68] Torrid. The Economist, June 24, 2000.

    [69] A. S. Tikhomirov. Mixed global and local search methods as optimization algorithms. Zh. Vychisl. Mat. i Mat. Fiz., 36(9):50–59, 1996. Translation in Computational Mathematics and Mathematical Physics, 36(9):1205–1212, 1997.

    [70] V. Torczon. On the convergence of the multidimensional direct search. SIAM Journal on Optimization, 1:123–145, 1991.

    [71] Michael W. Trosset. What is simulated annealing? Technical Report TR00-08, Rice University, February 2000.

    [72] Vanderplaats Research and Development, Inc. DOT: Design Optimization Tools, 1995. Version 4.20.

    [73] L. T. Watson and C. A. Baker. A fully distributed parallel global search algo- rithm. Engrg. Comput. To appear.

    [74] G. R. Wood and B. P. Zhang. Estimation of the Lipschitz Constant of a Function. Journal of Optimization Theory and Applications, 8(1):91–103, 1996.

    [75] Yong Yao. Dynamic Tunneling Algorithm for Global Optimization. IEEE Transactions on Systems, Man, and Cybernetics, 19(5), September/October 1989.

    [76] Z. B. Zabinsky, G. R. Wood, M. A. Steel, and W. P. Baritompa. Pure adaptive search for finite global optimization. Mathematical Programming, 69:443–448, 1995.

Appendix A

    Implementation of the DIRECT and DIRECT-l algorithm

    This appendix describes our implementation of DIRECT and DIRECT-l in detail. We finish the chapter with a description of the parallel version of our implementation and of the extensions needed to handle hidden constraints.

    A.1 The general algorithm

    The description of the implementation of DIRECT is based on the general description (Algorithm 2.4) in Section 2.5. The implementation was done in FORTRAN 77, which influenced the data structures. Therefore, we begin by describing the data structures we used to store the information gathered during the iterations.

    A.1.1 The data structure

    Storing of the hyperrectangle data

    Since we map the original domain to the unit hyperrectangle, all possible side lengths are given by negative integer powers of 3, that is, 3^{-i} (where i is a nonnegative integer). Therefore, to save computing time and storage space, we precalculate and save the values of 3^{-i} up to a given number maxdeep, which determines the minimal side length reachable.
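    For illustration, a minimal FORTRAN 77 sketch of this precomputation is shown below; the program and array names (presid, thirds) are ours and need not match the actual code.

c     Sketch: precompute the side lengths 3**(-i) once, so that each
c     rectangle side can be stored as a small integer index into this
c     table instead of as a double precision number.
      program presid
      integer maxdeep, i
      parameter (maxdeep = 100)
      double precision thirds(0:maxdeep)
      thirds(0) = 1.0d0
      do 10 i = 1, maxdeep
         thirds(i) = thirds(i-1)/3.0d0
 10   continue
c     a side stored as the integer 4 has true length thirds(4)
      write(*,*) 'level 4 side length: ', thirds(4)
      end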


    As mentioned before, this gives us the opportunity to store only an integer for each side length of the hyperrectangles, as opposed to storing a double precision number. The integer then acts as a pointer to the correct value.

    The dimensions of the (mapped) hyperrectangles are stored in an array length. Its dimension is given by the maximum number of function evaluations allowed (maxfunc) and the maximal dimension of the problem (maxor). Similarly, there is an array c to store the midpoints of the hyperrectangles, except that this one is an array of double precision numbers of the same dimension as length. An alternative way to store the midpoints would be to construct a tree corresponding to the division. In this tree, the parent node is the rectangle before division and the branches are given by the new sampling points. This structure could be implemented using only integers, computing the actual coordinates of a point on the fly. For simplicity we chose the first variant. A final array f is needed to store the function values at the midpoints of the hyperrectangles. This is a double precision array whose dimension is given by the maximum number of function evaluations allowed (maxfunc).

    Implementation of lists

    As pointed out in Section 3.1, the measure of a hyperrectangle can be represented by an integer, both for DIRECT and DIRECT-l. In our implementation we store all hyperrectangles with the same measure in a sorted list. The integer representation of the measure of a hyperrectangle is the index of the list to which the hyperrectangle belongs. Within a given list the hyperrectangles are sorted with respect to their function values. This allows us to easily identify the hyperrectangle with the lowest function value for each size, and therefore the potentially optimal hyperrectangles can be identified quickly; see also Lemma 3.2. Since FORTRAN 77 does not support pointers, we implemented these sorted lists using arrays. Figures A.1 and A.2 give a graphical impression of an example.

    Figure A.1: Example for the structure to store the lists (Part a).

    Figure A.2: Example for the structure to store the lists (Part b).

    Two arrays (point and anchor) and one index (free) are used in this structure. anchor is an integer array of dimension maxdeep, where maxdeep is the maximum number of divisions allowed. At position i it stores an index into the array point, namely the index of the hyperrectangle with the lowest function value among those whose measure is represented by i. In free, the index of the first unused entry in the list is kept.

    Figures A.1 and A.2 show an example. We use DIRECT-l in this example, but the same holds for DIRECT. Figure A.1 shows the results of one iteration of DIRECT-l. On the right side of the figure the division of the unit square is shown. There are two rectangles with level 0, and three with level 1. The level 1 rectangles are colored light grey, as are the corresponding entries in the list point. Of the two level 0 rectangles, the one with index 5 has the lowest function value at the center. Therefore, 5 is the value in anchor(0), and point(5) = 4, the index of the other rectangle with level 0. Similarly, the level 1 rectangle with the lowest function value at its center has index 1; therefore anchor(1) = 1, meaning the list starts at 1, then goes to 3 and ends at 2. Here, the first unused array entry is 6, therefore free = 6.

    Figure A.2 shows the lists after another step of the algorithm. There is only one level 0 rectangle left; therefore this list ends after the first element (symbolized by point(4) = 0). The list of level 1 rectangles is the longest, starting at the rectangle with index 6 and ending at the rectangle with index 10. Again this list is marked with a light grey background. The third list (marked dark grey) consists of the level 2 rectangles and starts with the rectangle with index 8. Again the end of the list is signaled by a 0 in point(9). At the end of the iteration free = 12, since this is the first unused entry.

    The last list we use is S, a maxdeep × 2 array of integers. We use it to store the list of points which are potentially optimal. In the first dimension of S we store this list, whereas in the second dimension the integer representing the measure of the corresponding hyperrectangle is stored. The list is filled in the subroutine DIRChoose.
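    The following FORTRAN 77 sketch illustrates how a rectangle can be inserted into such a sorted list; the routine is ours and simplified (it ignores, for example, the handling of free), so details will differ from the actual implementation.

c     Sketch: insert rectangle j into the sorted list for measure
c     level lev, using anchor/point arrays as described in the text.
      subroutine insrt(j, lev, anchor, point, f, maxdeep, maxfun)
      integer maxdeep, maxfun, j, lev
      integer anchor(0:maxdeep), point(maxfun)
      double precision f(maxfun)
      integer pos
      if (anchor(lev) .eq. 0) then
c        the list for this level was empty; j starts it
         anchor(lev) = j
         point(j) = 0
      else if (f(j) .lt. f(anchor(lev))) then
c        j becomes the new head of the list for this level
         point(j) = anchor(lev)
         anchor(lev) = j
      else
c        walk the list until it ends or the next entry has larger f
         pos = anchor(lev)
 10      continue
         if (point(pos) .eq. 0) goto 20
         if (f(point(pos)) .ge. f(j)) goto 20
         pos = point(pos)
         goto 10
 20      continue
         point(j) = point(pos)
         point(pos) = j
      end if
      end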

    A.1.2 Details

    Figure A.3 shows a flowchart of the implementation. Following the initialization in DIRInit the main loop begins. The main loop starts by calling DIRChoose to fill S with the potentially optimal hyperrectangles. In each iteration of the inner loop we choose one of these potentially optimal hyperrectangles and detect the directions with maximal side length (that is, the directions in which the hyperrectangle is divided) using DIRGet_I. After this we calculate the midpoints of the new hyperrectangles in DIRSamplepoints, followed by the computation of the function values at these new points in DIRSamplef. Using these new function values, we then divide the hyperrectangles using DIRDivide and insert them into the lists in DIRInsertlist. We then check if any of the termination criteria used are satisfied and terminate if necessary.

    Figure A.3: Flowchart of the algorithm (DIRInit; then the main loop: DIRChoose, and for each candidate DIRGet_I, DIRSamplepoints, DIRSamplef, DIRDivide, DIRInsertlist; then ReplaceInf and the check of the termination criteria).

    Figure A.4: Diagram of the Master-Slave paradigm (the master on processor 1 controls the slaves on processors 2 to n).

    A.2 Parallel Implementation

    In this section we describe our parallel implementation. In this implementation we parallelized the evaluation of the function in the inner loop (line 2.5.2 in Algorithm 2.4). We did this using both PVM and MPI, two standard portable environments for parallel programming that are available for a wide variety of computing platforms. Our implementation was tested on an IBM SP2 supercomputer at the North Carolina Supercomputing Center.

    Our implementation uses the Master-Slave model of parallel programming. One process acts as the controlling instance (the “Master”) and the other processes (the “Slaves”) do specialized work given to them by the Master. Most of the time each slave process uses its own processor, although there are also configurations where more than one slave process runs on one processor. Figure A.4 shows a schematic of this. On the first processor we run the master program, which controls the slave programs running on processors 2 to n.

    In our implementation we use the master to run DIRECT. Only when we evaluate

    Table A.1: Routines for the parallel implementation of DIRECT.

    Name           Function
    ParDirect      Replaces DIRect, see text.
    DIRSamplef     Samples the function for many points in parallel.
    comminitIF     Initializes the communication.
    commexitIF     Ends MPI or PVM.
    getmytidIF     Returns the task id or rank of this process.
    getnprocsIF    Returns the number of processes available.
    gettidIF       Returns whether the process is the master or a slave.
    mastersendIF   Sends a message from the master to a slave.
    masterrecvIF   Receives a message from a slave at the master.
    slavesendIF    Sends a message from a slave to the master.
    slaverecvIF    Receives a message from the master at a slave.

    the function in the inner loop does the master send the coordinates of the point at which the function should be evaluated to a slave process. Since in most applications the evaluation of the function is the most time-consuming part of the program, we get good parallelism with this approach. In our implementation the master also does function evaluations.

    We use the routines listed in Table A.1 to hide the specifics of PVM and MPI from our implementation. Also, instead of directly calling the routine DIRect, the user calls the routine ParDirect. This is the only change a user needs to make to move from a single processor system to a multiple processor system. The same code is run on each processor; only inside ParDirect do we find out if we run on the master or a slave. On a slave, the program waits to receive either a point where to evaluate the function, or the signal to terminate. If the slave receives the coordinates of a point, it evaluates the function at that point, sends the result to the master, and waits for the next point. The master calls the same routine DIRect as for single processors. Only the routine DIRSamplef is changed. In it, the master sends the coordinates of the sample points to the slaves. If there are more points than slaves, the master also evaluates the function before waiting for the results from the slaves. Once the master receives the result of a function evaluation from a slave, it checks if there are more points that need to be evaluated. If this is the case, the master sends another point to the slave that just finished a function evaluation. If not, the master waits until it has received all results from the slaves and then returns to DIRect. After DIRect terminates, ParDirect sends the signal to stop to all slaves. Once all slaves have received the message, ParDirect returns to the main program.
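    The following self-contained FORTRAN 77 sketch shows the bare master-slave pattern with MPI: the master distributes one point per slave and collects the function values. It is a much simplified version of what DIRSamplef does; the program name and the toy objective are illustrative only.

c     Sketch of the master-slave pattern used for the parallel
c     function evaluations (one point per slave, toy objective).
      program msdemo
      include 'mpif.h'
      integer ierr, rank, nproc, i
      integer status(MPI_STATUS_SIZE)
      double precision x(2), fval
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
      if (rank .eq. 0) then
c        master: send one sample point to each slave
         do 10 i = 1, nproc-1
            x(1) = 0.1d0*i
            x(2) = 0.2d0*i
            call MPI_SEND(x, 2, MPI_DOUBLE_PRECISION, i, 1,
     +                    MPI_COMM_WORLD, ierr)
 10      continue
c        master: collect the function values in any order
         do 20 i = 1, nproc-1
            call MPI_RECV(fval, 1, MPI_DOUBLE_PRECISION,
     +                    MPI_ANY_SOURCE, 2, MPI_COMM_WORLD,
     +                    status, ierr)
            write(*,*) 'master got ', fval, ' from ',
     +                 status(MPI_SOURCE)
 20      continue
      else
c        slave: receive a point, evaluate, send the value back
         call MPI_RECV(x, 2, MPI_DOUBLE_PRECISION, 0, 1,
     +                 MPI_COMM_WORLD, status, ierr)
         fval = (x(1)-0.3d0)**2 + (x(2)-0.3d0)**2
         call MPI_SEND(fval, 1, MPI_DOUBLE_PRECISION, 0, 2,
     +                 MPI_COMM_WORLD, ierr)
      end if
      call MPI_FINALIZE(ierr)
      end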

    A.3 Extensions to problems with hidden constraints

    In the first extension of the DIRECT algorithm we replaced the function value at infeasible points, as described in Section 3.4, by a large function value (that is, 10^6). In the second extension we keep an extra list of infeasible points. To do this, we needed to change several of the data structures. First we had to add an extra entry at position −1 in anchor to keep track of the start of this list. Furthermore, we changed f to a maxfunc × 2 array, where we use the extra dimension to store whether the function returned with the message that a given point is feasible (represented by 0) or infeasible (represented by 1). We only needed to change one routine: DIRChoose had to incorporate a test whether the midpoint is feasible when comparisons are done to find potentially optimal points. Also, if infeasible points were found, we have to include one of them in the list of points to sample.

    For the third extension we added an extra routine DIRreplaceInf at the end of the main loop, as shown in Algorithm 3.1. In this algorithm we check for every infeasible point whether there is a feasible point nearby. In this case, we replace the artificial value we assigned to it in DIRSamplef by the smallest value f_minloc of nearby points plus ε|f_minloc|, as explained in Section 3.4.3. We use the extra dimension in f to store whether a point is feasible (represented by 0), infeasible but with a nearby feasible point (represented by 1), or completely infeasible (represented by 2). As noted before, we have to check all infeasible points at the end of the main loop, not just the new ones; see the example at the end of Section 3.4.3.

Appendix B

    Implicit Filtering

    We based the following brief description of Implicit Filtering on [16, 18, 17, 35, 36]. The implementation we used is called IFFCO (Implicit Filtering for Constrained Optimization). Implicit Filtering was designed to solve Problem P′, where the objective function f has the following form:

    f(x) = \hat{f}(x) + \phi(x).   (B.1)

    In Equation (B.1), \hat{f} is of simple form; for example, \hat{f} could be a quadratic. The function φ is a low-amplitude, high-frequency perturbation, referred to as ‘noise’ in [17] and [35]. By low amplitude we mean that

    \max_{x \in \Omega} |\phi(x)| \ll \max_{x \in \Omega} |\hat{f}(x)|.   (B.2)
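    As a concrete, hand-made instance of (B.1), the following FORTRAN 77 function adds a small high-frequency perturbation to a simple quadratic; the function name and the constants are arbitrary and chosen only so that (B.2) holds on, say, Ω = [−1, 1].

c     Sketch: a simple quadratic fhat plus a low-amplitude,
c     high-frequency perturbation phi, as in Equation (B.1).
      double precision function noisyf(x)
      double precision x, fhat, phi
      fhat = (x - 0.2d0)**2
      phi = 1.0d-3*dsin(1.0d2*x)
      noisyf = fhat + phi
      return
      end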

    Note that φ(x) need not be continuous. Implicit Filtering is very effective if φ(x) decays near local minima of \hat{f}. In Figure B.1 we show an example of such a function.

    Implicit Filtering is based on the gradient projection method described in [8]. It uses a sequence of finite difference steps (scales) to approximate the gradient (with central differences). To speed up convergence, quasi-Newton approximations of the Hessian are used. Furthermore, a line search algorithm is used to give the code global capabilities. A brief outline of the algorithm is shown in Algorithm B.1.


    Figure B.1: Example of a function that is composed of a simple function and noise.

    Algorithm B.1 BriefIFFCO

    1: Pick initial x and h; find f(x) and the difference gradient ∇_h f(x).
    2: Initialize the model Hessian B to the identity.
    3: while h and ∇_h f(x) satisfy conditions do
    4:   Use ∇_h f(x) and B to calculate a descent direction d. This step is a quasi-Newton step.
    5:   Perform a linesearch in the direction d, and signal success if some criteria are met.
    6:   if the linesearch was successful then
    7:     Accept the new point and project it into the box Ω.
    8:   else
    9:     h = h/2.
    10:  end if
    11:  Calculate the difference gradient ∇_h f(x).
    12:  Update B with either a rank-one SR1 update or a rank-two BFGS update.
    13: end while

    A variation of Algorithm B.1 is restarted Implicit Filtering. Here we use the best point found at the end of the while loop as the starting point for a second run of Algorithm B.1; that is, we run the same algorithm twice in a row.

    The approximations to the Hessian are done using two different kinds of updates, SR1 (symmetric rank one) and BFGS. The SR1 approximation is given by

    S_c = S_- + \frac{r r^T}{r^T s},

    where S_- is the previous SR1 approximation to the Hessian,

    s = x_c - x_-, \quad y = d(x_c) - d(x_-), \quad r = y - S_- s,

    and x_- is the iterate previous to x_c, the current iterate. The BFGS approximation uses the same definitions for s and y, and is given by

    B_c = \left( I - \frac{s y^T}{y^T s} \right) B_- \left( I - \frac{y s^T}{y^T s} \right) + \frac{s s^T}{y^T s},

    where B_- is the previous BFGS approximation. Note that BFGS approximates the inverse of the Hessian, whereas SR1 approximates the Hessian itself.

    To accommodate the constraints, reduced approximations are used for SR1 and BFGS: when the i-th constraint is active, the off-diagonal elements in the i-th row and column are set to zero. To keep the quasi-Newton steps within Ω, the following projection is applied componentwise:

    P(x)_i = \begin{cases} u_i, & \text{if } x_i > u_i, \\ x_i, & \text{if } l_i \le x_i \le u_i, \\ l_i, & \text{if } x_i < l_i. \end{cases}
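    In code, this projection is a simple componentwise clipping; a minimal sketch with illustrative names:

c     Sketch: project x componentwise onto the box defined by l and u.
      subroutine projct(n, x, l, u)
      integer n, i
      double precision x(n), l(n), u(n)
      do 10 i = 1, n
         if (x(i) .gt. u(i)) x(i) = u(i)
         if (x(i) .lt. l(i)) x(i) = l(i)
 10   continue
      end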

    In the numerical results in Section 5.1.2 we report only the results for the SR1 update, since it performed best.

Appendix C

    DIRECT Version 2.0 User Guide

    This chapter contains the user guide for Version 2.0 of our implementation of DIRECT. Note that this is a document in its own right [29], and therefore some of it overlaps with the contents of the main document.

    Preface

    This document is a revision of the 1998 guide [28] to Version 1 of our implementation of DIRECT. The major changes in the code are

    • inclusion of the original DIRECT algorithm as described by Jones [48],
    • extension of the algorithm to handle hidden constraints,
    • and a parallel version, both with PVM and MPI calls.

    The code and documentation can be found at the following WWW-address:

    http://www4.ncsu.edu/eos/users/c/ctkelley/www/optimization codes.html

    The primary contact for DIRECT is

    J. M. Gablonsky
    Department of Mathematics
    Center for Research in Scientific Computation


    North Carolina State University
    Raleigh, NC 27695-8205
    [email protected]

    Electronic mail is preferred. This project was supported by National Science Foundation grants #DMS-9700569, #DMS-9714811, and #DMS-0070641, and an allocation from the North Carolina Supercomputing Center. Owen Esslinger and Alton Patrick contributed to the parallel version of the code and the test programs.

    C.1 Introduction to DIRECT

    This user guide covers an implementation of both the original DIRECT algorithm and our modification, which we call DIRECT-l. DIRECT is a method to solve global bound constraint optimization problems and was originally developed by Jones et al. [48]. It has been used in many industrial applications [11, 15, 19, 20, 22, 47]. We only briefly describe our modifications to DIRECT and how we extended it to handle problems with hidden constraints, and point to [30, 31] for further information. After a short introduction in Section C.1 we describe in Section C.2 what is included in the package and how to use our implementation. Section C.3 then gives a short explanation of the algorithm and our modifications. Finally, Section C.4 describes several test functions and reports some numerical results.

    C.1.1 Problem description

    DIRECT was developed to solve problems of the following form:

    Problem 4 (P′). Let a, b ∈ R^N, Ω = {x ∈ R^N : a_i ≤ x_i ≤ b_i}, and let f : Ω → R be Lipschitz continuous with constant γ. Find x_opt ∈ Ω such that

    f_{opt} = f(x_{opt}) \le f^* + \epsilon,   (C.1)

    where ε is a given small positive constant.

    Our extension also solves more difficult problems of the following form:

    Problem 5 (P″). Let B ⊂ Ω and f : B → R be Lipschitz continuous with constant γ. Let f^* be

    f^* = \min_{x \in B} f(x).

    Find x_opt ∈ B such that

    f_{opt} = f(x_{opt}) \le f^* + \epsilon,   (C.2)

    where ε is a given small positive constant.

    If B is not given analytically, we say that the problem has hidden constraints. Problems with hidden constraints often occur in so-called “black-box” optimization problems, where the objective function is given by a computer program. Note that problems of kind P″ and P′ are the same if B = Ω.

    C.2 Using DIRECT

    C.2.1 What is included in the package

    The code and documentation can be found at the following WWW-address:

    http://www4.ncsu.edu/eos/users/c/ctkelley/www/optimization codes.html

    Once you have the file DIRECTv2.0.3.tar.gz, do the following in a UNIX environment:

    unix> gunzip DIRECTv2.0.3.tar.gz

    unix> tar -xf DIRECTv2.0.3.tar

    If you use a computing environment other than UNIX, you may need to use other programs to uncompress the files. Once you have uncompressed the file, you will have a subdirectory called direct containing the following files:

    DIRect.f The main routine.

    DIRserial.f Special routines for serial version of DIRECT.

    DIRparallel.f Special routines for parallel version of DIRECT.

    DIRsubrout.f Subroutines used in DIRECT.

    main.f Sample program for the serial version of DIRECT. This sample program optimizes the test functions described in Section C.4.

    mainparallel.f Sample program for parallel version of DIRECT.

    myfunc.f Sample test functions.

    DIRmpi.f Routines for parallel version of DIRECT using MPI.

    DIRpvm.f Routines for parallel version of DIRECT using PVM.

    makefile Makefile for sample programs (both serial and parallel).

    mpi test.cmd File to run the MPI version of the parallel code on the IBM SP/2 supercomputer.

    pvm test.cmd File to run the PVM version of the parallel code on the IBM SP/2 supercomputer.

    userguide.ps This document.

    To see if everything works, do the following:

    unix> cd direct

    unix> make

    unix> TestDIRect

    The sample program should solve one of the examples described below in Section C.4. Included in this package is the serial version of DIRECT as well as parallel versions both for the MPI and the PVM parallel programming standards. We used PVM 3.4 calls in the PVM version, and MPI 1.1 calls for the MPI version. We have tested both the PVM and MPI versions on the IBM SP/2 supercomputer at the North Carolina Supercomputer Center (NCSC). The files DIRmpi.f and DIRpvm.f contain interface routines to MPI and PVM, respectively, and were written by Alton Patrick.

    C.2.2 Calling DIRECT

    In this section we describe the calling sequence for DIRECT and explain the arguments, following the format of the user guide for IFFCO by Choi et al. [17]. Finally, we provide the format for the subroutines which the user must supply.

    • Calling sequence

    Direct(fcn, x, n, eps, maxf, maxT, fmin, l, u, algmethod, Ierror, logfile, fglobal, fglper, volper, sigmaper, iidata, iisize, ddata, idsize, cdata, icsize)

    • Arguments

    The arguments are listed in the order they appear in the calling sequence.

    • On Entry

    fcn – is the argument containing the name of the user-supplied subroutine that returns values for the function to be minimized. fcn must be declared EXTERNAL in the calling program.

    n – Integer. It is the dimension of the problem. If n > 64, the parameter maxor in the variable list at the beginning of file DIRect.f must be set to a larger value. maxor is a parameter used to dimension the work arrays used in DIRECT.

    eps – Double-Precision. It ensures sufficient decrease in function value when a new potentially optimal interval is chosen. It is normally set to 10^{-4}, although lower values should be tried if the results of the optimization are unsatisfactory.

    maxf – Integer. It is an approximate upper bound on the maximum number of function evaluations. This is only an approximate upper bound, because the DIRECT algorithm will finish the division of all potentially optimal hyperrectangles. If it is set to a value higher than 90000, change the parameter Maxfunc at the beginning of file DIRect.f. Maxfunc is a parameter used to set the dimension of the work arrays used in DIRECT.

    maxT – Integer. It is the maximum number of iterations. DIRECT will stop before it finishes all iterations when the maximum number of function evaluations is reached earlier. If it is set to a value higher than 600, change the parameter Maxdeep at the beginning of file DIRect.f. Maxdeep is used to set the dimension of the work arrays used in DIRECT.

    l – Double-Precision array of length n. It defines the lower bounds for the n independent variables. The hypercube defined by the constraints on the variables is mapped to the unit hypercube in DIRECT. DIRECT performs all calculations on points within the unit cube. The final solution is mapped

    back to the original hypercube before being returned to the user.

    u – Double-Precision array of length n. It defines the upper bounds for the n independent variables.

    algmethod – Integer. It defines which method to use. The user can either use the original method as described by Jones et al. [48] (algmethod = 0) or use our modification (algmethod = 1). See Section C.3.

    logfile – File-Handle for the logfile. DIRECT expects this file to be opened and closed by the user outside of DIRECT. We moved this to the outside so the user can add extra information to this file before and after the call to DIRECT.

    fglobal – Double-Precision. Function value of the global optimum. If this value is not known (that is, we solve a real problem, not a test problem), set this value to -10^{100} (or any other very large negative number) and fglper (see below) to 0.0.

    fglper – Double-Precision. Terminate the optimization when the percent error satisfies

    100 · (fmin − fglobal) / max(1, |fglobal|) < fglper.

    volper – Terminate the optimization once the volume of the hyperrectangle S with f(c(S)) = fmin is small. By small we mean that the volume of S is less than volper percent of the volume of the original hyperrectangle.

    sigmaper – Terminate the optimization when the measure of the hyperrectangle S with f(c(S)) = fmin is less than sigmaper.

    iidata – Integer array of length iisize. This array is passed to the function to be optimized and can be used to transfer data to this function. The contents are not changed by DIRECT.

    iisize – Integer. Size of array iidata.

    ddata – Double Precision array of length idsize. See iidata.

    idsize – Integer. Size of array ddata.

    cdata – Character array of length icsize. See iidata.

    icsize – Integer. Size of array cdata.

    • On Return

    x – Double Precision array of length n. It is the final point obtained in the optimization process. It should be a good approximation to the global minimum for the function in the hypercube.

    fmin – Double Precision. It is the value of the function at x.

    Ierror – Integer. If Ierror is negative, a fatal error has occurred. The values of Ierror are as follows:

    Fatal errors:

    -1 u(i) <= l(i) for some i.

    -2 maxf is too large. Increase maxfunc.

    -3 Initialization in DIRpreprc failed.

    -4 Error in DIRSamplepoints, that is, there was an error in the creation of the sample points.

    -5 Error in DIRSamplef, that is, an error occurred while the function was sampled.

    -6 Maximum number of levels has been reached. Increase maxdeep.

    Successful termination:

    1 Number of function evaluations done is larger than maxf.

    2 Number of iterations is equal to maxT.

    3 The best function value found is within fglper of the (known) global optimum, that is,

    100 · (fmin − fglobal) / max(1, |fglobal|) < fglper.

    Note that this termination signal only occurs when the global optimal value is known, that is, a test function is optimized.

    4 The volume of the hyperrectangle with the best function value found is below volper percent of the volume of the original hyperrectangle.

    5 The measure of the hyperrectangle with the best function value found is smaller than sigmaper.

    • User-Supplied Functions and Subroutines

    • The function evaluation subroutine

    The name of this subroutine is supplied by the user and must be declared EXTERNAL. The function should have the following form (this is taken from the example in file myfunc.f):

      subroutine myfunc(x, n, flag, f, iidata, iisize,
     +                  ddata, idsize, cdata, icsize)
      implicit none
      integer n, flag, i
      double precision x(n)
      double precision f
      INTEGER iisize, idsize, icsize
      INTEGER iidata(iisize)
      Double Precision ddata(idsize)
      Character*40 cdata(icsize)

c     a simple quadratic centered at (0.3, ..., 0.3)
      f = 100
      do 100, i = 1,n
         f = f + (x(i)-.3)*(x(i)-.3)
 100  continue
c     flag = 0 signals that the point is feasible
      flag = 0
      end

    Set flag to 1 if the function is not defined at point x. The arrays iidata, ddata, and cdata can be used to pass data to the function. They are not modified by DIRECT.

    • DIRInitSpecific

    This subroutine can be found in DIRserial.f and DIRparallel.f. You can include whatever application-specific initializations you have to do in this subroutine. Most of the time you will not need it.

    C.2.3 Sample main program

    We also included a test program in the package: main.f for serial computers and mainparallel.f for parallel computers. The executables are called TestDIRect, TestDIRectmpi, and TestDIRectpvm. This program solves 13 test problems, which we describe in detail in Section C.4. We also included a Matlab program that runs all these test problems with both the original DIRECT algorithm and our modification DIRECT-l. We use the following directories for this program:

    direct/matlab Directory which contains the matlab program. The files contained in this directory are

    main.m The matlab program to run all test problems.

    counting.m Read in the results from the run.

    setdirect.m Set the parameters for DIRECT.

    setproblem.m Set which problem to solve.

    writeDIRECT.m Write the initialization file for DIRECT.

    writemain.m Write the initialization file for the main program.

    fileoutput.m Write the results for all test problems into the file results.txt.

    direct/ini Directory which contains the initialization files. The files contained in this directory are

    DIRECT.ini The file containing the parameters for DIRECT.

    main.ini The file containing the parameters for the main program.

    problems.ini The file containing the names of the initialization files for the different problems.

    direct/problem Directory which contains the initialization files for the different problems.

    After running main.m, there will be the following extra files in the main directory:

    results.txt A file with a LaTeX table listing the number of function evaluations needed and the percent error (see Section C.4.5) at the end of the optimization, both for DIRECT and DIRECT-l.

    direct.out Log file containing information about the iterations. This file is divided into five main parts; the first and last parts are generated by the user. The second part describes the parameters and some general information. The third

    part shows the iteration history, and the fourth part of the file gives a short summary. We now look at the structure of this file.

    User data – Data written by the main program.

    General information – This part first shows the version of DIRECT. In the next line we output the string stored in cdata(1) as the problem name. Following this we show the values of the parameters passed to DIRECT, including the bounds on the variables. We also print out if the original DIRECT algorithm or our modification is used.

    Iteration history – The middle part of this file contains the iteration history. The first column contains the iteration in which DIRECT found a smaller function value than the best one known so far. The second column contains the number of function evaluations done so far, and the last column contains the best function value found. The last line of this part describes the reason why DIRECT stopped.

    Summary – In the final part of this file we write out the lowest function value found, the total number of function evaluations, and how close the best function value DIRECT found is to the global minimal value, if this value is known. Furthermore we give the coordinates of the best point found and by how much these coordinates differ from the upper and lower bounds.

    User data – Additional data written by the main program.

    Below we show an example of this file created by the sample program included in the package:

    User data

    +-----------------------------------------+
    | Example Program for DIRECT             |
    | This program uses DIRECT to optimize   |
    | testfunctions. Which testfunction is   |
    | optimized and what parameters are used |
    | is controlled by the files in ini/.    |
    |                                        |
    | Owen Esslinger, Joerg Gablonsky,       |
    | Alton Patrick                          |
    | 04/15/2001                             |
    +-----------------------------------------+
    Name of ini-directory    : ini/
    Name of DIRect.ini file  : DIRECT.ini
    Name of problemdata file : shekel5.ini
    Testproblem used         : 5

    General information

    ------------------- Log file -------------------
    DIRECT Version 2.0.3
    Shekel-5 function
    Problem Dimension n                    :      4
    Eps value                              :  0.1000E-03
    Maximum number of f-evaluations (maxf) :  20000
    Maximum number of iterations (MaxT)    :   6000
    Value of f_global                      : -0.1015E+02
    Global percentage wanted               :  0.1000E-01
    Volume percentage wanted               : -0.1000E+01
    Measure percentage wanted              : -0.1000E+01
    Epsilon is constant.
    Jones original DIRECT algorithm is used.
    Bounds on variable x 1 : 0.00000 <= xi <= 10.00000
    Bounds on variable x 2 : 0.00000 <= xi <= 10.00000
    Bounds on variable x 3 : 0.00000 <= xi <= 10.00000
    Bounds on variable x 4 : 0.00000 <= xi <= 10.00000

    Iteration history

    -------------------------------------------------
    Iteration   # of f-eval.        fmin
            1             9     -0.5753514094
            3            43     -0.6989272350
            4            51     -1.0519854213
            5            57     -6.8404676192
            7            81     -7.4383120011
            8            91     -8.1524902009
            9            99     -9.0180871080
           10           103    -10.0934485966
           12           129    -10.1082368755
           13           143    -10.1230718067
           14           151    -10.1376865940
           15           155    -10.1523498373
    DIRECT stopped: fmin within fglper of global minimum.

    Summary

    ------------------- Summary -------------------
    Final function value           :  -10.1523498
    Number of function evaluations :  155
    Final function value is within 0.00837 percent of global optimum.
    Index   Final solution   x(i) - l(i)   u(i) - x(i)
        1        3.9986283     3.9986283     6.0013717
        2        3.9986283     3.9986283     6.0013717
        3        3.9986283     3.9986283     6.0013717
        4        3.9986283     3.9986283     6.0013717
    ------------------------------------------------

    User data

    ---------------- Final result ----------------
    DIRECT termination flag :  3
    DIRECT minimal point    :
        3.9986283   3.9986283   3.9986283   3.9986283
    DIRECT minimal value    :  -10.1523498
    DIRECT number of f-eval :  155
    Time needed             :  0.3000E-01 seconds.

    C.3 A short overview of the DIRECT algorithm and our modifications

    In this section we give a mathematical description of the original DIRECT algorithm, which was developed by D. R. Jones, C. D. Perttunen, and B. E. Stuckman [48] in 1993, and of our modifications to it. The name DIRECT is derived from one of its main features, dividing rectangles. There are two main ingredients to this algorithm. The first is how to divide the domain (Section C.3.1), and the second is how to decide which hyperrectangles to divide in the next iteration (Section C.3.2).

    C.3.1 Dividing the domain

    Division is based on N-dimensional trisection. The following two subsections describe how this division is done for a hypercube and a hyperrectangle, respectively.

    Dividing of a hypercube

    Let c be the center point of a hypercube. The algorithm evaluates the function at the points c ± δe_i, where δ equals 1/3 of the side length of the cube and e_i is the i-th Euclidean base vector. DIRECT defines w_i by

    w_i = \min\{ f(c + \delta e_i), f(c - \delta e_i) \}.

    The algorithm then divides the hypercube in the order given by the w_i, starting with the lowest w_i: DIRECT divides the hypercube first perpendicular to the direction with the lowest w_i, then it divides the remaining volume perpendicular to the direction with the second lowest w_i, and so on, until the hypercube is divided in all directions. This strategy puts c in the center of a hypercube with side length δ. Let

    b = \arg\min_{i=1,\ldots,N} \{ f(c + \delta e_i), f(c - \delta e_i) \}.

    Then b will be the center of a hyperrectangle with one side of length δ; the other N − 1 sides will have a length of 3δ. Figure C.1a shows an example of the division of a hypercube. Here

    w_1 = \min\{5, 8\} = 5,
    w_2 = \min\{6, 2\} = 2.

    Therefore we divide first perpendicular to the x_2-axis, and then in the second step the remaining rectangle is divided perpendicular to the x_1-axis.
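    The following FORTRAN 77 sketch computes the w_i and the resulting division order for one hypercube; the routine is illustrative (it assumes N ≤ 64 and is not taken from the package).

c     Sketch: from the trial values f(c + delta*e_i) (fplus) and
c     f(c - delta*e_i) (fminus), compute w_i and the order in which
c     the cube is divided; order(1) is the direction with lowest w.
      subroutine divord(n, fplus, fminus, order)
      integer n, order(n), i, j, itmp
      double precision fplus(n), fminus(n), w(64)
      do 10 i = 1, n
         w(i) = min(fplus(i), fminus(i))
         order(i) = i
 10   continue
c     selection sort of the directions by increasing w
      do 30 i = 1, n-1
         do 20 j = i+1, n
            if (w(order(j)) .lt. w(order(i))) then
               itmp = order(i)
               order(i) = order(j)
               order(j) = itmp
            end if
 20      continue
 30   continue
      end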

    Dividing of a hyperrectangle

    In DIRECT a hyperrectangle is only divided along its longest sides, which assures us that we get a decrease in the maximal side length of the hyperrectangle. Figure C.1b represents the next step in the algorithm. DIRECT will divide the shaded area (we explain in Section C.3.2 how we choose which hyperrectangles to divide). The second box in Figure C.1b shows where DIRECT samples the function, and the third box shows how the rectangle is divided only once. Figure C.1c shows the third step in the algorithm for this example. In this step DIRECT will divide the two shaded rectangles. One of them is a square, and it is therefore divided twice as described before. The larger area is again a rectangle and gets divided once.

    C.3.2 Potentially optimal hyperrectangles

    This section describes the second main ingredient of the DIRECT algorithm, how to decide which hyperrectangles to divide in the next iteration. DIRECT divides all potentially optimal hyperrectangles as defined in Definition C.1.

    Definition C.1. Let ε > 0 be a positive constant and let f_min be the current best function value. A hyperrectangle j is said to be potentially optimal if there exists some K̃ > 0 such that

    f(c_j) - \tilde{K} d_j \le f(c_i) - \tilde{K} d_i \quad \forall i,

    and

    f(c_j) - \tilde{K} d_j \le f_{min} - \epsilon |f_{min}|.

    In this definition c_j is the center of hyperrectangle j, and d_j is a measure for this hyperrectangle. Jones et al. [48] chose d_i to be the distance from the center of hyperrectangle i to its vertices. They divide all potentially optimal hyperrectangles in every iteration, even if two of them have the same measure and the same function

    Figure C.1: Dividing of a hypercube.

    value at the center.

    C.3.3 The DIRECT algorithm

    We give a formal description of the DIRECT algorithm in Algorithm C.1.

    Algorithm C.1 DIRECT(a, b, f, ε, numit, numfunc)

    1: Normalize the search space to be the unit hypercube with center point c_1
    2: Evaluate f(c_1), f_min = f(c_1), t = 0, m = 1
    3: while t < numit and m < numfunc do
    4:   Identify the set S of potentially optimal hyperrectangles
    5:   while S ≠ ∅ do
    6:     Take j ∈ S
    7:     Sample new points, evaluate f at the new points, and divide the hyperrectangle with Divide
    8:     Update f_min, m = m + Δm
    9:     Set S = S \ {j}
    10:  end while
    11:  t = t + 1
    12: end while

    The first two steps in the algorithm are the initialization steps. The variable m is a counter for the number of function evaluations done, while t is a counter for the number of iterations. Unlike more traditional optimization methods, DIRECT has no termination criterion based on the function values. Instead, DIRECT stops after numit iterations or after numfunc function evaluations. Note that the limit on the number of function evaluations is not strictly enforced: we only check this condition after we have divided all potentially optimal hyperrectangles in an iteration. This means we normally do a few more function evaluations than numfunc. Note that in Algorithm C.1 there are two possibilities for parallelism: the inner loop (steps 5 to 10) and the function evaluations inside the inner loop (step 7). In our parallel implementation only the inner loop is parallelized. Potentially optimal intervals are identified by DIRECT using the following lemma, which reformulates Definition C.1.

    Lemma C.1. Let ε > 0 be a positive constant and let f_min be the current best function value. Let I be the set of indices of all existing intervals, and for j ∈ I let I_1 = {i ∈ I : d_i < d_j}, I_2 = {i ∈ I : d_i > d_j}, and I_3 = {i ∈ I : d_i = d_j}. Interval j ∈ I is potentially optimal if

    f(c_j) \le f(c_i) \quad \forall i \in I_3,   (C.3)

    there exists K̃ > 0 such that

    \max_{i \in I_1} \frac{f(c_j) - f(c_i)}{d_j - d_i} \le \tilde{K} \le \min_{i \in I_2} \frac{f(c_i) - f(c_j)}{d_i - d_j},   (C.4)

    and

    \epsilon \le \frac{f_{min} - f(c_j)}{|f_{min}|} + \frac{d_j}{|f_{min}|} \min_{i \in I_2} \frac{f(c_i) - f(c_j)}{d_i - d_j} \quad \text{if } f_{min} \ne 0,   (C.5)

    or

    f(c_j) \le d_j \min_{i \in I_2} \frac{f(c_i) - f(c_j)}{d_i - d_j} \quad \text{if } f_{min} = 0.   (C.6)

    The proof of this lemma can be found in [30].
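    To make the lemma concrete, the following FORTRAN 77 sketch tests a single interval j against all m intervals; it is a simplified illustration (grouping, feasibility flags, and efficiency are ignored), not the test used in DIRChoose.

c     Sketch of the test of Lemma C.1 for one interval j.  d(i) is
c     the measure and fc(i) the function value at the center of
c     interval i; eps and fmin are as in the lemma.
      logical function potopt(j, m, d, fc, fmin, eps)
      integer j, m, i
      double precision d(m), fc(m), fmin, eps
      double precision lo, hi, s
      lo = -1.0d300
      hi = 1.0d300
      potopt = .true.
      do 10 i = 1, m
         if (i .eq. j) goto 10
         if (d(i) .eq. d(j)) then
c           condition (C.3): j must be best within its own group
            if (fc(i) .lt. fc(j)) potopt = .false.
         else
c           the same difference quotient bounds K from both sides
            s = (fc(j) - fc(i))/(d(j) - d(i))
            if (d(i) .lt. d(j)) then
               if (s .gt. lo) lo = s
            else
               if (s .lt. hi) hi = s
            end if
         end if
 10   continue
c     condition (C.4): some valid slope K must exist
      if (lo .gt. hi) potopt = .false.
c     conditions (C.5)/(C.6), binding only if larger intervals exist
      if (hi .lt. 1.0d299) then
         if (fmin .ne. 0.0d0) then
            if ((fmin - fc(j) + d(j)*hi)/dabs(fmin) .lt. eps)
     +         potopt = .false.
         else
            if (fc(j) .gt. d(j)*hi) potopt = .false.
         end if
      end if
      return
      end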

    C.3.4 Our modification to the DIRECT algorithm

    In our modification we use the length of the longest side of a hyperrectangle as the measure d_j. This reduces the number of different groups of hyperrectangles compared to using the distance from the center to a corner, and makes the algorithm more biased towards local search. The second modification is to divide at most one hyperrectangle per group. That is, if there is more than one potentially optimal hyperrectangle with the same measure, we divide only one of them instead of all. This again can improve the performance of the algorithm. These two modifications together result in DIRECT-l.

    C.3.5 Extensions to the DIRECT algorithm

    We also extended the algorithm to handle problems with hidden constraints; that means we look at problems P″ where the subset B ⊂ Ω is not given analytically. If no feasible point is found within the budget given to DIRECT, we allow it to continue until a feasible point is found and then reassign the original budget. Through this strategy we ensure that DIRECT does not terminate without finding a feasible point.

    The strategy we use was suggested by R. Carter [11]. We describe the general idea before going into details. For any infeasible midpoint, we expand its hyperrectangle by a factor of two. If this larger hyperrectangle contains one or more feasible midpoints of other hyperrectangles, we find the smallest function value f_min among these and use f_min + ε|f_min| as a surrogate value. If no feasible midpoint is contained in the larger hyperrectangle, we mark the current point as really infeasible.

    Algorithm C.2 DIRECT(a, b, f, ε, numit, numfunc)

    1: Normalize the search space to be the unit hypercube with center point c_1
    2: Evaluate f(c_1), f_min = f(c_1), t = 0, m = 1
    3: while t < numit and m < numfunc do
    4:   Identify the set S of potentially optimal hyperrectangles
    5:   while S ≠ ∅ do
    6:     Take j ∈ S
    7:     Sample new points, evaluate f at the new points, and divide the hyperrectangle with Divide
    8:     Update f_min, m = m + Δm
    9:     Set S = S \ {j}
    10:  end while
    11:  Use ReplaceInf to check for infeasible points which are near feasible points, and replace the value at these by the value of a nearby point.
    12:  t = t + 1
    13: end while

    We will now describe this strategy in more detail. We extend DIRECT as shown in Algorithm C.2 by adding a call to ReplaceInf in line 11. In this method the actual replacement takes place.

    Algorithm C.3 ReplaceInf({c_i}, {l_i}, {f_i})

    Input:
    • {c_i} – centers of the hyperrectangles created by DIRECT, c_i ∈ R^N.
    • {l_i} – side lengths of the hyperrectangles created by DIRECT, l_i ∈ R^N.
    • {f_i} – function values at the centers of the hyperrectangles created by DIRECT, f_i ∈ R.

    Output:
    • {f_i} – updated function values.

    1: for all c_i infeasible do
    2:   Create a larger hyperrectangle D around c_i.
    3:   F = \min\{ \min_{c_j \in D} f_j, \infty \}
    4:   if F < ∞ then
    5:     f_i = F + 10^{-6} |F|
    6:   else
    7:     mark f_i really infeasible.
    8:   end if
    9: end for
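    A direct FORTRAN 77 transcription of this loop might look as follows; the array names and the feasibility codes (0 feasible, 1 infeasible with a feasible neighbor, 2 really infeasible) follow the text, but the sketch is ours, not the package routine, and it uses the large value 10^6 as a sentinel.

c     Sketch of ReplaceInf: for every infeasible midpoint i, scan all
c     feasible midpoints j inside the box with the same center and
c     doubled side lengths, and assign a surrogate value.
      subroutine replinf(n, m, c, l, f, feas, maxor, maxfun, eps)
      integer maxor, maxfun, n, m, feas(maxfun), i, j, k
      double precision c(maxfun,maxor), l(maxfun,maxor)
      double precision f(maxfun), eps, fmin, big
      logical inbox
      big = 1.0d6
      do 40 i = 1, m
         if (feas(i) .eq. 0) goto 40
         fmin = big
         do 20 j = 1, m
            if (feas(j) .ne. 0) goto 20
            inbox = .true.
            do 10 k = 1, n
               if (dabs(c(j,k)-c(i,k)) .gt. l(i,k)) inbox = .false.
 10         continue
            if (inbox .and. f(j) .lt. fmin) fmin = f(j)
 20      continue
         if (fmin .lt. big) then
c           a feasible midpoint is nearby: use a surrogate value
            f(i) = fmin + eps*dabs(fmin)
            feas(i) = 1
         else
c           no feasible midpoint nearby: mark as really infeasible
            feas(i) = 2
         end if
 40   continue
      end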

    In the method ReplaceInf, shown in Algorithm C.3, we iterate over all hyperrectangles with infeasible midpoints. For each of these midpoints, we create a new surrounding box by doubling the length of each side while keeping the same center. Then we find f_minloc, the minimum value over all feasible points calculated by DIRECT inside this expanded hyperrectangle. If this minimum exists (that is, there is at least one feasible point in the larger hyperrectangle), we assign f_minloc + ε|f_minloc| to the current infeasible point. We used a value of ε = 10^{-6} in our computations. Otherwise the infeasible point is marked really infeasible. We assign the maximum value found so far, increased by 1, to really infeasible points. Since we have to check each infeasible point in ReplaceInf (see below), there is no extra cost if the maximum increases.

    We increase the replacement value to make sure that if there is another hyperrectangle with the same measure, a feasible midpoint, and the same function value (this could be one we used to calculate the minimum), the one with the feasible midpoint is divided first. Figure C.2 shows an example of this strategy: a close-up of the area around an infeasible point P and its corresponding hyperrectangle.

    Figure C.2: Example of an infeasible point and how its surrogate value is calculated.

    The dotted rectangle is the enlarged rectangle around P. There are nine midpoints of rectangles created by DIRECT contained in this enlarged rectangle; three of them are infeasible. Note that we look at the closed rectangle; therefore, points on the boundary are also considered nearby. Hence we only need to take the minimum over the other six feasible midpoints. This value is given by 10; therefore we take 10 + ε|10| as the new value at P. If we did not amplify the surrogate value, the rectangle with center P would look (to DIRECT) the same as the rectangle with a function value of 10 at the midpoint, and we would have to ensure that DIRECT chooses the rectangle with the feasible midpoint and not the one with midpoint P. We avoid this problem by using the amplified value.

    Note that this surrogate value at the midpoint can change in each outer iteration. There are two reasons why this could happen.

    • A new point inside the box with a lower function value than the one assigned so far has been found.

    • The hyperrectangle corresponding to the infeasible point was divided by DIRECT. Through this division DIRECT has made the hyperrectangle

    smaller, and no feasible point is nearby any more (that is, in the area looked at).

    Figure C.3: Example of an infeasible point, whose value was replaced by the value of a feasible point nearby, becoming completely infeasible.

    Figure C.3 shows an example of this. On the left, we show a close-up of the hyperrectangles and midpoints created by DIRECT before the call to ReplaceInf, together with the enlarged rectangle around P. There are eight points inside this enlarged rectangle (including the boundary). Five of these are feasible points; therefore, we assign the surrogate value 10 + ε|10| to the point P. On the right of Figure C.3 we show the same area after DIRECT has divided the rectangle corresponding to P. The two newly created rectangles have infeasible midpoints. This time the enlarged rectangle around P does not contain any feasible points (at least DIRECT has not found any). Therefore P is now marked as really infeasible.

    C.4 The Test Problems

    We first give short descriptions of all the test functions we looked at. Following the descriptions we summarize the important features of the functions and briefly describe our numerical results. These can easily be redone with the package provided.

    C.4.1 Elementary functions

    The first three functions we look at are examples of constant, linear and quadratic functions. Looking at the behavior of DIRECT for these functions allows us to get a better understanding of DIRECT and shows the differences between the original algorithm and our modification. The functions are

    Constant function

    f(x) = 100, \quad \Omega = [0,1]^N.

    In the example program we set N = 2.

    Linear function

    f(x) = 2 x_1 + \sum_{i=2}^{N} x_i, \quad \Omega = [0,1]^N.

    In the example program we set N = 2. The optimal point x^* = (0, \ldots, 0)^T has an optimal function value of f^* = 0.

    Quadratic function

    f(x) = 10 + \sum_{i=1}^{N} (x_i - 5.3)^2, \quad \Omega = [0,10]^N.

    In the example program we set N = 2. The optimal point x^* = (5.3, \ldots, 5.3)^T has an optimal function value of f^* = 10.

    C.4.2 Example for hidden constraints

    The following test function comes from Jones [47]. It was originally given by Gomez et al. [37].

    Figure C.4: Contour plot of the Gomez #3 function.

    Figure C.5: Plot of the Gomez #3 function.

    Gomez #3 [37]

    f(x) = \left( 4 - 2.1 x_1^2 + \frac{x_1^4}{3} \right) x_1^2 + x_1 x_2 + 4 (x_2^2 - 1) x_2^2,

    \Omega = [-1, 1]^2,

    B = \Omega \cap \{ x \in R^2 : -\sin(4\pi x_1) + 2 \sin^2(2\pi x_2) \le 0 \}.

    We used this problem as if we did not know the nonlinear constraint: whenever the nonlinear constraint was not satisfied, we returned flag = 1. The function has a minimum at x^* = (0.109, -0.623)^T with a function value of f^* = -0.9711. In Figure C.4 we show a contour plot of this function. It is clear that the domain of this function consists of several disconnected regions. We show a plot of the function in Figure C.5.
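    A sketch of this test problem in FORTRAN 77, using a reduced version of the myfunc.f interface (the iidata/ddata/cdata arguments are omitted), is shown below. We reconstruct the constraint with the squared sine term; flag = 1 reports a hidden-constraint violation, in which case f is left unset.

c     Sketch of the Gomez #3 problem with its hidden constraint.
      subroutine gomez3(x, n, flag, f)
      integer n, flag
      double precision x(n), f, pi, con
      pi = 4.0d0*datan(1.0d0)
      con = -dsin(4.0d0*pi*x(1)) + 2.0d0*dsin(2.0d0*pi*x(2))**2
      if (con .gt. 0.0d0) then
c        hidden constraint violated: report the point as infeasible
         flag = 1
         return
      end if
      f = (4.0d0 - 2.1d0*x(1)**2 + x(1)**4/3.0d0)*x(1)**2
     +    + x(1)*x(2) + 4.0d0*(x(2)**2 - 1.0d0)*x(2)**2
      flag = 0
      return
      end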

    C.4.3 Test functions described in Jones et al. [48]

    Jones et al. [48] describe results for their original implementation of DIRECT on nine different test problems. The first seven problems were originally given by Dixon and Szegő [27]. These problems have been widely used to compare global optimization algorithms [27, 45, 46]. Problems eight and nine come from Yao [75]. We now describe

    Table C.1: Parameters for the Shekel’s family of functions.

      i        a_i^T          c_i
      1   4.0 4.0 4.0 4.0     0.1
      2   1.0 1.0 1.0 1.0     0.2
      3   8.0 8.0 8.0 8.0     0.2
      4   6.0 6.0 6.0 6.0     0.4
      5   3.0 7.0 3.0 7.0     0.4
      6   2.0 9.0 2.0 9.0     0.6
      7   5.0 5.0 3.0 3.0     0.3
      8   8.0 1.0 8.0 1.0     0.7
      9   6.0 2.0 6.0 2.0     0.5
     10   7.0 3.6 7.0 3.6     0.5

    these test problems in more detail.

Shekel's family (S5, S7, S10) [27]

f(x) = − Σ_{i=1}^{m} 1 / ((x − a_i)^T (x − a_i) + c_i),   x, a_i ∈ R^N, c_i > 0, ∀ i = 1, ..., m,

Ω = [0,10]^N.

Three instances of the Shekel function are used in the comparisons. Here N = 4 and m = 5, 7, and 10. The values of a_i and c_i are given in Table C.1.
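A Python sketch of the Shekel family with the parameters of Table C.1 hard-coded (the helper name is ours):

import numpy as np

# Rows of Table C.1: a_i (four components each) and c_i.
A = np.array([[4.0, 4.0, 4.0, 4.0], [1.0, 1.0, 1.0, 1.0],
              [8.0, 8.0, 8.0, 8.0], [6.0, 6.0, 6.0, 6.0],
              [3.0, 7.0, 3.0, 7.0], [2.0, 9.0, 2.0, 9.0],
              [5.0, 5.0, 3.0, 3.0], [8.0, 1.0, 8.0, 1.0],
              [6.0, 2.0, 6.0, 2.0], [7.0, 3.6, 7.0, 3.6]])
C = np.array([0.1, 0.2, 0.2, 0.4, 0.4, 0.6, 0.3, 0.7, 0.5, 0.5])

def shekel(x, m):
    # Shekel function S_m (m = 5, 7, or 10) on Ω = [0, 10]^4.
    d = x - A[:m]
    return -np.sum(1.0 / (np.sum(d * d, axis=1) + C[:m]))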

Hartman's family (H3, H6) [27]

f(x) = − Σ_{i=1}^{m} c_i exp(− Σ_{j=1}^{N} a_{ij} (x_j − p_{ij})^2),   x, a_i, p_i ∈ R^N, c_i > 0, ∀ i = 1, ..., m,

Ω = [0,1]^N.

We will look at two instances of the Hartman function. The values of the parameters and the dimensions of the problems are given in Table C.2.

Table C.2: Parameters for Hartman's family of functions.

First case: N = 3, m = 4.
 i   a_i             c_i   p_i
 1   3.0  10.  30.   1.0   0.3689   0.1170  0.2673
 2   0.1  10.  35.   1.2   0.4699   0.4387  0.7470
 3   3.0  10.  30.   1.0   0.1091   0.8732  0.5547
 4   0.1  10.  35.   3.2   0.03815  0.5743  0.8828

Second case: N = 6, m = 4.
 i   a_i                              c_i
 1   10.   3.   17.   3.5  1.7   8.   1.0
 2   0.05  10.  17.   0.1  8.    14.  1.2
 3   3.    3.5  1.7   10.  17.   8.   3.0
 4   17.   8.   0.05  10.  0.1   14.  3.2

 i   p_i
 1   0.1312  0.1696  0.5569  0.0124  0.8283  0.5886
 2   0.2329  0.4135  0.8307  0.3736  0.1004  0.9991
 3   0.2348  0.1451  0.3522  0.2883  0.3047  0.6650
 4   0.4047  0.8828  0.8732  0.5743  0.1091  0.0381
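Likewise, a Python sketch of the three-dimensional instance H3, with the first parameter set of Table C.2 hard-coded (again an illustrative helper, not the package code):

import numpy as np

A3 = np.array([[3.0, 10.0, 30.0], [0.1, 10.0, 35.0],
               [3.0, 10.0, 30.0], [0.1, 10.0, 35.0]])
C3 = np.array([1.0, 1.2, 1.0, 3.2])
P3 = np.array([[0.3689, 0.1170, 0.2673], [0.4699, 0.4387, 0.7470],
               [0.1091, 0.8732, 0.5547], [0.03815, 0.5743, 0.8828]])

def hartman3(x):
    # Hartman H3 on Ω = [0, 1]^3; the global minimum value is about -3.863.
    inner = np.sum(A3 * (x - P3) ** 2, axis=1)
    return -np.sum(C3 * np.exp(-inner))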

Figure C.6: Plot of the Branin function.

    Branin function (BR) [27]

f(x_1, x_2) = (x_2 − (5.1/(4π^2)) x_1^2 + (5/π) x_1 − 6)^2 + 10 (1 − 1/(8π)) cos x_1 + 10,

Ω = [−5,10] × [0,15].

    This function has three global minima. Figure C.6 shows a plot of this function.
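A direct Python transcription of the formula (illustrative sketch):

import numpy as np

def branin(x):
    # Branin (BR) on Ω = [-5, 10] x [0, 15]; three global minima, f* ≈ 0.398.
    x1, x2 = x
    return ((x2 - 5.1 / (4 * np.pi**2) * x1**2 + 5 / np.pi * x1 - 6) ** 2
            + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10)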

    Goldstein and Price function (GP) [27]

f(x_1, x_2) = [1 + (x_1 + x_2 + 1)^2 (19 − 14x_1 + 3x_1^2 − 14x_2 + 6x_1x_2 + 3x_2^2)]
              × [30 + (2x_1 − 3x_2)^2 (18 − 32x_1 + 12x_1^2 + 48x_2 − 36x_1x_2 + 27x_2^2)],

Ω = [−2,2]^2.

The function has four local minima and one global minimum at x* = (0, −1)^T with f(x*) = 3. In Figures C.7 and C.8 we show plots of this function.

Figure C.7: Plot of the Goldstein and Price function.

Figure C.8: Plot of the Goldstein and Price function around the global minimum.

The first figure shows the whole domain, while the second shows only the area around the global minimum.
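In Python the function reads (illustrative sketch):

def goldstein_price(x):
    # Goldstein and Price (GP) on Ω = [-2, 2]^2; f* = 3 at (0, -1).
    x1, x2 = x
    a = 1 + (x1 + x2 + 1) ** 2 * (19 - 14 * x1 + 3 * x1**2
                                  - 14 * x2 + 6 * x1 * x2 + 3 * x2**2)
    b = 30 + (2 * x1 - 3 * x2) ** 2 * (18 - 32 * x1 + 12 * x1**2
                                       + 48 * x2 - 36 * x1 * x2 + 27 * x2**2)
    return a * b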

    Six-hump camelback function (C6) [75]

f(x_1, x_2) = (4 − 2.1x_1^2 + x_1^4/3) x_1^2 + x_1 x_2 + (−4 + 4x_2^2) x_2^2,

Ω = [−3,3] × [−2,2].

The function has six minima, two of which are global. The global minima are located at x* = (±0.0898, ∓0.7126)^T and f(x*) = −1.0316. Figures C.9 and C.10 show plots of this function. Again, the first figure shows the whole domain, while the second shows an enlargement around the global minima.
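As a Python sketch:

def camelback6(x):
    # Six-hump camelback (C6) on Ω = [-3, 3] x [-2, 2]; f* ≈ -1.0316.
    x1, x2 = x
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2 + (-4 + 4 * x2**2) * x2**2)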

    Two-dimensional Shubert function (SH) [75]

f(x_1, x_2) = ( Σ_{j=1}^{5} j cos[(j+1)x_1 + j] ) ( Σ_{j=1}^{5} j cos[(j+1)x_2 + j] ),

Ω = [−10,10]^2.
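A Python sketch of this product of cosine sums (illustrative):

import numpy as np

def shubert2(x):
    # Two-dimensional Shubert (SH) on Ω = [-10, 10]^2; 18 global minima.
    j = np.arange(1, 6)
    x1, x2 = x
    return (np.sum(j * np.cos((j + 1) * x1 + j))
            * np.sum(j * np.cos((j + 1) * x2 + j)))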

Figure C.9: Plot of the six-hump camelback function.

Figure C.10: Plot of the six-hump camelback function around the global minima.

Figure C.11: Plot of the two-dimensional Shubert function.

Figure C.12: Plot of the two-dimensional Shubert function around a global minimum.

    The function has 760 local minima, of which 18 are global. Figures C.11 and C.12 show two plots of this function. Again, the first figure shows the whole domain, while the second shows an enlargement around one of the global minima. Table C.3 summarizes the important features of the test problems used in our numerical experiments.

C.4.4 Main features of the test functions

In Table C.3 we give the box constraints, the number of global minima, and the function values at the global minima for the test functions described above.

Table C.3: Summary of the important features of the test functions.

 #  Name              N  Ω                   Global minima
                                             number   function value
 1  Constant          2  [0, 1]^2            ∞        100
 2  Linear            2  [0, 1]^2            1        0
 3  Quadratic         2  [0, 10]^2           1        10
 4  Gomez 3           2  [−1, 1]^2           1        −0.981243
 5  Branin (BR)       2  [−5, 10] × [0, 15]  3        0.398
 6  Shekel-5 (S5)     4  [0, 10]^4           1        −10.153
 7  Shekel-7 (S7)     4  [0, 10]^4           1        −10.403
 8  Shekel-10 (S10)   4  [0, 10]^4           1        −10.536
 9  Hartman-3 (H3)    3  [0, 1]^3            1        −3.863
10  Hartman-6 (H6)    6  [0, 1]^6            1        −3.322
11  Goldprice (GP)    2  [−2, 2]^2           1        3.000
12  Sixhump (C6)      2  [−3, 3] × [−2, 2]   2        −1.032
13  Shubert (SH)      2  [−10, 10]^2         18       −186.831

    C.4.5 Numerical results

Table C.4 reports the results for DIRECT and DIRECT-l using the termination criterion from Jones et al. [48]. This criterion uses knowledge of the global minimum value to terminate once the percentage error is small. We first define this percentage error.

Let f_global be the known global minimal function value, and denote by f_min the best function value found by DIRECT. We define the percentage error p as

p = 100 (f_min − f_global) / |f_global|   if f_global ≠ 0,
p = 100 f_min                             if f_global = 0.
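As a small worked example, the sketch below computes p (function and variable names are ours). For the quadratic test problem, f_global = 10, so a best value f_min = 10.0005 gives p = 100 · 0.0005/10 = 0.005, which lies below the threshold of 0.01 used below.

def percentage_error(fmin, fglobal):
    # Percentage error between the best value found and the known optimum.
    if fglobal != 0:
        return 100.0 * (fmin - fglobal) / abs(fglobal)
    return 100.0 * fmin

print(percentage_error(10.0005, 10.0))   # approximately 0.005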

Following Jones, we terminate the iteration once p is lower than 0.01. In all runs we set ε = 0.0001 for both the original implementation and our implementation. Although both methods solve all of the test problems, our modification consistently requires fewer function evaluations, significantly fewer for problems 2, 3, and 9 through 13.

Table C.4: Results for the test functions.

 #  Problem             DIRECT             DIRECT-l
                        f.eval.  p         f.eval.  p
 1  Constant               9    0.00E+00      7    0.00E+00
 2  Linear               475    0.76E-02    173    0.76E-02
 3  Quadratic            139    0.29E-02     65    0.29E-02
 4  Gomez 3              771    0.35E-03    745    0.35E-03
 5  Branin               195    0.98E-03    159    0.98E-03
 6  Shekel-5             155    0.84E-02    147    0.84E-02
 7  Shekel-7             145    0.94E-02    141    0.94E-02
 8  Shekel-10            145    0.97E-02    139    0.97E-02
 9  Hartman-3            199    0.85E-02    111    0.85E-02
10  Hartman-6            571    0.89E-02    295    0.89E-02
11  Goldstein-Price      191    0.30E-02    115    0.30E-02
12  Sixhump camelback    285    0.48E-03    191    0.48E-03
13  Shubert             2967    0.50E-02   2043    0.50E-02

In every run the algorithm therefore terminated with Ierror = 3, that is, by meeting the percentage-error criterion. These results show that DIRECT-l should be used for lower-dimensional problems which do not have too many local and global minima.