
Effects of Latin Hypercube Sampling on Surrogate Modeling and Optimization


International Journal of Fluid Machinery and Systems DOI: http://dx.doi.org/10.5293/IJFMS.2017.10.3.240 Vol. 10, No. 3, July-September 2017 ISSN (Online): 1882-9554

Original Paper

Effects of Latin hypercube sampling on surrogate modeling and optimization

Arshad Afzal1, Kwang-Yong Kim2 and Jae-won Seo3

1Department of Mechanical Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India, [email protected] 2,3Department of Mechanical Engineering, Inha University 253 Yonghyun-Dong, Incheon, 402-751, Republic of Korea

Abstract

Latin hypercube sampling is a widely used design-of-experiment technique to select design points for simulation, which are then used to construct a surrogate model. The exploration/exploitation properties of surrogate models depend on the size and distribution of the design points in the chosen design space. The present study aimed at evaluating the performance characteristics of various surrogate models depending on the Latin hypercube sampling (LHS) procedure (sample size and spatial distribution) for a diverse set of optimization problems. The analysis was carried out for two types of problems: (1) thermal-fluid design problems (optimizations of a convergent-divergent micromixer coupled with pulsatile flow and of boot-shaped ribs), and (2) analytical test functions (six-hump camel back, Branin-Hoo, Hartman 3, and Hartman 6 functions). Three surrogate models, namely, response surface approximation, Kriging, and radial basis neural networks, were tested. The important findings are illustrated using box plots. The surrogate models were analyzed in terms of global exploration (accuracy over the domain space) and local exploitation (ease of finding the global optimum point). Radial basis neural networks showed the best overall performance in global exploration characteristics as well as the strongest tendency to find the approximate optimal solution for the majority of the tested problems. To build a surrogate model, it is recommended to use an initial sample size equal to 15 times the number of design variables. The study provides useful guidelines on the effect of initial sample size and distribution on surrogate construction and subsequent optimization using the LHS sampling plan.

Keywords: Latin hypercube sampling, Optimization, Surrogate model, Cross-validation, Global optimization.

1. Introduction

Design optimization has been widely applied in many engineering fields [1-8]. Optimization problems can be classified as either single-objective (a unique global optimum solution) or multi-objective (non-dominated solutions/Pareto-optimal set) problems. Optimization algorithms generally require a large number of function evaluations to yield the optimum design(s). Thus, in many applications, design optimization with high-fidelity analysis is often impractical because of the large computational cost; for example, in the field of thermo-fluid engineering, where the non-linear Navier-Stokes equations need to be solved, such an analysis usually requires much computing time. In this respect, surrogate modeling is an effective tool for reducing the computational time by approximating the objective function(s) based on numerical simulations for a limited number of designs. Surrogate-based optimization has been used by many researchers, especially in the field of thermo-fluid engineering, over the past couple of decades [1-6]. A detailed review of surrogate modeling techniques and their applications can be found in the reviews of Queipo et al. [9], Forrester and Keane [10], and Samad and Kim [6]. Barthelemy and Haftka [11] also reviewed the applications of surrogate modeling in structural optimization problems. Samad et al. [5] comparatively investigated the predictive capabilities of different surrogate models, such as the response surface approximation (RSA), Kriging (KRG), and radial basis neural network (RBNN) models, for optimizing a transonic axial compressor blade. Afzal and Kim [12-15] used RSA, KRG, and RBNN for function approximations in single- and multi-objective shape optimizations of micromixer designs. Besides optimization, other aspects of surrogate models have also been studied: Jin et al. [16] compared different surrogate models in terms of their accuracy, robustness, and efficiency in approximating mathematical functions. Simpson et al.
[17] surveyed various metamodeling techniques for approximating deterministic computer analysis codes and provided recommendations for their appropriate use. These studies revealed the discrepancies among different surrogates in solving a design problem; no single surrogate was found to be the most effective for all problems. To build a surrogate model, a large number of numerical simulations are required to ensure an acceptably accurate surrogate model. The design-of-experiment (DOE) procedure can be used to economically construct a surrogate model with the minimum number of numerical simulations. Several DOE methods can be used to select the design sites for the simulation; full factorial, fractional factorial, central composite design, Latin hypercube sampling (LHS), Quasi-Monte Carlo (QMC), etc., are commonly used. For some DOE procedures, such as the full factorial, fractional factorial, and central composite designs, the sample size is fixed for a given number of factors (i.e., design variables), and it increases with the number of factors; this may lead to an unmanageable sample size. On the other hand, LHS and QMC allow flexibility in determining the sample size. Upon a careful review of the literature, it was found that there are two approaches to determining the right DOE samples for surrogate model construction. The first approach is known as "single-stage sampling," which generates the sample points all at once for constructing a surrogate. This method is also known as the "one-shot" approach. The other method is "adaptive sampling," where additional points are added in several stages to repeatedly update the surrogate model until a desired level of accuracy is obtained. Efficient global optimization (EGO) has been widely used by many researchers to obtain a reasonable function approximation with a balance between local exploitation and global exploration properties. However, most of these studies were limited to Kriging-based optimization [18-21].

Received August 30 2016; revised January 19 2017; accepted for publication February 23 2017: Review conducted by Minsuk Choi. (Paper number O16028C) Corresponding author: Arshad Afzal, [email protected]
Regis [18] used a trust-region-like approach for Kriging-based optimization that selects the additional new points by maximizing the expected improvement function over a small region in the vicinity of the current best solution instead of over the entire domain. Wang et al. [19] performed a comparative study of expected improvement (EI)-assisted optimization with different surrogates, viz., KRG, RBNN, support vector regression (SVR), and linear Shepard (SHEP), on numerical test problems. The optimization starts with a fixed initial sample size equal to 3D (D = number of design variables) for small-scale test problems and 10D for large-scale ones. The EI is used to guide the selection of the next sampling candidate point. In their tests, the Kriging-based EGO was found to be the most robust method. Irrespective of the approach, the surrogate model always depends on the initial number and distribution of the sample points. Although some previous studies [22-23] looked into this matter, the question, "what are the optimum number and distribution of initial sample points in surrogate model construction?" was not clearly answered. Therefore, considering the increasing applications of surrogate-based optimization in engineering problems, it is important to investigate the effects of the number and distribution of the DOE sample points in the design space on the accuracy of the available surrogate models. Among the various DOE procedures, LHS has been used widely by many researchers owing to its better space-filling property, sample-size flexibility, and low sample sizes. However, the accuracy of a surrogate model based on LHS depends on the number of sample points, which is selected by the designer, and on their distribution in the design-variable space. The distributions of sample points are different for different runs of the LHS procedure.
In contrast to previous studies [18-19, 23], which focused on the addition of infill sample points assisted by the EI or similar strategies, the main concern of the present research is the effect of the initial sample size and distribution in the LHS procedure on surrogate modeling and optimization. Thus, the main objective of the present study is to investigate the combined effect of the number and distribution of LHS points on the accuracy of the surrogate model for different problems and for different surrogate models, namely the RSA, KRG, and RBNN models. The performances of the surrogate models were evaluated in terms of accuracy and the efficiency of finding the global optimum point.

2. Latin Hypercube Sampling

LHS [24-25] is used to select design (sample) points in the continuous design space bounded by the lower and upper bounds of the design variables. This approach generates random sample points while ensuring that all portions of the design space are represented. Using McKay et al.'s notation [24], a sample of size N can be constructed by dividing the range of each factor (input variable) into N strata of equal marginal probability 1/N and sampling once from each stratum. Further, the uniformity of the sampling plan can be controlled using uniformity measures, such as the maximum-minimum distance between the points, or by correlating the sample data. In the present study, the MATLAB function lhsdesign was used to obtain the LHS points with the criterion 'maximin' (maximize the minimum distance between points). It proceeds by iteratively generating a number of LHS samples at random and choosing the best one based on this criterion. To generate the experimental design (sample), the number of sample points is specified by the designer. Since the samples are determined using probability functions (a random process), the state of the random generation changes with every subsequent run. Therefore, even with a fixed number of points, the distribution of sample points in the design space changes with every subsequent run.
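The maximin-LHS procedure described above can also be sketched outside MATLAB. The following Python sketch is a hypothetical stand-in for lhsdesign with the 'maximin' criterion (not the code used in the study): it generates several random Latin hypercube samples and keeps the one with the largest minimum pairwise distance.

```python
import numpy as np

def lhs_maximin(n_points, n_dims, n_iter=20, rng=None):
    # Analogue of lhsdesign(..., 'criterion', 'maximin'): generate several
    # random Latin hypercube samples and keep the one whose minimum
    # pairwise distance is largest.
    rng = np.random.default_rng(rng)
    best, best_score = None, -np.inf
    for _ in range(n_iter):
        sample = np.empty((n_points, n_dims))
        for j in range(n_dims):
            # One permutation of the N strata per dimension, with a
            # uniform jitter inside each stratum of width 1/N.
            sample[:, j] = (rng.permutation(n_points) + rng.random(n_points)) / n_points
        # Maximin criterion: minimum pairwise Euclidean distance.
        d = np.sqrt(((sample[:, None, :] - sample[None, :, :]) ** 2).sum(-1))
        score = d[np.triu_indices(n_points, k=1)].min()
        if score > best_score:
            best, best_score = sample, score
    return best

pts = lhs_maximin(12, 2, rng=0)   # 12 design points in [0, 1]^2
```

Each column contains exactly one point per stratum of width 1/N, so the marginal stratification of LHS is preserved no matter which candidate wins the maximin criterion.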

3. Surrogate Models

The various surrogate modeling techniques used in the present study are described below:

3.1 Response surface approximation

The RSA method [26] is a methodology for fitting a function to a sampled data set comprising design variables and a response, signifying the relationship between the response functions and the design variables. For N independent observations, x1, x2, ..., xN, of the design variables and response y, the model can be represented as follows:

y = Xβ + ε (1)

where y = [y1, ..., yN]^T is the vector of observed responses, X is the N x p matrix whose (i, j) element is fj(xi), β = [β1, ..., βp]^T is the vector of coefficients, and ε = [ε1, ..., εN]^T is the error vector.

where fi(x) (i = 1, ..., p) are the terms of the model and βi (i = 1, ..., p) are the coefficients. The error ε is assumed to be uncorrelated and distributed with a mean of 0 and a constant (but unknown) variance. For model approximation, the vector of coefficients β should be determined; it is obtained by the least-squares technique as follows:

β = (X^T X)^(-1) X^T y (2)
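As a minimal illustration of Eq. (2), the sketch below fits a second-order RSA in two variables by solving the normal equations with NumPy. The sampled data are synthetic, and the column ordering of the design matrix is an assumption made for illustration.

```python
import numpy as np

def rsa_design_matrix(x):
    # Columns: 1, x1, x2, x1*x2, x1^2, x2^2
    # (intercept, linear, cross-product, and squared terms).
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

rng = np.random.default_rng(1)
x = rng.random((20, 2))
# Synthetic response with known coefficients [1, 2, -3, 0.5, 1, 0]:
y = 1.0 + 2.0 * x[:, 0] - 3.0 * x[:, 1] + 0.5 * x[:, 0] * x[:, 1] + x[:, 0]**2

X = rsa_design_matrix(x)
# Eq. (2), solved without forming an explicit inverse:
beta = np.linalg.solve(X.T @ X, X.T @ y)
```

Because the synthetic response is itself a second-order polynomial, the fitted coefficients recover the true ones up to round-off.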

The constructed second-order polynomial response surface model can be expressed as follows:

ŷ(x) = β0 + Σ_{i=1}^{n} βi xi + Σ_{i} Σ_{j>i} βij xi xj + Σ_{i=1}^{n} βii xi^2 (3)

The model includes, from left to right, an intercept, linear terms, cross-product (interaction) terms, and squared terms.

3.2 Kriging model

The KRG method [27] estimates the value of a response function at some unsampled location as a superposition of two components, namely, a global model and a systematic departure:

y(x) = f(x) + Z(x) (4)

where y(x) is the unknown response function to be estimated, and f(x) is a polynomial function representing the trend over the design space. The second part, Z(x), is a localized deviation that interpolates the sampled data points, modeled as a Gaussian process with zero mean and nonzero covariance. The covariance matrix of Z(x) is given as:

cov[Z(xi), Z(xj)] = σ^2 R[R(xi, xj)], i, j = 1, 2, ..., ns (5)

where R is the correlation matrix consisting of the spatial correlation function (SCF) R(xi, xj) as its elements, σ^2 is the process variance representing the scale of the SCF, and ns is the number of sampled data points. R(xi, xj) quantifies the correlation between any two sampled data points xi and xj and thereby controls the smoothness of the Kriging model, the effect of nearby points, and the differentiability of the surface. The Gaussian function was used for the SCF in the present work since it provides a relatively smooth and infinitely differentiable surface:

R(xi, xj) = exp[ -Σ_{k=1}^{ndv} θk |xk^i - xk^j|^2 ] (6)

where ndv is the number of design variables, and θk is the unknown correlation parameter used to fit the model. The optimum correlation parameters θk were found using maximum likelihood estimation (Martin and Simpson, 2005).
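The Gaussian SCF of Eq. (6) can be evaluated directly. The sketch below builds the ns x ns correlation matrix R for a small set of sample points, with θ fixed to an assumed value rather than fitted by maximum likelihood.

```python
import numpy as np

def gaussian_correlation_matrix(X, theta):
    # R(x_i, x_j) = exp(-sum_k theta_k |x_k^i - x_k^j|^2), Eq. (6).
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2   # squared coordinate differences
    return np.exp(-(diff2 * theta).sum(axis=-1))   # ns x ns matrix R

X = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.0]])  # three sample points
theta = np.array([1.0, 1.0])                         # assumed; normally fitted by MLE
R = gaussian_correlation_matrix(X, theta)
```

The resulting matrix is symmetric with a unit diagonal; entries decay toward zero as points move apart, which is what makes Z(x) interpolate the sampled data smoothly.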

3.3 Radial basis neural networks

The RBNN [28] is a two-layered network that comprises a hidden layer of radial basis neurons and an output layer of linear neurons. The radial basis functions act as processing units between the input and output. The hidden layer performs a non-linear transformation of the input space to an intermediate space using a set of radial basis units; the output layer then implements a linear combiner to produce the desired targets. In the present study, the function newrb, a built-in RBNN function available in MATLAB [29], was used. A Gaussian function of the following form was used as the transfer function (radbas) for a radial basis neuron:

a(n) = exp(-n^2) (7)

The linear model f(x) can be expressed as a linear combination of a set of N basis functions:

f(x) = Σ_{j=1}^{N} wj φj(x) (8)

where wj is the weight of the basis function φj. The net input n to the transfer function radbas is the vector distance between the weight vector w and the input vector, multiplied by the bias b. The bias b allows the adjustment of the sensitivity of the radbas neuron. The prediction ability of the network is stored in the weights, which are obtained from a set of training patterns. Network training is performed by adjusting the cross-validation (CV) error through changes in the spread constant (SC) and error goal (EG).

4. Cross-validation Error

In the present study, the accuracy of the constructed surrogate was evaluated using the K-fold CV error. The data sample ((xi, yi); i = 1, ..., N) is partitioned into K disjoint subsets (K-fold CV). K - 1 folds are used to train the surrogate model, and the remaining fold (the kth set) is used for evaluation. This process is repeated K times, leaving out a different fold for evaluation each time. The training was performed by adjusting the CV error for the LHS samples. The CV error was presented in terms of the root-mean-square error (RMSE):

∈k = sqrt[ (1/m) Σ_{i=1}^{m} (yi - ŷi)^2 ] (9)

CV error = (1/K) Σ_{k=1}^{K} ∈k (10)

where m is the number of points in a fold, and ∈k is the prediction error for the kth fold. The predicted values ŷi were determined using the constructed surrogate models, namely, the RSA, KRG, and RBNN models, from the sample points in the remaining K - 1 subsets. For comparing LHS samples with different numbers of points, the value of K was adjusted such that folds with three points each were generated. For example, the values of K were set to 9, 12, 15, and 18 for LHS samples with 27, 36, 45, and 54 points, respectively, so that three points fall in the validation set in every repetition.
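The K-fold CV procedure of Eqs. (9)-(10), with K chosen so that each fold holds three points, can be sketched as follows; a plain linear least-squares fit stands in for the actual surrogate models.

```python
import numpy as np

def kfold_cv_rmse(X, y, fold_size=3, seed=0):
    # Eqs. (9)-(10): K chosen so that each fold holds `fold_size` points
    # (e.g. K = 9 for 27 samples). A linear least-squares fit is a
    # stand-in for the RSA/KRG/RBNN surrogates.
    n = len(y)
    K = n // fold_size
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, K)
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[i] for i in range(K) if i != k])
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ beta
        errors.append(np.sqrt(np.mean((y[test] - pred) ** 2)))   # Eq. (9)
    return np.mean(errors)                                        # Eq. (10)

rng = np.random.default_rng(0)
X = rng.random((27, 2))
y = 1.0 + X @ np.array([2.0, -1.0])   # exactly linear synthetic data
cv = kfold_cv_rmse(X, y)              # 27 points -> K = 9 folds of 3
```

Since the stand-in model matches the synthetic data exactly, the CV error is essentially zero; with real surrogates and noisy responses, it measures global prediction accuracy.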

5. Particle Swarm Optimization

Kennedy and Eberhart [30] developed the particle swarm optimization (PSO) algorithm in the mid-1990s. The PSO uses an evolutionary approach inspired by the social behavior of flocks of birds and schools of fish adapting to their environment [30]. In the PSO, a set of randomly generated solutions (the initial swarm) moves toward the optimal solution in the design space based on information about the design space shared by all members of the swarm. PSO starts with an initial population of particles with some initial velocities, all randomly generated. A particle refers to a point in the design space that iteratively changes its location based on velocity updates. The objective functions are evaluated at each particle location, and the best (minimum) function values and locations are determined. The new velocities are updated based on the current velocities, the particles' individual optimal locations, and the optimal locations of the particles' neighbors (using a weighted approach and some parameters) [29]. The velocity update provides a search direction for the next iteration. The location of each particle is updated using the new velocity. These steps are repeated until a desired stopping criterion is reached.
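A minimal PSO sketch following the velocity and position updates described above is given below; the inertia and acceleration constants are common textbook values, not necessarily those of the MATLAB implementation [29].

```python
import numpy as np

def pso(f, lb, ub, n_particles=30, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    d = len(lb)
    x = rng.uniform(lb, ub, (n_particles, d))          # initial swarm
    v = np.zeros((n_particles, d))
    pbest = x.copy()                                   # personal best locations
    pbest_f = np.array([f(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()                 # global best location
    w, c1, c2 = 0.7, 1.5, 1.5                          # assumed constants
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, d))
        # Velocity update: inertia + cognitive pull + social pull.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)                     # position update
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

sphere = lambda p: float((p ** 2).sum())
xbest, fbest = pso(sphere, np.array([-5.0, -5.0]), np.array([5.0, 5.0]))
```

On a simple convex function such as the sphere, the swarm collapses onto the minimum within a few hundred iterations; in the study, the objective is instead the surrogate prediction.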

6. Problem Formulation

Two types of test problems are considered to demonstrate the effect of LHS on surrogate modeling and optimization: practical thermal-fluid designs and the approximation of analytical functions.

6.1 Thermal-fluid design problems

Problem 1: The first problem is the optimization of a convergent-divergent micromixer coupled with pulsatile flow; a schematic of such a micromixer, as proposed by Afzal and Kim [15], is shown in Figure 1. Two different fluids entering from different inlets mix in the micromixer. The convergent-divergent walls of the microchannel were generated using a sine function of the form y = A sin(2πx/λ), where A and λ are the amplitude and wavelength, respectively. Time-dependent pulsatile flows of the form V = Vs + Vo sin(2πft + φ) were introduced at the two inlets of the microchannel; these flows consist of the superposition of a steady flow (Vs) and a sinusoidal flow. For the sinusoidal flow, Vo is the amplitude of the velocity, and f is the pulse frequency. Details of the design of the convergent-divergent channel, the CFD analysis, and the evaluation of the mixing performance are provided in a previous paper [15]. Based on the results, the Strouhal number St and the velocity ratio Vo/Vs were selected as the design variables for the optimization. The ranges of the design variables were selected as (0.05-0.60) and (0.5-4.0) for St and Vo/Vs, respectively. In the present work, the objective function FM was formulated as follows:

FM = MIo (11)

where MIo is the mixing index at the exit of the micromixer.

Fig. 1 Geometry of the convergent-divergent microchannel [15]


Fig. 2 Geometry of the cooling channel with ribs: (top) computational domain, (bottom) cross section of the boot-shaped rib [31]

Problem 2: The second problem deals with the shape optimization of boot-shaped ribs in a cooling channel [31]. The channel geometry and the cross-section of a boot-shaped rib are shown in Figure 2. As discussed in the authors' previous work (Seo et al., 2015), three design variables, namely, the rib width-to-rib height ratio (w/h), the rib height-to-channel height ratio (h/H), and the tip width-to-rib height ratio (t/h), were selected for the optimization, which was carried out to maximize the thermal performance of the rib-roughened channel. The ranges of the design variables w/h, h/H, and t/h were selected as (0.60-1.40), (0.06-0.14), and (0.10-0.80), respectively. The objective function, i.e., the thermal performance FTP, was defined using the Nusselt number and the friction factor as follows:

FTP = (Nu/Nu0) / (f/f0)^(1/3), (12)

where

Nu = (1/A) ∫A Nu_local dA, (13)

Nu_local = q Dh / [k (Tw - Tb)], (14)

Nu0 = 0.023 Re^0.8 Pr^0.4, (15)

f = Δp Dh / (2 ρ Um^2 pi), (16)

f0 = 2 (2.236 ln Re - 4.639)^(-2). (17)

Here, Nu is the area-averaged value of the normalized local Nusselt number; Nu0 is the Nusselt number obtained from the Dittus-Boelter correlation [32] for fully developed turbulent flow in a smooth pipe; q is the wall heat flux; k is the thermal conductivity of the working fluid; Tw is the wall temperature; and Tb is the bulk temperature, which is calculated by interpolating the temperatures of the inlet and the outlet. The averaging area A is the area of the ribbed surfaces (without the ribs). Re is the Reynolds number based on the hydraulic diameter Dh of the channel, and Pr is the Prandtl number. f is the friction factor; Δp is the pressure drop in the channel; ρ is the fluid density; Um is the mean velocity at the inlet; pi is the pitch of the ribs; and f0 is the friction factor for fully developed flow in a smooth pipe obtained from the Petukhov empirical correlation [33]. Detailed information on the CFD analysis can be found in the previous work [31].

6.2 Analytical Functions

Based on the literature survey, the following problems using analytical functions were selected for a comparative analysis. The functional forms of the test problems are as follows:

1. Six-hump Camel Back Function [34]

f(x) = (4 - 2.1 x1^2 + x1^4/3) x1^2 + x1 x2 + (-4 + 4 x2^2) x2^2

subject to -5 ≤ xj ≤ 5, j ∈ {1, 2}. This function has two global minima, at xopt = (0.0898, -0.7126) and (-0.0898, 0.7126), with f(xopt) = -1.0316.

2. Branin-Hoo Function [34]

f(x) = (x2 - 5.1 x1^2/(4π^2) + 5 x1/π - 6)^2 + 10 (1 - 1/(8π)) cos(x1) + 10

subject to -5 ≤ x1 ≤ 10 and 0 ≤ x2 ≤ 15. This function has three global minima, at xopt = (-3.1416, 12.2750), (9.4248, 2.4750), and (3.1416, 2.2750), with f(xopt) = 0.3978.

3. Hartman 3 Function [34]

f(x) = -Σ_{i=1}^{4} ai exp[ -Σ_{j=1}^{3} bij (xj - dij)^2 ]

subject to 0 ≤ xj ≤ 1, j ∈ {1, 2, 3}. a = [1, 1.2, 3, 3.2]^T, and b and d are given by

b = | 3.0  10  30 |      d = | 0.3689  0.1170  0.2673 |
    | 0.1  10  35 |          | 0.4699  0.4387  0.7470 |
    | 3.0  10  30 |          | 0.1091  0.8732  0.5547 |
    | 0.1  10  35 |          | 0.03815 0.5743  0.8828 |

The global minimum is located at xopt = (0.1146, 0.5556, 0.8525) with f (xopt) = -3.8628.

4. Hartman 6 Function [34]

f(x) = -Σ_{i=1}^{4} ai exp[ -Σ_{j=1}^{6} bij (xj - dij)^2 ]

subject to 0 ≤ xj ≤ 1, j ∈ {1, ..., 6}. a = [1, 1.2, 3, 3.2]^T, and b and d are given by

b = | 10.0   3.0  17.0   3.5   1.7   8.0 |
    |  0.05 10.0  17.0   0.1   8.0  14.0 |
    |  3.0   3.5   1.7  10.0  17.0   8.0 |
    | 17.0   8.0   0.05 10.0   0.1  14.0 |

d = | 0.1312 0.1696 0.5569 0.0124 0.8283 0.5886 |
    | 0.2329 0.4135 0.8307 0.3736 0.1004 0.9991 |
    | 0.2348 0.1451 0.3522 0.2883 0.3047 0.6650 |
    | 0.4047 0.8828 0.8732 0.5743 0.1091 0.0381 |

The global minimum is located at xopt = (0.2017, 0.1500, 0.4769, 0.2753, 0.3117, 0.6573) with f (xopt) = -3.3224.
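Two of the test functions above, written out in Python for checking; evaluating each at a listed optimum reproduces the quoted minimum values.

```python
import numpy as np

def six_hump_camel(x):
    x1, x2 = x
    return (4 - 2.1 * x1**2 + x1**4 / 3) * x1**2 + x1 * x2 + (-4 + 4 * x2**2) * x2**2

def branin_hoo(x):
    x1, x2 = x
    return ((x2 - 5.1 * x1**2 / (4 * np.pi**2) + 5 * x1 / np.pi - 6) ** 2
            + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10)

f1 = six_hump_camel((0.0898, -0.7126))   # close to -1.0316
f2 = branin_hoo((np.pi, 2.275))          # close to 0.3979
```

The same callables can be handed directly to an optimizer such as the PSO sketched earlier, or sampled by LHS to train the surrogates.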

7. Results and Discussion

Two case studies on the thermal-fluid design problems (Section 6.1) were conducted first to identify the relationship between the accuracy of the surrogate models (RSA, KRG, and RBNN) and the LHS sample size. The three LHS samples had 12, 18, and 24 sample points for Problem 1, and 27, 36, and 45 sample points for Problem 2; these points were generated using the MATLAB function lhsdesign. Owing to the large computational time required for the numerical simulations, the effects of the sample distribution for a fixed sample size on the accuracy of the surrogate models were not considered in these cases. The performance of a surrogate model was evaluated using the CV error for each sample. Figures 3 and 4 show the computational times and CV errors for Problems 1 and 2, respectively. It is natural that the computational time increases with the sample size. For Problem 1, the CV error (an indication of global surrogate model accuracy) decreases with increasing sample size irrespective of the surrogate model, which is quite an obvious trend. However, for Problem 2, the improvement in the CV error with increasing sample size was generally not observed beyond the sample size of 36, except in the case of RBNN. The CV error with the RBNN model was found to be the least susceptible to sample size, and it even increased with the increase in the sample size from 36 to 45 in Problem 2. Taking into account the results for the CV error, it can be deduced that for Problem 2, the medium sample size of 36 is appropriate for all surrogate models; however, for Problem 1, it is difficult to select the optimum sample size for the models. Generally, one would expect the accuracy of a surrogate model to improve with an increase in the number of sample points.

Fig. 3 Effects of LHS sample size on surrogate modeling for Problem 1: (left) Computational time, and (right) CV error


Fig. 4 Effects of LHS sample size on surrogate modeling for Problem 2: (left) Computational time, and (right) CV error

(a)

(b)

(c) Fig. 5 Effects of LHS sample size and distribution on (left) CV error of surrogate modeling and (right) optimum objective function value (---: true optimum) for six-hump camel back function: (a) RSA, (b) KRG, and (c) RBNN.

However, the results shown in Figures 3 and 4 indicate that there exists an optimum sample size in a certain range, which is quite different from what was expected. The chosen problems indicate the diverse nature of surrogate modeling subjected to sample size in an LHS procedure. Therefore, the present analysis can provide guidelines for a proper choice of the surrogate model and sample size. A more detailed analysis was performed using the analytical functions. To compare the performances of the different surrogate models considered in this work, both the global exploration and local exploitation properties were analyzed. The analysis comprised the following steps: (1) LHS samples were generated using the MATLAB function lhsdesign. For a fixed number of LHS points, 40 random samples were generated for the low-dimensional problems (Booth, six-hump camel back, Branin-Hoo, Hartman 3) and 60 for the high-dimensional problem (Hartman 6) to investigate the effect of the spatial distribution of design points in the variable space on surrogate model performance; (2) the surrogates (RSA, KRG, and RBNN) were constructed for each test function, and the CV error was evaluated; (3) the PSO was executed using the tentative surrogates to determine the global optimum. Box plots are used to illustrate the results of the present analysis. In the box plots, (1) the central mark is the median; (2) the edges of the box represent the 25th and 75th percentiles; (3) the ends of the vertical lines indicate the minimum and maximum data values, or 1.5 times the inter-quartile range; and (4) the points outside the ends of the lines are outliers. The analysis was performed using an Intel Core i7 processor with eight CPUs and a clock speed of 2.94 GHz on the MATLAB platform.

(a)

(b)

(c)

Fig. 6 Effects of LHS sample size and distribution on (left) CV error of surrogate modeling and (right) optimum objective function value (---: true optimum) for Branin-Hoo function: (a) RSA, (b) KRG, and (c) RBNN.

The effects of LHS sample size and distribution on the exploration property (accuracy over the domain space) and the exploitation property (optimization results) of the surrogate models are presented below. Figures 5 to 8 show the CV errors and optimum objective function values for the six-hump camel back, Branin-Hoo, Hartman 3, and Hartman 6 functions, respectively, with the LHS samples. The PSO algorithm was used to identify the global optimum based on surrogate modeling for each test function. The horizontal dashed line represents the true global optimum determined from the analytical function. The global optimum point obtained using the PSO algorithm was supplied to each test function to determine the objective function value. For a fixed sample size, a box plot is used to present the effect of the distribution of design points in the design space on the CV error and the optimum objective function for each surrogate model. Tables 1-4 present the means and variances of the CV errors and objective function values corresponding to the box plots for the six-hump camel back, Branin-Hoo, Hartman 3, and Hartman 6 functions (Figures 5 to 8). It can be seen that the three surrogate models, i.e., the RSA, KRG, and RBNN models, respond differently to the LHS sampling procedure (number and spatial distribution of points). The mean values of the CV error generally decrease as the sample size increases, except in the case of the high-dimensional Hartman 6 function, for which the CV error is relatively insensitive to the

(a)

(b)

(c)

Fig. 7 Effects of LHS sample size and distribution on (left) CV error of surrogate modeling and (right) optimum objective function value (---: true optimum) for Hartman 3 function: (a) RSA, (b) KRG, and (c) RBNN.


(a)

(b)

(c)

Fig. 8 Effects of LHS sample size and distribution on (left) CV error of surrogate modeling and (right) optimum objective function value (---: true optimum) for Hartman 6 function: (a) RSA, (b) KRG, and (c) RBNN.

number of sample points regardless of the surrogate model (Figure 8). Thus, unlike in the previous practical engineering problems, the accuracy of the surrogate model generally improves with an increase in the sample size, but sometimes the improvement is very small, as shown, for example, in Figures 5 and 8. The RSA model was the least susceptible to the sample size, with relatively small variations in CV error for both the low- and high-dimensional problems. The decrease in error with increasing sample size was most pronounced for the KRG model. As presented in Tables 1-4, the mean CV error decreased with an increase in the sample size for the KRG model, regardless of the test function. For a fixed sample size, the RBNN model generally yields the lowest CV error, and thus can be regarded as the best model among the tested surrogates. However, in the case of the six-hump camel back function, the error is not very sensitive to the choice of surrogate model. The effect of a change in the spatial distribution of sample points on the surrogate accuracy is larger than expected, especially for small sample sizes, but it generally decreases with an increase in sample size, with several exceptions. Thus, the accuracy of surrogate modeling sometimes depends significantly on the spatial distribution of sample points, which is generated differently in each run of the LHS. As can be seen in Figures 5 to 8, a smaller sample size with a good distribution can provide a surrogate with an enhanced global exploration property (better accuracy)


Table 1 Performance analysis of surrogate modeling for different LHS samples for the six-hump camel back function

Surrogate  Sample size   CV error               Objective function
                         Mean      Variance     Mean       Variance
RSA        12            50.7665   524.7475       2.1673    103.2507
RSA        18            37.4889   102.3171      -0.0458      0.0423
RSA        24            34.8614    47.4760      -0.0050      0.0042
RSA        30            31.4712    24.3913      -0.0201      0.0030
KRG        12            54.6058   572.2311       0.2702      0.5648
KRG        18            38.6814   116.8160       0.4887      0.5968
KRG        24            32.5003    44.7406       0.5273      1.0463
KRG        30            29.5352    39.1470       0.8082      1.4107
RBNN       12            58.3553   658.6996      29.1216   2733.9950
RBNN       18            37.4544   100.0325       1.1544     57.1854
RBNN       24            37.6180   108.5627       7.239435  422.6991
RBNN       30            23.1370    87.06098      1.4460      1.6325

Table 2 Performance analysis of surrogate modeling for different LHS samples for the Branin-Hoo function

Surrogate  Sample size   CV error               Objective function
                         Mean       Variance    Mean       Variance
RSA        12            111.4649   1682.8281   11.25858   40.7709
RSA        18             79.1450    783.2718   10.3292     9.1680
RSA        24             71.9544    394.0713    9.6809     7.2654
RSA        30             61.3294    176.7456   10.3410     3.2202
KRG        12            100.5934   1748.9170   10.1060     5.4009
KRG        18             75.0601    559.3313   10.2306     4.7193
KRG        24             45.6136    312.047     9.8760    12.9892
KRG        30             27.6735     98.8221    8.8518    15.6159
RBNN       12            110.1648   1457.9413   12.9190    98.4195
RBNN       18             38.6659    274.9212   10.6805    64.7693
RBNN       24             23.1472     42.7786    8.8919    49.4581
RBNN       30             18.58791    14.7076    6.8430    43.8516

Table 3 Performance analysis of surrogate modeling for different LHS samples for the Hartman 3 function

Surrogate  Sample size   CV error             Objective function
                         Mean     Variance    Mean      Variance
RSA        27            1.5304   0.1035      -0.7304   0.3871
RSA        36            1.4177   0.0539      -1.0214   0.6064
RSA        45            1.2709   0.0319      -0.8649   0.4905
RSA        54            1.2142   0.0316      -0.7266   0.3047
KRG        27            1.3509   0.0809      -1.6342   2.0386
KRG        36            1.0893   0.0610      -2.5515   1.9375
KRG        45            0.8014   0.0274      -3.4389   0.7542
KRG        54            0.6343   0.0159      -3.5163   0.3571
RBNN       27            1.0145   0.0913      -3.0712   1.2784
RBNN       36            0.8760   0.0316      -3.2692   1.1473
RBNN       45            0.8122   0.2420      -3.2441   1.1761
RBNN       54            0.6386   0.01147     -3.3411   1.1560


Table 4 Performance analysis of surrogate modeling for different LHS samples for the Hartman 6 function

Surrogate   Sample size   CV error                  Objective function
                          Mean        Variance      Mean        Variance
RSA         60             3.0047      2.1316       -0.0474       0.0186
            75             2.7408      1.5747       -0.0379       0.0169
            90             2.5946      0.9509       -0.0466       0.0191
            105            2.2702      0.7796       -0.0076       0.0006
KRG         60             0.9250      0.0686       -0.2814       0.2754
            75             0.8216      0.0371       -0.4043       0.4882
            90             0.7911      0.0229       -0.3685       0.4041
            105            0.7008      0.0152       -0.5765       0.7377
RBNN        60             0.6219      0.0269       -2.1534       0.2879
            75             0.6730      0.0108       -2.0453       0.1463
            90             0.5466      0.0167       -2.3196       0.2274
            105            0.6135      0.0127       -2.3673       0.1742

compared with a larger sample with a poor distribution. It can be inferred that different sample distributions can lead to unexpected errors. However, the effect of the distribution of sample points on the CV error was considerably smaller for large sample sizes.
The surrogate models could find approximate optimal points, which show deviations from the true optimum for different sample sizes and distributions (Figures 5 to 8). Among the tested surrogates, the RSA model showed the largest overall deviations of the optima from the true values, except for the six-hump camel back function. These deviations were also reduced the least by increasing the sample size. Thus, the RSA model appears to be the most inaccurate surrogate for finding the global optimum points of the tested functions, except for the six-hump camel back function, for which the optimum point was best predicted by the RSA model. On the other hand, the RBNN model showed the best overall approach to the true optimum for the Branin-Hoo, Hartman 3, and Hartman 6 functions. Overall, the local exploitation property (tendency to find the true optimum) is less dependent on the sample size but more affected by the spatial distribution of the sample points than expected (Figures 5–8), regardless of the surrogate model. It is also sensitive to the characteristics of the problem under consideration. Therefore, the local exploitation property of a surrogate model with a chosen initial sample size is expected to be enhanced by an "adaptive sampling" strategy, which updates the surrogate model by adding sample points based on local information about the current global optimum at each iteration [18-19].
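The distribution effect described above can be reproduced with a short numerical experiment: repeating an LHS plan of fixed size with different seeds yields different cross-validation errors for the same surrogate. The sketch below is illustrative only; it uses SciPy's quasi-Monte Carlo LHS generator and an RBF interpolant as a stand-in for the surrogates studied here, applied to the Branin-Hoo function, and the function and helper names are ours, not from the paper.

```python
import numpy as np
from scipy.stats import qmc
from scipy.interpolate import RBFInterpolator

def branin(x):
    """Branin-Hoo test function; x has shape (n, 2) on [-5, 10] x [0, 15]."""
    a, b, c = 1.0, 5.1 / (4.0 * np.pi**2), 5.0 / np.pi
    r, s, t = 6.0, 10.0, 1.0 / (8.0 * np.pi)
    return (a * (x[:, 1] - b * x[:, 0]**2 + c * x[:, 0] - r)**2
            + s * (1.0 - t) * np.cos(x[:, 0]) + s)

def loo_cv_error(X, y):
    """Mean leave-one-out cross-validation error of an RBF surrogate."""
    errs = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        model = RBFInterpolator(X[mask], y[mask])  # fit without point i
        errs.append(abs(model(X[i:i + 1])[0] - y[i]))
    return float(np.mean(errs))

for n in (12, 30):
    errors = []
    for seed in range(10):  # ten independent LHS plans of the same size
        sampler = qmc.LatinHypercube(d=2, seed=seed)
        X = qmc.scale(sampler.random(n), [-5.0, 0.0], [10.0, 15.0])
        errors.append(loo_cv_error(X, branin(X)))
    # The spread of errors across seeds reflects the distribution effect;
    # it tends to shrink as the sample size grows.
    print(f"n={n:2d}: mean CV error {np.mean(errors):.2f}, "
          f"variance {np.var(errors):.2f}")
```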

8. Conclusion

In the present work, a comparative analysis of parametric (RSA and KRG) and non-parametric (RBNN) surrogate models was performed for numerical test problems to study the effects of the initial sample size and spatial distribution of the LHS sampling plan on the performance of the surrogate models. The issues related to surrogate approximation, the LHS procedure, and computational burden were demonstrated through a case study of four analytical functions and two practical applications in the field of thermo-fluids: the optimizations of a convergent-divergent micromixer coupled with pulsatile flow (Problem 1) and boot-shaped ribs in a cooling channel (Problem 2). The surrogate models were analyzed in terms of exploration (accuracy over the domain space) and exploitation (finding the global optimum) characteristics with sample size and distribution. To determine the global optimum, the PSO algorithm was used.
The analysis of the two practical thermo-fluid problems revealed several interesting findings. For Problem 1, the CV error decreased as the sample size increased, irrespective of the surrogate model. For the RSA and KRG models, a similar behavior was observed for Problem 2 as the sample size increased from 27 to 36. However, the improvement in the CV error with increasing sample size was generally not observed beyond a sample size of 36, except in the case of the RBNN model for Problem 2. The CV error of the RBNN model was found to be the least susceptible to the sample size, and it even increased as the sample size grew from 36 to 45 in Problem 2. For Problem 2, this worsening of the CV error can be attributed to the fact that the distribution of design points for the sample size of 45 was not favourable for the RBNN model. This indicates that the accuracies of the surrogate models did not always increase with an increase in the sample size, which is quite different from what is generally expected.
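The PSO step referred to above can be sketched in a few lines. This is a generic Kennedy-Eberhart particle swarm with an inertia weight, not the exact implementation used in the study; it is shown minimizing the six-hump camel back function directly (in the study the swarm searches the surrogate instead), and all parameter values are illustrative.

```python
import numpy as np

def camelback(p):
    """Six-hump camel back function; p has shape (n, 2). Global minimum ~ -1.0316."""
    x, y = p[:, 0], p[:, 1]
    return (4 - 2.1 * x**2 + x**4 / 3) * x**2 + x * y + (-4 + 4 * y**2) * y**2

def pso(f, lb, ub, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lb, ub] with a basic inertia-weight PSO."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    pos = rng.uniform(lb, ub, size=(n_particles, lb.size))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), f(pos)       # personal bests
    gbest = pbest[np.argmin(pbest_val)].copy()  # global best
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lb, ub)        # keep particles in the box
        val = f(pos)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], val[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())

best_x, best_f = pso(camelback, [-3.0, -2.0], [3.0, 2.0])
print(best_x, best_f)
```

In a surrogate-based workflow, `f` would be the fitted surrogate's predictor rather than the true function, so each swarm evaluation is cheap.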
On the other hand, in the case of the analytical functions, the medians of the CV errors generally decreased as the sample size increased. For the selected test functions, the KRG model showed the most pronounced reduction in error with an increase in sample size. For a fixed sample size, the RBNN model generally yielded the lowest CV errors for the tested analytical functions. Thus, testing different surrogates can be a good exercise for reducing the error of a surrogate model without increasing the number of sample points. The accuracy of surrogate modeling depended significantly on the spatial distribution of sample points, which was generated differently in each run of the LHS. It was found that larger sample sizes are less susceptible to the distribution effect in the LHS sampling procedure; at an acceptable computational cost, a larger sample size narrows the range over which the error depends on the distribution of points. With regard to the optimization results for the analytical functions, the RSA model appears to be the most inaccurate surrogate in finding the global optimum points for the tested functions, except for the six-hump camel back function. On the other hand, the RBNN model showed the best overall approach to the true optimum for the Branin-Hoo, Hartman 3, and Hartman 6 functions. Therefore, it can be deduced that the tendency to find the true optimum (local exploitation property) is less dependent on the sample size, but

more affected by the spatial distribution of the sample points than generally expected. The research provides a clear perspective for practitioners and designers on effective surrogate-model-based optimization with respect to the LHS procedure and the choice of an appropriate surrogate model. Based on the above results, it is recommended to use an initial sample size equal to 15 times the number of design variables, irrespective of the choice of surrogate model, provided that the number of design variables is not large (up to 10). A further improvement in the exploration/exploitation characteristics can be achieved using 'adaptive sampling' strategies available in the open literature.
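The adaptive-sampling idea recommended above can be sketched as follows: refit the surrogate at each iteration and add a new sample at the surrogate's current optimum. This is a minimal illustration under our own assumptions, not the procedure of [18-19]: an RBF interpolant stands in for the surrogates studied here, SciPy's differential evolution searches the surrogate, and the six-hump camel back function plays the role of the expensive simulation.

```python
import numpy as np
from scipy.stats import qmc
from scipy.interpolate import RBFInterpolator
from scipy.optimize import differential_evolution

def six_hump(x):
    """Six-hump camel back function; x has shape (n, 2)."""
    x1, x2 = x[:, 0], x[:, 1]
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2 + (-4 + 4 * x2**2) * x2**2)

lb, ub = np.array([-3.0, -2.0]), np.array([3.0, 2.0])

# Initial design: a small LHS plan (12 points for 2 design variables).
X = qmc.scale(qmc.LatinHypercube(d=2, seed=1).random(12), lb, ub)
y = six_hump(X)

for _ in range(10):  # adaptive iterations
    # Refit the surrogate; a tiny smoothing term aids numerical conditioning.
    model = RBFInterpolator(X, y, smoothing=1e-10)
    res = differential_evolution(lambda p: model(p[None])[0],
                                 bounds=list(zip(lb, ub)), seed=1)
    if np.min(np.linalg.norm(X - res.x, axis=1)) < 1e-6:
        break  # the optimizer revisits an existing sample: stop refining
    X = np.vstack([X, res.x])            # add the surrogate optimum
    y = np.append(y, six_hump(res.x[None]))  # evaluate the "simulation" there

print(f"best sampled value after adaptation: {y.min():.4f}")
```

Each added point concentrates samples near the current optimum, which is exactly the local-exploitation boost the conclusion anticipates for a fixed initial sample size.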

Nomenclature

A    Amplitude of sinusoidal wall [m]
Pr   Prandtl number
pi   Pitch of the ribs [m]
qo   Wall heat flux [W/m²]
Re   Reynolds number
St   Strouhal number
t    Time [s]
T    Local temperature [K]
Tb   Bulk temperature [K]
Tw   Wall temperature [K]
Vo   Amplitude of pulsed sinusoidal flow [m/s]
Vs   Steady-state velocity of pulsatile flow [m/s]
Δp   Pressure loss [Pa]
k    Thermal conductivity [W/(m·K)]
λ    Wavelength of sinusoidal wall [m]
ρ    Density of fluid [kg/m³]

References
[1] Kim, H.M. and Kim, K.Y., 2006, "Shape optimization of three-dimensional channel roughened by angled ribs with RANS analysis of turbulent heat transfer," International Journal of Heat and Mass Transfer, Vol. 49, pp. 4013-4022.
[2] Knill, D.L., Giunta, A.A., Baker, C.A., Grossman, B., Mason, W.H., Haftka, R.T. and Watson, L.T., 1999, "Response surface models combining linear and Euler aerodynamics for supersonic transport design," Journal of Aircraft, Vol. 36, No. 1, pp. 75-86.
[3] Lee, S.Y. and Kim, K.Y., 2000, "Design optimization of axial flow compressor blades with three-dimensional Navier-Stokes solver," KSME International Journal, Vol. 14, No. 9, pp. 1005-1012.
[4] Bahrami, S., Tribes, C., Fellenberg, S.V., Vu, T.C. and Guibault, F., 2015, "Physics-based surrogate optimization of Francis turbine runner blades, using mesh adaptive direct search and evolutionary algorithms," International Journal of Fluid Machinery and Systems, Vol. 8, No. 3, pp. 209-219.
[5] Samad, A., Kim, K.Y., Goel, T., Haftka, R.T. and Shyy, W., 2008, "Multiple surrogate modeling for axial compressor blade shape optimization," Journal of Propulsion and Power, Vol. 24, pp. 302-310.
[6] Samad, A. and Kim, K.Y., 2009, "Surrogate based optimization techniques for aerodynamic design of turbomachinery," International Journal of Fluid Machinery and Systems, Vol. 2, No. 2, pp. 179-188.
[7] Benini, E., 2004, "Three-dimensional multi-objective design optimization of a transonic compressor rotor," Journal of Propulsion and Power, Vol. 20, No. 3, pp. 559-565.
[8] Madsen, J.I., Shyy, W. and Haftka, R.T., 2000, "Response surface techniques for diffuser shape optimization," AIAA Journal, Vol. 38, pp. 1512-1518.
[9] Queipo, N.V., Haftka, R.T., Shyy, W., Goel, T., Vaidyanathan, R. and Tucker, P.K., 2005, "Surrogate-based analysis and optimization," Progress in Aerospace Sciences, Vol. 41, pp. 1-28.
[10] Forrester, A.I.J. and Keane, A.J., 2009, "Recent advances in surrogate-based optimization," Progress in Aerospace Sciences, Vol. 45, pp. 50-79.
[11] Barthelemy, J.-F.M. and Haftka, R.T., 1993, "Approximation concepts for optimum structural design—a review," Structural Optimization, Vol. 5, pp. 129-144.
[12] Afzal, A. and Kim, K.-Y., 2015, "Multi-objective optimization of a passive micromixer based on periodic variation of velocity profile," Chemical Engineering Communications, Vol. 202, pp. 322-333.
[13] Afzal, A. and Kim, K.-Y., 2015, "Multi-objective optimization of a micromixer with convergent-divergent sinusoidal walls," Chemical Engineering Communications, Vol. 202, pp. 1324-1334.
[14] Afzal, A. and Kim, K.-Y., 2015, "Optimization of pulsatile flow and geometry for a convergent-divergent micromixer," Chemical Engineering Journal, Vol. 281, pp. 134-143.
[15] Afzal, A. and Kim, K.Y., 2015, "Convergent-divergent micromixer coupled with pulsatile flow," Sensors and Actuators B: Chemical, Vol. 211, pp. 198-205.
[16] Jin, R., Chen, W. and Simpson, T.W., 2001, "Comparative studies of metamodelling techniques under multiple modelling criteria," Structural and Multidisciplinary Optimization, Vol. 23, No. 1, pp. 1-13.
[17] Simpson, T.W., Peplinski, J.D., Koch, P.N. and Allen, J.K., 2001, "Metamodels for computer-based engineering design: survey and recommendations," Engineering with Computers, Vol. 17, pp. 129-150.
[18] Regis, R.G., 2016, "Trust regions in Kriging-based optimization with expected improvement," Engineering Optimization, Vol. 48, No. 6, pp. 1037-1059.
[19] Wang, H., Ye, F., Li, E. and Li, G., 2016, "A comparative study of expected improvement-assisted global optimization with different surrogates," Engineering Optimization, Vol. 48, No. 8, pp. 1432-1458.
[20] Jones, D.R., Schonlau, M. and Welch, W.J., 1998, "Efficient global optimization of expensive black-box functions," Journal of Global Optimization, Vol. 13, pp. 455-492.
[21] Yang, X., Liu, Y., Gao, Y., Zhang, Y. and Gao, Z., 2015, "An active learning kriging model for hybrid reliability analysis with both random and interval variables," Structural and Multidisciplinary Optimization, Vol. 51, pp. 1003-1016.
[22] Razavi, S., Tolson, B.A. and Burn, D.H., 2012, "Review of surrogate modeling in water resources," Water Resources Research, Vol. 48, No. 7, pp. 1-32.
[23] Mehmani, A., Zhang, J., Chowdhury, S. and Messac, A., 2012, "Surrogate-based design optimization with adaptive sequential sampling," 53rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, Honolulu, Hawaii.
[24] McKay, M., Conover, W. and Beckman, R., 1979, "A comparison of three methods for selecting values of input variables in the analysis of output from a computer code," Technometrics, Vol. 21, pp. 239-245.
[25] Stein, M., 1987, "Large sample properties of simulations using Latin hypercube sampling," Technometrics, Vol. 29, pp. 143-151.
[26] Myers, R.H. and Montgomery, D.C., 1995, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, New York: Wiley.
[27] Martin, J.D. and Simpson, T.W., 2005, "Use of Kriging models to approximate deterministic computer models," AIAA Journal, Vol. 43, No. 4, pp. 853-863.
[28] Chen, S., Cowan, C.F.N. and Grant, P.M., 1991, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Transactions on Neural Networks, Vol. 2, pp. 302-309.
[29] MATLAB version 8.4.0, 2014, The Language of Technical Computing, Massachusetts: The MathWorks, Inc.
[30] Kennedy, J. and Eberhart, R.C., 1995, "Particle swarm optimization," Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, 27 November-1 December, pp. 1942-1948.
[31] Seo, J.W., Afzal, A. and Kim, K.Y., 2015, "Optimization of a boot-shaped rib in a cooling channel," The 13th Asian International Conference on Fluid Machinery, Tokyo, Japan, September 7-10.
[32] Dittus, F.W. and Boelter, L.M., 1930, "Heat transfer in automobile radiators of the tubular type," University of California, Berkeley Publication, Vol. 2, pp. 443-461.
[33] Petukhov, B.S., 1970, Advances in Heat Transfer, New York: Academic Press.
[34] Dixon, L.C.W. and Szegö, G.P., 1978, Towards Global Optimization 2, Amsterdam, The Netherlands: North-Holland.
