Felipe Carraro
A stochastic Kriging approach for the minimization of integrals
Florianópolis 2017
Dissertation submitted to the Civil Engineering Department of the Federal University of Santa Catarina in partial fulfillment of the requirements for the Master's degree.
Federal University of Santa Catarina
Department of Civil Engineering
Graduate Program in Civil Engineering
Advisor: Prof. Rafael Holdorf Lopez
Co-advisor: Prof. Leandro Fleck Fadel Miguel
Felipe Carraro
A stochastic Kriging approach for the minimization of integrals
This dissertation was judged adequate for obtaining the degree of MASTER in Civil Engineering and approved in its final form by the Graduate Program in Civil Engineering (PPGEC) of the Federal University of Santa Catarina.
Florianópolis, October 5, 2017:
Glicério Trichês, Dr.
PPGEC Coordinator
Examining Committee:
Prof. Rafael Holdorf Lopez, Dr. (Advisor)
Universidade Federal de Santa Catarina

Prof. Breno Pinheiro Jacob, Dr. (via videoconference)
Universidade Federal do Rio de Janeiro

Prof. Marcelo Krajnc Alves, Ph.D.
Universidade Federal de Santa Catarina

Prof. Wellison José de Santana Gomes, Dr.
Universidade Federal de Santa Catarina
Expanded abstract

Introduction

Nowadays, many engineering fields such as civil, mechanical, naval, and aerospace use optimization techniques as a tool to solve real-world problems. Despite the increase in available computing power, engineering practice continues to evolve toward the analysis of more complex systems, which may include more detail, for example, more refined models. Analyzing this type of problem in an optimization setting may become intractable, since a single analysis of the model can be computationally very costly. It may also be necessary to evaluate the uncertainties inherent to the problem under study; in this case, there is an additional cost related to the fact that the analyzed model is stochastic.
Objectives

This study aims at proposing an efficient method based on the Stochastic Kriging (SK) methodology for the solution of integral minimization problems. More specifically, it studies metamodeling techniques combined with Monte Carlo simulation. Building on these techniques, the goals are to implement computational algorithms and to run tests verifying the performance obtained. Another goal of this study is to propose a way of including the error committed by the Monte Carlo simulation in the construction of the metamodel. Furthermore, the proposed method is to be applied to an engineering problem, such as the optimal control of the dynamic response of structures subjected to transient loads. Finally, the efficiency of the studied methods is to be verified.

Methodology
The SK model is used to create a fast approximation of the function being minimized. In addition, variance estimates from the Monte Carlo simulation are used to aid the optimization. The optimization procedure applies the Efficient Global Optimization algorithm with the Augmented Expected Improvement infill criterion. This criterion, however, is modified so that points can be inserted into the model using different variance targets. Furthermore, in order to balance the exploration and refinement characteristics of the search, the target is adjusted through an exponential decay function. This function adapts the target according to the characteristics of the point to be added to the metamodel.

In order to verify the performance of the proposed method, several comparative tests are conducted. Initially, the method is applied to four benchmark functions from the literature subjected to stochastic noise. The functions have 1, 2, 6, and 10 dimensions. The stochastic noise is applied multiplicatively to the deterministic function through a normal random variable with unit mean and a standard deviation set for each problem. In addition, comparisons are made between the proposed method and a well-known efficient algorithm, the Globalized Bounded Nelder–Mead (GBNM). It is also verified whether there is an advantage in evaluating the integral that describes the optimization problem using quadratures instead of Monte Carlo simulation. Finally, the proposed approach is also applied to a structural engineering problem, in which the objective is to minimize the expected value of the failure probability of a 10-story plane frame subjected to seismic action. One source of randomness in the problem comes from the earthquake, which is simulated using a stochastic process filtered by the Kanai–Tajimi spectrum. The other is incorporated into the structural parameters of mass, damping, and stiffness, described by random variables with fixed means and given coefficients of variation. The problem is solved with the proposed method, and the dynamic part of the problem is solved through the Lyapunov equation, which describes the problem in its state-space formulation.
Results and discussion

It is observed that the variance target influences the optimization results. A very low target makes the optimization too costly, while a very high target may stall the optimization. This situation motivates the use of an adaptive scheme for selecting the variance target. From the optimization results on the benchmark problems, reductions in computational cost and in the variability of the results are observed when the adaptive target is used instead of the constant one. The use of quadratures, in their simplest form as products of one-dimensional rules, greatly increases the number of evaluations needed to reach a response similar to that obtained by the proposed method. With respect to the GBNM algorithm used for comparison, the proposed method proved more efficient; even when the internal parameters of that algorithm were changed, it was not possible to obtain an equivalent number of evaluations from the two-dimensional problem onward. The results highlight the efficient performance of the method as well as its consistency over several independent runs.
Final considerations

Overall, the objectives of the study were achieved. A method that incorporates the Monte Carlo error in the construction of the metamodel was proposed. Moreover, using the EGO algorithm, it was possible to iteratively refine the metamodel while seeking the location of the global optimum. Issues such as the unexpected stopping of the algorithm or the high number of evaluations, both related to the value of the variance target, were solved through the adaptive scheme. The proposed method achieved greater efficiency with respect to the number of objective function calls, in addition to consistently getting closer to the global optimum.
Keywords: stochastic kriging. global optimization. integral minimization.

Abstract
This study aims at proposing an efficient method based on the Stochastic Kriging (SK) methodology for the solution of integral minimization problems. The SK metamodel is used to create a fast approximation for the function being minimized. Moreover, variance estimates from Monte Carlo simulation are used to aid the optimization. The minimization procedure employs the Efficient Global Optimization algorithm with the Augmented Expected Improvement infill criterion. It can be observed that the target variance influences the optimization results: setting it too low causes the optimization to become too costly, while setting it too high might stall the optimization. Therefore, an adaptive target selection is proposed. In order to verify the performance of the proposed method, multiple benchmarks are conducted. The method is applied to a number of noisy benchmark functions from the literature. Moreover, comparisons are made against a known efficient optimization algorithm as well as against an integral evaluation approach using quadratures. Finally, the proposed approach is also applied to a structural engineering problem. The results highlight the efficient performance of the method as well as its consistency over multiple independent runs.
Keywords: stochastic kriging. global optimization. integral minimization.
Contents
1 INTRODUCTION
1.1 Objectives
1.1.1 General Objective
1.1.2 Specific Objectives
1.2 Dissertation structure

2 OPTIMIZATION
2.1 Formulation of an optimization problem
2.2 Aspects and classification of optimization problems
2.2.1 Convexity
2.2.2 Local or global optima
2.3 Aspects and classification of optimization algorithms
2.3.1 Algorithmic progression
2.3.2 Local Search
2.3.3 Global optimization

3 PROBABILITY THEORY AND UNCERTAINTIES
3.1 Initial probability concepts
3.1.1 Events
3.1.2 Axiomatic probability measure
3.1.3 Frequentist interpretation of probability
3.1.4 Random variables
3.2 Density Functions
3.2.1 Cumulative Density Function
3.2.2 Probability Density Function
3.3 Moments
3.3.1 Mean
3.3.2 Variance
3.3.3 Covariance and Correlation
3.4 Gaussian Distribution
3.5 Extension to multiple variables
3.5.1 PDF and CDF
3.5.2 Expected value of arbitrary functions
3.5.3 Correlation and Covariance
3.6 Stochastic Processes
3.7 Structural Reliability
3.8 Types of uncertainties
3.9 Uncertainty representation and quantification
3.10 Formulation of designs in an uncertain context
3.10.1 Robust Design Optimization - RDO

4 DETERMINISTIC METAMODELING
4.1 Metamodel approaches
4.2 Kriging
4.2.1 Formulation
4.3 Sampling plan
4.4 Efficient Global Optimization
4.5 Probability of Improvement - PI
4.6 Enhanced Method 4 - EM4

5 STOCHASTIC METAMODELING
5.1 Kriging for noisy data
5.2 Kriging exploiting Monte Carlo variance
5.3 Stochastic Kriging - SK
5.4 Infill criteria for noisy evaluations
5.4.1 Augmented Expected Improvement - AEI
5.4.2 Example
5.5 Adaptive target

6 NUMERICAL RESULTS
6.1 EGO performance on stochastic benchmark functions
6.1.1 Branin tilted
6.1.2 Hartman 6D
6.1.3 Levy 10D
6.2 Comparative against another approach
6.3 Advantage against quadratures
6.4 Application: Tuned-Mass Damper system optimization

7 CONCLUSIONS AND FUTURE STUDIES
7.1 Conclusions
7.2 Future studies

REFERENCES

APPENDIX A – TIME-DEPENDENT RELIABILITY OF OSCILLATORS
APPENDIX B – SOLVING TMD RANDOM RESPONSE BASED ON LYAPUNOV EQUATION
APPENDIX C – INTRINSIC NOISE ASSUMPTION
List of Figures

Figure 1 – FEA of a structural collapse simulation
Figure 2 – International Space Station Risk of Impact from Orbital Debris
Figure 3 – Structural failure
Figure 4 – Example of Kriging metamodel prediction
Figure 5 – Example of a simple plane frame
Figure 6 – Convexity check on unidimensional function
Figure 7 – Local and global minima
Figure 8 – Illustration of a PDF
Figure 9 – PDF and CDF from standard normal distribution
Figure 10 – Realizations from a stochastic process
Figure 11 – Limit state representation
Figure 12 – Analytical methods
Figure 13 – Multiquadric base function
Figure 14 – Influence of p on correlation
Figure 15 – Influence of θ on correlation
Figure 16 – Latin Square
Figure 17 – Illustration of designs with different space-fillingness
Figure 18 – A graphical interpretation of the probability of improvement
Figure 19 – Illustration of clustering over multiple targets
Figure 20 – Function plot - Problem 1
Figure 21 – Comparison of multiple target variances
Figure 22 – Infill criterion on one-dimensional noisy function
Figure 23 – Infill criterion on two-dimensional noisy function
Figure 24 – Noisy function
Figure 25 – Different model based on the estimated error
Figure 26 – AEI plot over the domain
Figure 27 – Refinement of the model seeking the optimal value
Figure 28 – Flowchart of the algorithm
Figure 29 – Stalling at infill 20 with target variance 1.00
Figure 30 – Target decay varying with problem dimension
Figure 31 – 1D Function - Limited number of evaluations
Figure 32 – Function plot - Branin tilted
Figure 33 – 2D Function - Limited number of evaluations
Figure 34 – 6D Function - Limited number of evaluations
Figure 35 – Levy function for the 2-dimensional case
Figure 36 – 10D Function - Limited number of evaluations
Figure 37 – Mean convergence for 1D noisy function
Figure 38 – Comparison between two approaches with 1D function
Figure 39 – 1D Quadrature
Figure 40 – 2D Quadrature
Figure 41 – TMD building
Figure 42 – Noisy reliability surface over design variables range
Figure 43 – Monte Carlo convergence curve
Figure 44 – Surface generated by Stochastic Kriging with sampled points
Figure 45 – Barrier
Figure 46 – Histograms of the intrinsic errors
List of Tables

Table 1 – Different uncertain formulations
Table 2 – Selection methods
Table 3 – Statistical properties of structural parameters
Table 4 – TMD optimization results
1 Introduction
Optimization can be seen as the search for the best outcome of a given process while satisfying certain restrictions. In engineering, it has long been used extensively for design and planning. However, with the increasing availability of computing power, it has gained significant popularity. In recent times, many engineering fields such as civil, mechanical, naval, and aerospace use it as a tool to solve real-life problems. Despite the increase in available computing power, the current engineering environment continues to evolve, and the analysis of complex systems may now include more detail, i.e., model refinements. When dealing with large or complex models such as Finite Element Analysis (FEA), illustrated in Figure 1, or Computational Fluid Dynamics (CFD), each analysis can become computationally costly.
Figure 1 – FEA of a structural collapse simulation
Source – Applied Science International, LLC (2016)
Thus, refining the mathematical/mechanical model leads to a more expensive problem to optimize. However, this is not the only source of added complexity. For a long time, engineering researchers have focused on improving structural models. One could analyze, for example, a beam considering a higher-order beam model, plasticity, damage theories, and other sources of non-linearity, and approximate the solution using a state-of-the-art finite element model. All of this would still be a rough representation of reality if the intrinsic randomness of materials (rock, soil, concrete) and loads (wind, earthquake motion) were disregarded and a deterministic average were used (SUDRET; KIUREGHIAN, 2000).
Figure 2 – International Space Station Risk of Impact from Orbital Debris
Source – NASA (2014b)
Accounting for the randomness present in the model makes it possible to achieve a reliable design. For example, Figure 2 shows a model of the International Space Station (ISS) with the associated risk of impact by orbital debris. It cannot be known for certain how particles free in space will behave, but by using probabilistic models it is possible to determine which parts of the spacecraft are more vulnerable to this source of impact. With this information, designers may use more protective shielding in the regions with larger risk and reduce the cost of the structure with less protection in non-critical parts or parts unlikely to be damaged. This type of optimization aims to effectively reduce the probability of failure of the structure and also to generate savings. This is especially important for a high investment like the ISS, whose estimated construction and operation costs reach almost 75 billion dollars (NASA, 2014a). Orbital debris may seem distant from an engineer's daily life, but the same concept may be applied to other sources of unpredictable loads such as wind, earthquakes, the action of waves, ice, etc. Figure 3 shows two examples of failure events associated with uncertain natural phenomena. Thus, cost and reliability are two important features in any project, and the trade-off between them should be considered. When trying to optimize large or complex systems while also accounting for the system's reliability, the amount of computational resources required may render the task intractable. An efficient optimization algorithm must be able to find the best, or at least a reasonably good, result under a limited time or computational budget. Classic optimization methods make use of gradients to find extreme values of an objective function. However, considering the FEA example, there is usually no explicit function to be used, as the output comes from a computer simulation. Thus, obtaining the derivatives may be either impossible or very costly. Nevertheless, derivative-free algorithms exist, e.g., Nelder-Mead (NELDER; MEAD, 1965; NHAMAGE et al., 2014). Even so, both types of algorithms may fail to find the global optimum of the usually highly non-linear, multimodal, and non-convex problems that arise in engineering. When such algorithms become trapped at a local optimum, the outcome becomes dependent on the starting point. Metaheuristics are another class of algorithms widely used in engineering optimization. They can be used to solve black-box functions, as they make very few assumptions about function characteristics and usually develop their search procedure based solely on evaluated responses.
Figure 3 – Structural failure: (a) wind turbine collapse under extreme wind; (b) building collapse under earthquake
Source – Brome (2010) and Takats (2005)
On the other hand, their usefulness is diminished when considering expensive functions. Usually, a large number of function calls is needed to reach convergence, leading to prohibitive execution times (HUANG et al., 2006). Beyond both of these approaches, there is an alternative that can cope with the limitations discussed: metamodels. Metamodels are simpler approximate models, created by adjusting a response surface based on a sample of simulated points from the original model. Being simple, the metamodel can be used as a replacement for the original expensive function, reducing the computational burden of performing numerous simulations. A widely applied metamodel for the type of problem discussed above is called Kriging. A prediction example using Kriging is shown in Figure 4.
Figure 4 – Example of Kriging metamodel prediction (true function, samples, Kriging estimate, and 95% confidence interval)
Kriging in its usual formulation considers the approximation of deterministic functions. But what if the objective function possesses randomness? This randomness could result, for example, from uncertainty in the input parameters or from noise in the function response. In such cases, a more recent extension called Stochastic Kriging may be employed. Under this approach, the minimization problem is written as the minimization of an integral. This integral form defines the class of problems which is the focus of this study. The form arises naturally in some engineering cases, and other problems may be rewritten in it. The Stochastic Kriging formulation, its characteristics, and how it can be employed to solve this class of problems are addressed in this study.
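For concreteness, a sketch of the general form of this class of problems, using the expectation notation developed in chapter 3 (the symbols here anticipate that chapter and are illustrative rather than the dissertation's final formulation):

$$\min_{\mathbf{d} \in S} \; F(\mathbf{d}) = \mathbb{E}\left[ f(\mathbf{d}, \mathbf{X}) \right] = \int_{\Omega} f(\mathbf{d}, \mathbf{x})\, p_{\mathbf{X}}(\mathbf{x})\, \mathrm{d}\mathbf{x},$$

where $\mathbf{d}$ collects the design variables, $\mathbf{X}$ the random inputs with joint density $p_{\mathbf{X}}$, and each Monte Carlo estimate of the integral carries a sampling variance that Stochastic Kriging can exploit.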
1.1 Objectives
1.1.1 General Objective
Develop a method based on Stochastic Kriging for the minimization of expensive-to-evaluate functions that depend on an integral.
1.1.2 Specific Objectives
• Study the use of kriging metamodeling techniques coupled with Monte Carlo simulation;
• Implement computational algorithms and test solution approaches found in literature;
• Propose a way to include the error committed with Monte Carlo simulation in the metamodel construction;
• Apply the metamodeling framework to a practical engineering problem, e.g., optimal control of dynamic response under transient loads;
• Verify the overall efficiency of the studied methods.
1.2 Dissertation structure
This dissertation is structured in chapters. The initial chapters progressively gather the knowledge needed for understanding and developing the proposed method as well as its application. With the theoretical basis established, the final chapters present numerical experiments with the proposed method, discuss the results, and draw conclusions. A brief overview of each chapter is presented below. In Chapter 2, the theory and formulation of general optimization problems are presented. In addition, problem categories and optimization algorithms are discussed.
Chapter 3 concisely describes the probabilistic background needed for the characterization of random parameters. Moreover, it details how uncertainties are considered in optimization, as well as the formulation of problems involving random variables. Chapter 4 explains the general metamodeling concept and gives examples of existing methods. It then focuses on the approach used in this study: Kriging. The formulation of the Kriging predictor and the parameter estimation process are presented, and the coupling of optimization and metamodeling is reviewed under the Efficient Global Optimization method. Chapter 5 contains the main contribution of this study. It introduces the concepts for dealing with stochastic problems and employs the Stochastic Kriging framework, with the proposed modifications, to solve a class of integral minimization problems. In Chapter 6, a few benchmark problems are solved in order to show the efficiency of the metamodeling technique as well as to demonstrate the proper functioning of the implemented algorithms. In Chapter 7, conclusions are drawn with respect to the results obtained in the benchmarks as well as the overall performance of the proposed method; additionally, suggestions for further studies on the topic are presented.
2 Optimization
Optimization might be defined as the science of determining the “best” solutions to certain mathematically defined problems, which are often models of physical reality (FLETCHER, 1987). Its practical use begins with the definition of at least one objective function, which represents a measure of performance. This function depends on certain characteristics of the system, called design variables. Thus, the goal of optimization is to find the design variables that return the best value for the objective function. In an engineering context, an example could be that of optimizing the sections of a planar steel frame (CARRARO et al., 2016). A simple example of a frame that could be optimized is shown in Figure 5.
Figure 5 – Example of a simple plane frame (members 1 and 3 are columns of length $c$ with areas $A_1$, $A_3$; member 2 is a beam of length $b$ with area $A_2$; $Q_1$, $Q_2$ are applied loads and $E_1$, $E_2$, $E_3$ the elastic moduli)
Three elements characterize an optimization problem:
1. Objective function: this function maps the system parameters to a certain performance measure. In the steel frame example, an objective function could be the total volume or weight of the structure, or the monetary cost to build it. In Figure 5, the objective could be to find $A_1$, $A_2$, and $A_3$ which minimize the volume of the structure. Thus, the objective function would be written as $f_{obj} = \mathrm{Vol} = A_1 c + A_2 b + A_3 c$, where $b$ and $c$ are the lengths of the beam and the columns, respectively (a code sketch of this objective appears after this list).
2. Design variables: these are the input parameters which modify the system response and can be selected in order to improve the objective function value. In the example, the design variables could be the steel profiles associated with groups of beams or columns. These profiles would be chosen from a catalog, i.e., from a list of commercially available profiles. When design variables have to be chosen from a discrete set, the problem is classified as Discrete Optimization (PAPADIMITRIOU; STEIGLITZ, 1982; LEE, 2004). On the other hand, the design variables could be the cross-sectional areas of the structural elements; in that case, each variable can take infinitely many positive values. When design variables are real or defined over a real range, they are called continuous and the problem is a Continuous Optimization (LUENBERGER, 1969). In Figure 5, $A_1$, $A_2$, and $A_3$ could be continuous design variables to be optimized. Additionally, there are problems posed using both continuous and discrete variables; such problems are called Mixed Variable Optimization.
3. Search space: this is the space that contains all the possible inputs to the objective function. In a steel frame optimization, the search space could be the W-shaped profiles from a standard specification.
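As a minimal illustration of how the objective function from item 1 translates into code, the volume of the frame of Figure 5 can be written directly; the lengths b and c below are hypothetical placeholder values.

```python
def frame_volume(A, b=4.0, c=3.0):
    """Volume objective f_obj = A1*c + A2*b + A3*c for the frame of Figure 5.

    A    : sequence (A1, A2, A3) of cross-sectional areas
    b, c : beam and column lengths (placeholder values)
    """
    A1, A2, A3 = A
    return A1 * c + A2 * b + A3 * c

print(frame_volume([2.0, 3.0, 2.0]))  # volume for one candidate design
```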
2.1 Formulation of an optimization problem
Transcription of an optimization problem into a mathematical formulation is a critical step. If the problem formulation is improper, the solution for the problem is most likely going to be unacceptable (ARORA, 2007). The usual formulation of an optimization problem is as follows:
Find a vector of design variables $\mathbf{d} \in \mathbb{R}^n$:

$$\mathbf{d} = \{d_1, d_2, \dots, d_n\}, \tag{2.1}$$

which minimizes the objective function $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$:

$$f_{obj} = f(\mathbf{d}, \mathbf{x}), \tag{2.2}$$

with parameters:

$$\mathbf{x} = \{x_1, x_2, \dots, x_m\}, \tag{2.3}$$

subject to the constraints:

$$g_i(\mathbf{d}, \mathbf{x}) \le 0; \quad i = 1, \dots, n_{ic}, \tag{2.4}$$

$$h_j(\mathbf{d}, \mathbf{x}) = 0; \quad j = 1, \dots, n_{ec}, \tag{2.5}$$

where $g_i(\mathbf{d}, \mathbf{x})$ and $h_j(\mathbf{d}, \mathbf{x})$ represent the functions that establish inequality and equality constraints, respectively, while $n_{ic}$ and $n_{ec}$ represent the number of such functions.

In the frame example, parameters could be, for instance, the steel elastic moduli ($E_1$, $E_2$, $E_3$) or the external loads applied ($Q_1$, $Q_2$). Constraints on the objective function could come from stress restrictions on each member, based on a design code, or from a limited lateral displacement. They could also be imposed on the design variables, for example, by limiting the cross-sectional areas to a certain range, e.g. $1 \le A_i \le 5\ \mathrm{cm}^2$ for $i = 1, 2, 3$ in Figure 5. When seeking to minimize a function, it may occur that there are no imposed restrictions; in this case, the problem is classified as an Unconstrained Optimization problem. Otherwise, if restrictions exist on the objective function or on the design variables, the problem belongs to the Constrained Optimization class. It is assumed that the objective function is to be minimized, but that entails no loss of generality, since the minimum of $-f(\mathbf{d}, \mathbf{x})$ occurs where the maximum of $f(\mathbf{d}, \mathbf{x})$ takes place.
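As a hedged sketch of how this general formulation maps onto a standard solver, the frame volume can be minimized over continuous areas with the bound constraints cited above. The inequality constraint below is a made-up placeholder (a minimum total area), not a design-code check; note also that SciPy's SLSQP expects inequality constraints written as fun(d) >= 0, i.e., as -g(d).

```python
import numpy as np
from scipy.optimize import minimize

b, c = 4.0, 3.0                        # hypothetical beam and column lengths

def f_obj(A):                          # objective f(d), Eq. (2.2): frame volume
    return A[0] * c + A[1] * b + A[2] * c

# g(A) = 4 - (A1 + A2 + A3) <= 0, passed to SLSQP as -g(A) >= 0
constraints = [{"type": "ineq", "fun": lambda A: np.sum(A) - 4.0}]
bounds = [(1.0, 5.0)] * 3              # 1 <= A_i <= 5, as in the text

res = minimize(f_obj, x0=np.array([3.0, 3.0, 3.0]),
               method="SLSQP", bounds=bounds, constraints=constraints)
print(res.x, res.fun)                  # optimal areas and minimum volume
```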
Further considerations on the terminology and aspects of this formulation will be addressed in section 2.2.
2.2 Aspects and classification of optimization problems
This section details aspects of the optimization problem that lead to different classifications and, usually, different solution procedures.
2.2.1 Convexity
Convexity is an important characteristic of an optimization problem. Knowing beforehand that a function is convex enables the use of specialized methods that exploit this property. The search thus becomes faster compared to more general methods, and the solution is guaranteed to be a global optimum, as shall be discussed in subsection 2.2.2. In order to define a convex function, the definition of a convex set is required. A set $S$ is said to be convex if, given any two points $p_1$ and $p_2$ in $S$, the line segment between $p_1$ and $p_2$ is also in $S$. This concept extends to the $n$-dimensional space. Mathematically, a parametric representation of a line segment between points $\mathbf{d}^{(1)}$ and $\mathbf{d}^{(2)}$ can be formulated as follows:
$$\mathbf{d} = \alpha \mathbf{d}^{(2)} + (1 - \alpha)\mathbf{d}^{(1)}; \quad 0 \le \alpha \le 1. \tag{2.6}$$
If the entire line segment lies in $S$, then $S$ is a convex set (ARORA, 2004). Considering now a function $f(\mathbf{d})$ defined on a convex set $S$, this function is said to be convex if it satisfies:
$$f\left(\alpha \mathbf{d}^{(2)} + (1 - \alpha)\mathbf{d}^{(1)}\right) \le \alpha f(\mathbf{d}^{(2)}) + (1 - \alpha) f(\mathbf{d}^{(1)}); \quad 0 \le \alpha \le 1. \tag{2.7}$$
This condition is necessary and sufficient and applies to $n$-dimensional functions. An illustrative example of a single-variable convex function is shown in Figure 6. In practice, applying Equation 2.7 is difficult, as an infinite number of pairs of points must be checked.

Figure 6 – Convexity check on a unidimensional function (the chord $\alpha f(\mathbf{d}^{(2)}) + (1 - \alpha) f(\mathbf{d}^{(1)})$ lies above the function between $\mathbf{d}^{(1)}$ and $\mathbf{d}^{(2)}$)
Source – Adapted from Arora (2004)

Checking the Hessian of the function is a simpler alternative. A function is said to be convex when its Hessian $\nabla^2 f(\mathbf{d})$, defined as

$$\nabla^2 f(\mathbf{d}) := \begin{bmatrix} \frac{\partial^2 f(\mathbf{d})}{\partial d_1^2} & \frac{\partial^2 f(\mathbf{d})}{\partial d_1 \partial d_2} & \cdots & \frac{\partial^2 f(\mathbf{d})}{\partial d_1 \partial d_n} \\ \frac{\partial^2 f(\mathbf{d})}{\partial d_2 \partial d_1} & \frac{\partial^2 f(\mathbf{d})}{\partial d_2^2} & \cdots & \frac{\partial^2 f(\mathbf{d})}{\partial d_2 \partial d_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(\mathbf{d})}{\partial d_n \partial d_1} & \frac{\partial^2 f(\mathbf{d})}{\partial d_n \partial d_2} & \cdots & \frac{\partial^2 f(\mathbf{d})}{\partial d_n^2} \end{bmatrix}, \tag{2.8}$$

is at least positive semidefinite everywhere, that is, has non-negative eigenvalues for all points in $S$. Thus, a Convex Optimization problem is one where $S$, $f$, and the inequality constraints $g_i$ are convex and the equality constraints $h_j$ are linear. This type of problem has an important characteristic that will be discussed in subsection 2.2.2.
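As a small numerical illustration of the Hessian test, the sketch below approximates the Hessian of Equation 2.8 with central finite differences and checks positive semidefiniteness at a random sample of points. Checking finitely many points can only suggest, not prove, convexity; the test function, step size, and tolerance are arbitrary choices.

```python
import numpy as np

def hessian_fd(f, d, h=1e-5):
    """Central finite-difference approximation of the Hessian of f at point d."""
    n = len(d)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(d + e_i + e_j) - f(d + e_i - e_j)
                       - f(d - e_i + e_j) + f(d - e_i - e_j)) / (4.0 * h * h)
    return H

# A convex test function of two variables
f = lambda d: d[0] ** 2 + d[1] ** 2 + np.exp(d[0] + d[1])

rng = np.random.default_rng(0)
points = rng.uniform(-2.0, 2.0, size=(50, 2))   # sample of points in S
psd = all(np.linalg.eigvalsh(hessian_fd(f, d)).min() >= -1e-6 for d in points)
print("Hessian PSD at all sampled points:", psd)
```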
2.2.2 Local or global optima
The distinction between global and local optima is relevant when considering the non-convex functions that arise in practical engineering optimization. A local optimum is defined as a feasible point $\hat{\mathbf{d}}$ from the search space $S$ such that sufficiently small neighborhoods surrounding $\hat{\mathbf{d}}$ contain no points that are both feasible and improving in objective function value (ZABINSKY, 2003). When a point has the best objective value over the whole search space, it is referred to as a global optimum. Figure 7 illustrates this concept using a one-dimensional function. This function is multimodal, that is, it presents multiple local optima.

Figure 7 – Local and global minima (a multimodal function $f(d)$ with several local optima and one global optimum)
Mathematically, considering a minimization problem, it can be stated that a point $\hat{\mathbf{d}} \in S$ is a local optimum (minimum) of $f$ over $S$ if

$$\exists\, \delta > 0 : f(\hat{\mathbf{d}}) \le f(\mathbf{d}) \quad \forall\, \mathbf{d} \in S \text{ with } \|\mathbf{d} - \hat{\mathbf{d}}\| < \delta, \tag{2.9}$$

and that a point $\hat{\mathbf{d}} \in S$ is a global optimum (minimum) of $f$ over $S$ if

$$f(\hat{\mathbf{d}}) \le f(\mathbf{d}) \quad \forall\, \mathbf{d} \in S. \tag{2.10}$$
When trying to solve an optimization problem, one is usually interested in finding the global optimum, i.e., the best possible result. Often, the question arises whether the solution found is indeed a global minimum. For Convex Optimization problems, a fundamental property is that any locally optimal point is also globally optimal. This property makes Convex Optimization problems somewhat easier to analyze, as there is a guarantee of global optimality (LUENBERGER, 1969). The other option to ensure global optimality, which is usually intractable, is an exhaustive search of the search space. This means that unless the problem can be shown to be convex, there is no way to recognize whether the solution found can be further improved. Several functions in engineering optimization are non-convex and thus require special care when using optimization algorithms in order to avoid being trapped in local solutions. This distinctive behavior among algorithms is further discussed in section 2.3.
2.3 Aspects and classification of optimization algorithms
2.3.1 Algorithmic progression
Depending on the existence of random components in the optimization procedure, the algorithmic progression behaves differently. The algorithms can be classified in this regard as deterministic or stochastic:
• Deterministic: Classical mathematical optimization algorithms are usually deterministic. They iteratively improve the solution according to some deterministic rule;
• Stochastic: Algorithms of this category possess a random component in their formulation. In contrast to deterministic algorithms, running the same optimization process multiple times may lead to different results.
2.3.2 Local Search
Local Search methods are the ones that converge to a local optimum. Most mathematical optimization algorithms from linear and non-linear programming fall into this category. They usually focus on identifying and iteratively following a descent direction. To accomplish this, they make use of information from function evaluations, gradients, and/or the Hessian. Some examples include (NOCEDAL; WRIGHT, 2006):
• Line Searches;
• Steepest Descent;
• Conjugate Gradients;
• Simplex Methods;
• Newton Methods;
• Quasi-Newton Methods;
• Sequential Quadratic Programming (SQP);
• Interior Point Methods.
2.3.3 Global optimization
Global optimization is distinguished from local optimization by its focus on finding the maximum or minimum over the whole search space. When the objective function is convex, employing a local search such as those cited in subsection 2.3.2 yields the global optimum, as discussed earlier; moreover, this class of algorithms usually converges very fast. On the other hand, when the function is non-convex, the solutions from this search procedure may be local optima, and the final solution becomes dependent on the point where the search started. In order to address this issue while still exploiting the fast convergence of a local search, a restart procedure may be employed: the idea is to assume that the global optimum has been found when a minimum value is repeatedly achieved from multiple runs and starting points (MUSELLI, 1997; TORII et al., 2011). A minimal sketch of this multistart idea is given after the list of algorithms below. Another approach to solve this type of problem is the use of metaheuristics. Metaheuristics are a class of algorithms well suited to search for both local and global optima. They are mostly stochastic and usually do not use gradient information; they also do not require the function to be continuous or differentiable. A solution is found by applying a set of rules and randomness. Nevertheless, convergence proofs exist only in probabilistic terms, i.e., as the number of iterations tends to infinity, the minimum found so far tends to the global minimum. They are also usually computationally costlier, i.e., they need a larger number of function evaluations to converge, compared to local searches or the methods that will be discussed in chapter 4 (YOUNIS; DONG, 2010). Still, reasonably good results can be achieved given enough time to run. Some algorithms include:
• Genetic Algorithms (GA) (GOLDBERG, 1989);
• Simulated Annealing (SA) (KIRKPATRICK et al., 1983);
• Particle Swarm Optimization (PSO) (KENNEDY; EBERHART, 1995);
• Ant Colony Optimization (ACO) (COLORNI et al., 1992); 38 Chapter 2. Optimization
• Search Group Algorithm (SGA) (GONÇALVES et al., 2015);
• Imperialist Competitive Algorithm (ICA) (ATASHPAZ-GARGARI; LUCAS, 2007; CARLON et al., 2015);
• Backtracking Search Algorithm (BSA) (CIVICIOGLU, 2013);
• Probabilistic Restart (PR) (LUERSEN; RICHE, 2004).
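The restart idea mentioned before the list can be sketched in a few lines: run a derivative-free local search (here Nelder-Mead, as implemented in SciPy) from several random starting points and keep the best outcome. This illustrates the generic multistart concept, not the GBNM algorithm used later in this study, and the multimodal test function is made up.

```python
import numpy as np
from scipy.optimize import minimize

def f(d):
    # Made-up multimodal test function with several local minima
    return np.sum(d ** 2) + 10.0 * np.sum(np.sin(3.0 * d) ** 2)

rng = np.random.default_rng(42)
lo, hi, n_starts = -5.0, 5.0, 20

best = None
for _ in range(n_starts):
    d0 = rng.uniform(lo, hi, size=2)               # random restart point
    res = minimize(f, d0, method="Nelder-Mead")    # local, derivative-free search
    if best is None or res.fun < best.fun:
        best = res

print("best point:", best.x, "best value:", best.fun)
```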
In this study, optimization takes place considering a non-convex, multimodal, and expensive-to-evaluate black-box function. By black box, it is meant that no information other than the function response is available. This, therefore, rules out various local search techniques. Being multimodal also makes the use of local search inappropriate. Metaheuristics could be a useful procedure but, given that the function is expensive to evaluate, the process might become too time-consuming. A different approach is needed for solving this type of problem. The approach considered in this study is metamodeling.

The metamodeling approach consists of developing fast surrogate models for the objective and constraint functions. The models offer a cheap-to-compute transfer function between input and output, and no gradient information is required. They are especially useful for complex, time-consuming simulations and may be used in an optimization context. The structure of the model and its formulation may be exploited to focus specifically on the search for the global optimum. This makes the approach the best candidate to solve the type of problem proposed. Further considerations on surrogate modeling and the use of this methodology for global optimization will be discussed in chapter 4.

3 Probability theory and uncertainties
So far, all components of the optimization problem have been taken as deterministic. This does not map well to reality, due to the existence of uncertainties. In the next few sections, concepts of probability, stochastic processes, and reliability will be reviewed. Later in this chapter, the optimization problem will be reformulated to account for those uncertainties, under different approaches.
3.1 Initial probability concepts
3.1.1 Events
An event $E$ is defined as a subset of the sample space $\Omega$. The sample space contains all possible outcomes of a random quantity. The failure event $E$ of a structural element, for example, may be modeled as

$$E = \{R - S < 0\}, \tag{3.1}$$

where $R$ represents the resistance and $S$ the load effect (solicitation). Considering the frame example from chapter 2, another event could be one where the elastic modulus $E_1$ of the first steel element is between 205 and 210 GPa. A probability of occurrence can be associated with every event considered, as discussed in subsection 3.1.2.
3.1.2 Axiomatic probability measure
For the classical probability definition, it is necessary to understand the mathematical concept of a σ-algebra. A σ-algebra is a collection of subsets of a set with some special properties, such as containing the empty set and being closed under complementation and countable unions. With such a structure, it is possible to represent events and a probability measure. Given a sample space $\Omega$ and an associated σ-algebra $\mathcal{F}$, a probability measure is a function $P : \mathcal{F} \to [0, 1]$ which follows the Kolmogorov axioms (KOLMOGOROV, 1950; SUDRET, 2007):

$$P(A) \ge 0, \quad \forall A \in \mathcal{F}, \tag{3.2}$$

$$P(\Omega) = 1, \tag{3.3}$$

$$P(A \cup B) = P(A) + P(B), \quad \forall A, B \in \mathcal{F},\ A \cap B = \emptyset. \tag{3.4}$$
The probability measure quantifies how likely an event is to occur. An impossible event has probability zero, while an event that always occurs has unit probability. From these axioms, it is also possible to compute the probability of unions and/or intersections of multiple events.
3.1.3 Frequentist interpretation of probability
The axiomatic construction presented in subsection 3.1.2 is purely mathematical. When considering real-world experiments, on the other hand, a more practical view relates to the evaluation of scenarios. Under the frequentist interpretation of probability theory, the probability of an event is the limit of its empirical frequency. This frequency is calculated as the number of occurrences ($n_A$) of an event $A$ divided by the number of trials $n$:
$$\mathrm{Freq}(A) = \frac{n_A}{n}. \tag{3.5}$$

Thus, the probability of event $A$ becomes:

$$P(A) = \lim_{n \to \infty} \frac{n_A}{n}. \tag{3.6}$$
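A minimal Monte Carlo illustration of Equation 3.6: the probability of the failure event $E = \{R - S < 0\}$ from subsection 3.1.1 is estimated by its empirical frequency. The Gaussian models and parameters chosen for $R$ and $S$ are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000                        # number of trials

# Placeholder models: resistance R and load effect S (arbitrary parameters)
R = rng.normal(loc=10.0, scale=1.0, size=n)
S = rng.normal(loc=6.0, scale=1.5, size=n)

n_A = np.count_nonzero(R - S < 0)    # occurrences of the failure event
print("Freq(A) =", n_A / n)          # empirical frequency, Eq. (3.5)
```

As n grows, the printed frequency converges to the exact probability of the event, which is the limit stated in Equation 3.6.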
3.1.4 Random variables
Random variables are a useful tool to characterize quantities subject to random variation. Throughout the text, they will be denoted by capital letters. A real-valued random variable $X(\omega)$, for $\omega \in \Omega$, is defined as a mapping $X : \Omega \to \mathbb{R}$. In other words, it is a function that attributes a real value to every sample point $\omega$ from the sample space $\Omega$. The sample space $\Omega$ can be finite or countably infinite, which results in a discrete random variable. When $\Omega$ possesses an uncountable number of elements, the resulting random variable is called continuous. A simple illustrative example of a random variable is the model of a fair coin toss. Calling it $C$, the random variable has two states:

$$C = \begin{cases} 1, & \text{if the outcome is heads;} \\ 0, & \text{if the outcome is tails.} \end{cases} \tag{3.7}$$

Clearly, the variable is discrete, as it takes values in $\{0, 1\}$, and the event of a heads outcome can be written with the following notation: