Improving kriging surrogates of high-dimensional design models by Partial Least Squares dimension reduction

Mohamed-Amine Bouhlel, Nathalie Bartoli, Abdelkader Otsmane, Joseph Morlier

To cite this version: Mohamed-Amine Bouhlel, Nathalie Bartoli, Abdelkader Otsmane, Joseph Morlier. Improving kriging surrogates of high-dimensional design models by Partial Least Squares dimension reduction. 2015. hal-01232938

HAL Id: hal-01232938
https://hal.archives-ouvertes.fr/hal-01232938
Preprint submitted on 24 Nov 2015

Structural and Multidisciplinary Optimization manuscript No. (will be inserted by the editor)

M. A. Bouhlel
SNECMA, Rond-point René Ravaud-Réau, 77550 Moissy-Cramayel, France
Tel.: +33-(0)5-62252938
E-mail: [email protected]
E-mail: [email protected]

N. Bartoli
ONERA, 2 avenue Édouard Belin, 31055 Toulouse, France
Tel.: +33-(0)5-62252644
E-mail: [email protected]

J. Morlier
Université de Toulouse, Institut Clément Ader, ISAE, 10 Avenue Edouard Belin, 31055 Toulouse Cedex 4, France
Tel.: +33-(0)5-61338131
E-mail: [email protected]

A. Otsmane
SNECMA, Rond-point René Ravaud-Réau, 77550 Moissy-Cramayel, France
Tel.: +33-(0)1-60599887
E-mail: [email protected]

Abstract

Engineering computer codes are often computationally expensive. To lighten this load, we exploit new covariance kernels to replace computationally expensive codes with surrogate models. For input spaces with large dimensions, using the kriging model in the standard way is computationally expensive because a large covariance matrix must be inverted several times to estimate the parameters of the model. We address this issue herein by constructing a covariance kernel that depends on only a few parameters. The new kernel is constructed based on information obtained from the Partial Least Squares method. Promising results are obtained for numerical examples with up to 100 dimensions, and significant computational gain is obtained while maintaining sufficient accuracy.

Keywords: Kriging · Partial Least Squares · Experiment design · Metamodels

Symbols and notation

Matrices and vectors are in bold type.

Symbol                          Meaning
$\det$                          determinant of a matrix
$|\cdot|$                       absolute value
$\mathbb{R}$                    set of real numbers
$\mathbb{R}^{+}$                set of positive real numbers
$n$                             number of sampling points
$d$                             number of dimensions
$h$                             number of principal components retained
$\mathbf{x}$                    $1 \times d$ vector
$x_j$                           $j$th element of a vector $\mathbf{x}$
$\mathbf{X}$                    $n \times d$ matrix containing sampling points
$\mathbf{y}$                    $n \times 1$ vector containing simulation of $\mathbf{X}$
$\mathbf{x}^{(i)}$              $i$th training point for $i = 1, \dots, n$ (a $1 \times d$ vector)
$\mathbf{w}^{(l)}$              $d \times 1$ vector containing $\mathbf{X}$ weights given by the $l$th PLS iteration for $l = 1, \dots, h$
$\mathbf{X}^{(0)}$              $\mathbf{X}$
$\mathbf{X}^{(l-1)}$            matrix containing residual of inner regression of the $(l-1)$st PLS iteration for $l = 1, \dots, h$
$k(\cdot, \cdot)$               covariance function
$\mathcal{N}(0, k(\cdot, \cdot))$   distribution of a Gaussian process with mean function 0 and covariance function $k(\cdot, \cdot)$
$\mathbf{x}^{t}$                superscript $t$ denotes the transpose operation of the vector $\mathbf{x}$

1 Introduction and main contribution

In recent decades, because simulation models have striven to more accurately represent the true physics of phenomena, computational tools in engineering have become ever more complex and computationally expensive. To address this new challenge, a large number of input design variables, such as geometric representation, are often considered. Thus, to analyze the sensitivity of input design variables (this is called a "sensitivity analysis") or to search for the best point of a physical objective under certain physical constraints (i.e., global optimization), a large number of computing iterations are required, which is impractical when using simulations in real time. This is the main reason that surrogate modeling techniques have been growing in popularity in recent years. Surrogate models, also called metamodels, are vital in this context and are widely used as substitutes for time-consuming high-fidelity models. They are mathematical tools that approximate coded simulations from a few well-chosen experiments, which together form the design of experiments. The main role of surrogate models is to describe the underlying physics of the phenomena in question. Different types of surrogate models can be found in the literature, such as regression, smoothing spline [Wahba and Craven (1978); Wahba (1990)], neural networks [Haykin (1998)], radial basis functions [Buhmann (2003)] and Gaussian-process modeling [Rasmussen and Williams (2006)].

In this article, we focus on the kriging model because it estimates the prediction error. This model is also referred to as the Gaussian-process model [Rasmussen and Williams (2006)] and was presented first in geostatistics [see, e.g., Cressie (1988) or Goovaerts (1997)] before being extended to computer experiments and machine learning [Schonlau (1998); Sasena (2002); Jones et al (1998); Picheny et al (2010)]. The kriging model has become increasingly popular due to its flexibility in accurately imitating the dynamics of computationally expensive simulations and its ability to estimate the error of the predictor. However, it suffers from some well-known drawbacks in high dimension, which may be due to multiple causes. First, the size of the covariance matrix of the kriging model may increase dramatically if the model requires a large number of sample points. As a result, inverting the covariance matrix is computationally expensive. The second drawback is the optimization subproblem, which involves estimating the hyper-parameters of the covariance matrix. This is a complex problem that requires inverting the covariance matrix several times. Some recent works have addressed the drawbacks of high-dimensional Gaussian processes [Hensman et al (2013); Damianou and Lawrence (2013); Durrande et al (2012)] or the large-scale sampling of data [Sakata et al (2004)]. One way to reduce CPU time when constructing a kriging model is to reduce the number of hyper-parameters, but this approach assumes that the kriging model exhibits the same dynamics in all directions [Mera (2007)].

Thus, because estimating the kriging parameters can be time consuming, especially with dimensions as large as 100, we present herein a new method that combines the kriging model with the Partial Least Squares (PLS) technique to obtain a fast predictor. Like principal component analysis (PCA), the PLS technique reduces dimension, and it also reveals how the outputs depend on the inputs. PLS is used in this work because PCA only exposes dependencies between inputs. Information given by PLS is integrated in the covariance structure of the kriging model to reduce the number of hyper-parameters. The combination of kriging and PLS is abbreviated KPLS and allows us to build a fast kriging model because it requires fewer hyper-parameters in its covariance function, all without eliminating any input variables from the original problem.

The KPLS method is applied to many academic and industrial test cases, and promising results have been obtained for problems with up to 100 dimensions. The cases used in this paper do not exceed 100 input variables, which should be quite sufficient for most engineering problems. Problems with more than 100 inputs may lead to memory difficulties with the toolbox Scikit-learn (version 0.14), on which the KPLS method is based.

This paper is organized as follows: Section 2 summarizes the theoretical basis of the universal kriging model, recalling the key equations. The proposed KPLS model is then described in detail in section 3 by using the kriging equations. Section 4 compares and analyzes the results of the KPLS model with those of the kriging model when applied to classic analytical examples and some complex engineering examples. Finally, section 5 concludes and gives some perspectives.

2 Universal kriging model

To understand the mathematics of the proposed methods, we first review the kriging equations. The objective is to introduce the notation and to briefly describe the theory behind the kriging model. Assume that we have evaluated a costly deterministic function at $n$ points $\mathbf{x}^{(i)}$ ($i = 1, \dots, n$) with

\[ \mathbf{x}^{(i)} = \left[ x_1^{(i)}, \dots, x_d^{(i)} \right] \in B \subset \mathbb{R}^d, \]

and we denote by $\mathbf{X}$ the matrix $[\mathbf{x}^{(1)t}, \dots, \mathbf{x}^{(n)t}]^t$. For simplicity, $B$ is considered to be a hypercube expressed by the product of intervals in each direction, i.e., $B = \prod_{j=1}^{d} [a_j, b_j]$, where $a_j, b_j \in \mathbb{R}$ with $a_j \le b_j$ for $j = 1, \dots, d$. Simulating these $n$ inputs gives the outputs $\mathbf{y} = [y^{(1)}, \dots, y^{(n)}]^t$ with $y^{(i)} = y(\mathbf{x}^{(i)})$ for $i = 1, \dots, n$. We use $\hat{y}(\mathbf{x})$ to denote the prediction of the true function $y(\mathbf{x})$, which is considered as a realization of a stochastic process $Y(\mathbf{x})$ for all $\mathbf{x} \in B$. For the universal kriging model [Roustant et al (2012); Picheny et al (2010)], $Y$ is written as

\[ Y(\mathbf{x}) = \sum_{j=1}^{m} \beta_j f_j(\mathbf{x}) + Z(\mathbf{x}), \tag{1} \]

where, for $j = 1, \dots, m$, $f_j$ is a known independent basis function, $\beta_j \in \mathbb{R}$ is an unknown parameter, and $Z$ is a Gaussian process defined by $Z(\mathbf{x}) \sim \mathcal{N}(0, k)$, with $k$ being a stationary covariance function, also called a covariance kernel.
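To make the universal kriging decomposition in Eq. (1) concrete, the following minimal Python sketch draws one realization of $Y$ at $n$ points of a hypercube $B$. Several choices here are assumptions made only for illustration and are not stated in this section: the Gaussian (squared-exponential) kernel with one hyper-parameter `theta[j]` per input dimension, the constant-plus-linear basis functions, the numerical values of `beta` and `theta`, and all function names.

```python
import numpy as np

def sample_hypercube(n, bounds, rng):
    """Draw n points uniformly in B = prod_j [a_j, b_j]; bounds has shape (d, 2)."""
    a, b = bounds[:, 0], bounds[:, 1]
    return a + (b - a) * rng.random((n, bounds.shape[0]))

def gaussian_kernel(XA, XB, theta, sigma2=1.0):
    """Assumed stationary kernel: k(x, x') = sigma2 * exp(-sum_j theta_j (x_j - x'_j)^2)."""
    diff = XA[:, None, :] - XB[None, :, :]          # shape (nA, nB, d)
    return sigma2 * np.exp(-np.einsum("ijk,k->ij", diff**2, theta))

def linear_basis(X):
    """Hypothetical basis: f_1(x) = 1 and f_{j+1}(x) = x_j, so m = d + 1."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def draw_universal_kriging(X, beta, theta, rng, jitter=1e-8):
    """Draw Y(X) = F beta + Z(X), with Z ~ N(0, K), following the form of Eq. (1)."""
    F = linear_basis(X)                              # n x m matrix of f_j(x^(i))
    K = gaussian_kernel(X, X, theta)                 # n x n covariance matrix
    L = np.linalg.cholesky(K + jitter * np.eye(len(X)))
    Z = L @ rng.standard_normal(len(X))              # zero-mean Gaussian-process part
    return F @ beta + Z

rng = np.random.default_rng(0)
d, n = 10, 30                                        # illustrative sizes only
bounds = np.tile([-1.0, 1.0], (d, 1))                # B = [-1, 1]^d
X = sample_hypercube(n, bounds, rng)
beta = np.zeros(d + 1)
beta[0] = 2.0                                        # constant trend, chosen arbitrarily
theta = np.full(d, 0.5)                              # one hyper-parameter per dimension
y = draw_universal_kriging(X, beta, theta, rng)
K = gaussian_kernel(X, X, theta)
print(X.shape, K.shape, y.shape, theta.size)
```

With a kernel of this assumed form, the number of hyper-parameters equals the input dimension $d$, and every likelihood evaluation during their estimation involves factorizing an $n \times n$ matrix. This is the cost that motivates reducing the number of hyper-parameters in high dimension.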