Design of Computer Experiments: Space Filling and Beyond
Luc Pronzato, Werner Müller
To cite this version: Luc Pronzato, Werner Müller. Design of computer experiments: space filling and beyond. Statistics and Computing, Springer Verlag (Germany), 2012, 22 (3), pp. 681-701. DOI: 10.1007/s11222-011-9242-3. HAL Id: hal-00685876, https://hal.archives-ouvertes.fr/hal-00685876, submitted on 6 Apr 2012.

Manuscript dated January 28, 2011.

Abstract When setting up a computer experiment, it has become standard practice to select inputs spread uniformly across the available space. These so-called space-filling designs are now ubiquitous in the corresponding publications and conferences. The statistical folklore is that such designs have superior properties when it comes to prediction and estimation of emulator functions. In this paper we want to review the circumstances under which this superiority holds, provide some new arguments and clarify the motives for going beyond space-filling. An overview of the state of the art in space-filling design introduces and complements these results.

Keywords Kriging · entropy · design of experiments · space-filling · sphere packing · maximin design · minimax design

This work was partially supported by a PHC Amadeus/OEAD Amadée grant FR11/2010.

L. Pronzato: Laboratoire I3S, Université de Nice-Sophia Antipolis/CNRS, bâtiment Euclide, les Algorithmes, 2000 route des lucioles, BP 121, 06903 Sophia Antipolis cedex, France. Tel.: +33-4-92942703, Fax: +33-4-92942896.

Werner G. Müller: Department of Applied Statistics, Johannes-Kepler-University Linz, Freistädter Straße 315, A-4040 Linz, Austria. Tel.: +43-732-24685880, Fax: +43-732-24689846.

1 Introduction

Computer simulation experiments (see, e.g., Santner et al (2003); Fang et al (2005); Kleijnen (2009)) have now become a popular substitute for real experiments when the latter are infeasible or too costly. In these experiments, a deterministic computer code, the simulator, replaces the real (stochastic) data-generating process. This practice has generated a wealth of statistical questions, such as how well the simulator is able to mimic reality or which estimators are most suitable to adequately represent a system.

However, the foremost issue presents itself even before the experiment is started, namely: at which inputs should the simulator be run? It has become standard practice to select these inputs so as to cover the available space as uniformly as possible, thus generating so-called space-filling experimental designs. Naturally, in dimensions greater than one there are alternative ways to produce such designs. We will therefore briefly review the most common approaches to space-filling design in the next sections (2 and 3), taking a purely model-free stance. We will then (Sect. 4) investigate how these designs can be motivated from a statistical modeler's point of view and relate them to each other in a meaningful way. Finally, we will show that taking statistical modeling seriously leads us to designs that go beyond space-filling (Sect. 5 and 6). Special attention is devoted to Gaussian process models and kriging. The only design objective considered corresponds to reproducing the behavior of a computer code over a given domain for its input variables. Some basic principles of algorithmic construction are presented in Sect. 7, and Sect. 8 briefly concludes.

The present paper can be understood as a survey focusing on the special role of space-filling designs while at the same time providing new illuminating aspects. It intends to bring the respective sections of Koehler and Owen (1996) up to date and to provide a more statistical point of view than Chen et al (2006).

2 State of the art on space-filling design

2.1 Geometric criteria

There is little ambiguity about what constitutes a space-filling design in one dimension. Define an exact design $\xi = (x_1, \ldots, x_n)$ as a collection of $n$ points, and take a segment of the real line as the design space, say $\mathcal{X} = [0,1]$ after suitable renormalization. Then, depending on whether or not we are willing to exploit the edges, we have either $x_i = (i-1)/(n-1)$ or $x_i = (2i-1)/(2n)$, $i = 1, \ldots, n$, respectively.

The distinction between these two basic cases comes from the fact that one may consider distances either only among the points of the design $\xi$ or to all points of the set $\mathcal{X}$. This notion carries over to the less straightforward higher-dimensional case $d > 1$, where now $\xi = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$. We first need to define a suitable norm $\|\cdot\|$ on $\mathcal{X} = [0,1]^d$. Using Euclidean distances and a normalized design space will not impede generality for our purposes. We denote by

$$d_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|$$

the distance between the two design points $\mathbf{x}_i$ and $\mathbf{x}_j$ of $\xi$. We shall not consider the case where constraints make only a subset of $[0,1]^d$ admissible for design; see for instance Stinstra et al (2003) for possible remedies. The construction of Latin hypercube designs (see Sect. 2.2) with constraints is considered in (Petelet et al, 2010).

Let us first seek a design that achieves a high spread solely among its support points within the design region. One must then attempt to make the smallest distance between neighboring points in $\xi$ as large as possible. This is ensured by the maximin-distance criterion (to be maximized)

$$\phi_{Mm}(\xi) = \min_{i \neq j} d_{ij}.$$

We call a design that maximizes $\phi_{Mm}(\cdot)$ a maximin-distance design, see Johnson et al (1990). An example is given in Fig. 1, left. This design can be motivated by arranging the tables in a restaurant so as to minimize the chances of eavesdropping on another party's dinner conversation.

In other terms, one wishes to maximize the radius of $n$ non-intersecting balls with centers in $\mathcal{X}$. When $\mathcal{X}$ is a $d$-dimensional cube, this is equivalent to packing rigid spheres in $\mathcal{X}$, see Melissen (1997, p. 78). The literature on sphere packing is rather abundant. In dimension $d = 2$, the best-known results up to $n = 10\,000$ for finding the maximum common radius of $n$ circles that can be packed in a square are presented on http://www.packomania.com/. The example in Fig. 1, left, is taken from there, with $\phi_{Mm}(\xi) \simeq 0.5359$, which indicates that the 7-point design in (Johnson et al, 1990) is not a maximin-distance design. One may refer to (Gensane, 2004) for best-known results up to $n = 32$ for $d = 3$.

Among the set of maximin-distance designs (when several exist), a maximin-optimal design $\xi^*_{Mm}$ is one for which the number of pairs of points $(\mathbf{x}_i, \mathbf{x}_j)$ at distance $d_{ij} = \phi_{Mm}(\xi^*_{Mm})$ is minimal. Several such designs can exist, and measures can be taken to break the ties (see Morris and Mitchell (1995)), but this is not important for our purpose.

Consider now designs $\xi$ that attempt to make the maximum distance from all points in $\mathcal{X}$ to their closest point in $\xi$ as small as possible. This is achieved by minimizing the minimax-distance criterion

$$\phi_{mM}(\xi) = \max_{\mathbf{x} \in \mathcal{X}} \min_{\mathbf{x}_i} \|\mathbf{x} - \mathbf{x}_i\|.$$

We call a design that minimizes $\phi_{mM}(\cdot)$ a minimax-distance design, see Johnson et al (1990) and Fig. 1, right, for an example. (Note the slight confusion in terminology, as the criterion is actually a minimaximin one.) These designs can be motivated by a table allocation problem in a restaurant, where the tables are placed so that a waiter is as close as possible to a table wherever he is in the restaurant.

In other terms, one wishes to cover $\mathcal{X}$ with $n$ balls of minimum radius. Among the set of minimax-distance designs (when several exist), a minimax-optimal design $\xi^*_{mM}$ maximizes the minimum number of $\mathbf{x}_i$'s such that $\min_i \|\mathbf{x} - \mathbf{x}_i\| = \phi_{mM}(\xi^*_{mM})$, this minimum being taken over all points $\mathbf{x}$ having this property.

Fig. 1 Maximin (left, see http://www.packomania.com/) and minimax (right, see Johnson et al (1990)) distance designs for n = 7 points in $[0,1]^2$. The circles have radius $\phi_{Mm}(\xi)/2$ on the left panel and radius $\phi_{mM}(\xi)$ on the right one.

2.2 Latin hypercubes

Fig. 2 Minimax-Lh and simultaneously maximin-Lh distance design for n = 7 points in $[0,1]^2$, see http://www.spacefillingdesigns.nl/. The circles have radius $\phi_{Mm}(\xi)/2$ on the left panel and radius $\phi_{mM}(\xi)$ on the right one.

Note that pure space-filling designs such as $\xi^*_{mM}$ and $\xi^*_{Mm}$ may have very poor projection properties. That is, they may not be space-filling on any of their meaningful subspaces (see Fig. 1). The opposite is desirable for computer experiments, particularly when some inputs have no influence in the experiment, and some authors have called this property noncollapsingness (cf. Stinstra et al (2003)). This requirement on projections is one of the reasons why researchers have started to restrict the search for designs to the class of so-called