Design of computer experiments: space filling and beyond Luc Pronzato, Werner Müller

To cite this version:

Luc Pronzato, Werner Müller. Design of computer experiments: space filling and beyond. Statistics and Computing, Springer Verlag (Germany), 2012, 22 (3), pp.681-701. 10.1007/s11222-011-9242-3. hal-00685876

HAL Id: hal-00685876 https://hal.archives-ouvertes.fr/hal-00685876 Submitted on 6 Apr 2012

Statistics and Computing manuscript No. (will be inserted by the editor)

Design of computer experiments: space filling and beyond

Luc Pronzato · Werner G. Müller

January 28, 2011

Abstract When setting up a computer experiment, it has become a standard practice to select the inputs spread out uniformly across the available space. These so-called space-filling designs are now ubiquitous in corresponding publications and conferences. The statistical folklore is that such designs have superior properties when it comes to prediction and estimation of emulator functions. In this paper we want to review the circumstances under which this superiority holds, provide some new arguments and clarify the motives to go beyond space-filling. An overview of the state of the art of space-filling introduces and complements these results.

Keywords entropy · space-filling · sphere packing · maximin design · minimax design

This work was partially supported by a PHC Amadeus/OEAD Amadée grant FR11/2010.

L. Pronzato
Laboratoire I3S, Université de Nice-Sophia Antipolis/CNRS
bâtiment Euclide, les Algorithmes
2000 route des lucioles, BP 121
06903, Sophia Antipolis cedex, France
Tel.: +33-4-92942703
Fax: +33-4-92942896
E-mail: [email protected]

Werner G. Müller
Department of Applied Statistics, Johannes-Kepler-University Linz
Freistädter Straße 315, A-4040 Linz, Austria
Tel.: +43-732-24685880
Fax: +43-732-24689846
E-mail: [email protected]

1 Introduction

Computer experiments (see, e.g., Santner et al (2003); Fang et al (2005); Kleijnen (2009)) have now become a popular substitute for real experiments when the latter are infeasible or too costly. In these experiments, a deterministic computer code, the simulator, replaces the real (stochastic) data generating process. This practice has generated a wealth of statistical questions, such as how well the simulator is able to mimic reality or which estimators are most suitable to adequately represent a system.

However, the foremost issue presents itself even before the experiment is started, namely how to determine the inputs for which the simulator is run. It has become standard practice to select these inputs such as to cover the available space as uniformly as possible, thus generating so-called space-filling experimental designs. Naturally, in dimensions greater than one there are alternative ways to produce such designs. We will therefore in the next sections (2, 3) briefly review the most common approaches to space-filling design, taking a purely model-free stance. We will then (Sect. 4) investigate how these designs can be motivated from a statistical modeler's point of view and relate them to each other in a meaningful way. Eventually we will show that taking statistical modeling seriously will lead us to designs that go beyond space-filling (Sect. 5 and 6). Special attention is devoted to Gaussian process models and kriging. The only design objective considered corresponds to reproducing the behavior of a computer code over a given domain for its input variables. Some basic principles about algorithmic constructions are exposed in Sect. 7 and Sect. 8 briefly concludes.

The present paper can be understood as a survey focussing on the special role of space-filling designs and at the same time providing new illuminative aspects. It intends to bring the respective sections of Koehler and Owen (1996) up to date and to provide a more statistical point of view than Chen et al (2006).

2 State of the art on space-filling design

2.1 Geometric criteria

There is little ambiguity on what constitutes a space-filling design in one dimension. If we define an exact design $\xi = (x_1, \ldots, x_n)$ as a collection of $n$ points and consider a section of the real line as the design space, say $\mathcal{X} = [0, 1]$ after suitable renormalization, then, depending upon whether we are willing to exploit the edges or not, we have either $x_i = (i-1)/(n-1)$ or $x_i = (2i-1)/(2n)$, respectively.

The distinction between those two basic cases comes from the fact that one may consider distances only amongst points in the design $\xi$ or to all points in the set $\mathcal{X}$. We can carry over this notion to the less straightforward higher-dimensional case $d > 1$, with now $\xi = (x_1, \ldots, x_n)$. Initially we need to define a proper norm $\|\cdot\|$ on $\mathcal{X} = [0, 1]^d$; Euclidean distances and normalization of the design space will not impede generality for our purposes. We shall denote by
$$d_{ij} = \|x_i - x_j\|$$
the distance between the two design points $x_i$ and $x_j$ of $\xi$. We shall not consider the case where there exist constraints that make only a subset of $[0, 1]^d$ admissible for design; see for instance Stinstra et al (2003) for possible remedies. The construction of Latin hypercube designs (see Sect. 2.2) with constraints is considered in (Petelet et al, 2010).

Let us first seek a design that achieves a high spread solely amongst its support points within the design region. One must then attempt to make the smallest distance between neighboring points in $\xi$ as large as possible. That is ensured by the maximin-distance criterion (to be maximized)
$$\phi_{Mm}(\xi) = \min_{i \neq j} d_{ij}.$$
We call a design that maximizes $\phi_{Mm}(\cdot)$ a maximin-distance design, see Johnson et al (1990). An example is given in Fig. 1–left. This design can be motivated by setting up the tables in a restaurant such that one wants to minimize the chances to eavesdrop on another party's dinner talk.

In other terms, one wishes to maximize the radius of $n$ non-intersecting balls with centers in $\mathcal{X}$. When $\mathcal{X}$ is a $d$-dimensional cube, this is equivalent to packing rigid spheres in $\mathcal{X}$, see Melissen (1997, p. 78). The literature on sphere packing is rather abundant. In dimension $d = 2$, the best known results up to $n = 10\,000$ for finding the maximum common radius of $n$ circles which can be packed in a square are presented on http://www.packomania.com/ (the example in Fig. 1–left is taken from there, with $\phi_{Mm}(\xi) \simeq 0.5359$, indicating that the 7-point design in (Johnson et al, 1990) is not a maximin-distance design); one may refer to (Gensane, 2004) for best-known results up to $n = 32$ for $d = 3$.

Among the set of maximin-distance designs (when there exist several), a maximin-optimal design $\xi^*_{Mm}$ is such that the number of pairs of points $(x_i, x_j)$ at the distance $d_{ij} = \phi_{Mm}(\xi^*_{Mm})$ is minimum (several such designs can exist, and measures can be taken to remove ties, see Morris and Mitchell (1995), but this is not important for our purpose).

Consider now designs $\xi$ that attempt to make the maximum distance from all the points in $\mathcal{X}$ to their closest point in $\xi$ as small as possible. This is achieved by minimizing the minimax-distance criterion
$$\phi_{mM}(\xi) = \max_{x \in \mathcal{X}} \min_{x_i} \|x - x_i\|.$$
We call a design that minimizes $\phi_{mM}(\cdot)$ a minimax-distance design, see Johnson et al (1990) and Fig. 1–right for an example. (Note the slight confusion in terminology, as it is actually minimaximin.) These designs can be motivated by a table allocation problem in a restaurant, such that a waiter is as close as possible to a table wherever he is in the restaurant.

In other terms, one wishes to cover $\mathcal{X}$ with $n$ balls of minimum radius. Among the set of minimax-distance designs (in case several exist), a minimax-optimal design $\xi^*_{mM}$ maximizes the minimum number of $x_i$'s such that $\min_i \|x - x_i\| = \phi_{mM}(\xi^*_{mM})$ over all points $x$ having this property.

Fig. 1 Maximin (left, see http://www.packomania.com/) and minimax (right, see Johnson et al (1990)) distance designs for $n = 7$ points in $[0, 1]^2$. The circles have radius $\phi_{Mm}(\xi)/2$ on the left panel and radius $\phi_{mM}(\xi)$ on the right one.
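Both criteria can be evaluated by brute force. The Python sketch below assumes NumPy; the names `pairwise_distances`, `phi_Mm`, `phi_mM_grid` and `regular_grid` are illustrative, not taken from the paper. It computes $\phi_{Mm}(\xi)$ exactly and approximates $\phi_{mM}(\xi)$ by a maximum over a finite grid of candidate points $x$. Because the maximum is taken over a subset of $\mathcal{X}$, the grid value can only underestimate $\phi_{mM}(\xi)$. The usage checks the two one-dimensional designs $x_i = (i-1)/(n-1)$ and $x_i = (2i-1)/(2n)$.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between the rows of a (m x d) and the rows of b (k x d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def phi_Mm(design):
    """Maximin-distance criterion: smallest distance d_ij over pairs i < j."""
    x = np.asarray(design, dtype=float)
    dist = pairwise_distances(x, x)
    i, j = np.triu_indices(len(x), k=1)  # each unordered pair counted once
    return dist[i, j].min()


def phi_mM_grid(design, grid):
    """Grid approximation of the minimax-distance criterion:
    max over grid points x of min_i ||x - x_i|| (a lower bound on phi_mM)."""
    x = np.asarray(design, dtype=float)
    g = np.asarray(grid, dtype=float)
    return pairwise_distances(g, x).min(axis=1).max()


def regular_grid(d, m):
    """Regular grid of m**d points on [0, 1]^d, vertices included."""
    axis = np.linspace(0.0, 1.0, m)
    mesh = np.meshgrid(*([axis] * d), indexing="ij")
    return np.stack([c.ravel() for c in mesh], axis=-1)


if __name__ == "__main__":
    n = 5
    i = np.arange(1, n + 1)
    edges = ((i - 1) / (n - 1)).reshape(-1, 1)        # x_i = (i-1)/(n-1)
    centres = ((2 * i - 1) / (2 * n)).reshape(-1, 1)  # x_i = (2i-1)/(2n)
    grid = regular_grid(1, 1001)
    print(phi_Mm(edges))               # 1/(n-1) = 0.25
    print(phi_mM_grid(centres, grid))  # close to 1/(2n) = 0.1
```

The grid holds $m^d$ points and the distance matrix has $m^d \times n$ entries, so this brute-force approximation quickly becomes impractical as $d$ grows.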

2.2 Latin hypercubes

Note that pure space-filling designs such as $\xi^*_{mM}$ and $\xi^*_{Mm}$ may have very poor projectional properties; that is, they may not be space-filling on any of their meaningful subspaces, see Fig. 1. The opposite is desirable for computer experiments, particularly when some inputs are of no influence in the experiment, and this property was called noncollapsingness by some authors (cf. Stinstra et al (2003)). This requirement about projections is one of the reasons that researchers have started to restrict the search for designs to the class of so-called Latin-hypercube (Lh) designs, see McKay et al (1979), which have the property that any of their one-dimensional projections yields the maximin distance sequence $x_i = (i-1)/(n-1)$. An additional advantage is that, since the generation of Lh designs as a finite class is computationally rather simple, it has become customary to apply a secondary, e.g. space-filling, criterion to them, sometimes by a mere brute-force enumeration as in (van Dam, 2007). An example of a minimax and simultaneously maximin Lh design is presented in Fig. 2 (note that there is a slight inconsistency about minimax-Lh designs in that they are maximin rather than minimax on their one-dimensional projections).

Fig. 2 Minimax-Lh and simultaneously maximin-Lh distance design for $n = 7$ points in $[0, 1]^2$, see http://www.spacefillingdesigns.nl/. The circles have radius $\phi_{Mm}(\xi)/2$ on the left panel and radius $\phi_{mM}(\xi)$ on the right one.

Other distances than Euclidean could be considered; when working within the class of Lh designs the situation is easier with the $L_1$ or $L_\infty$ norms than with the $L_2$ norm, at least for $d = 2$, see van Dam et al (2007); van Dam (2007). The class of Lh designs is finite but large. It contains $(n!)^{d-1}$ different designs (not $(n!)^d$, since the order of the points is arbitrary and the first coordinates can be fixed to $\{x_i\}_1 = (i-1)/(n-1)$), and still $(n!)^{d-1}/(d-1)!$ if we consider designs as equivalent when they differ by a permutation of coordinates. An exhaustive search is thus quickly prohibitive even for moderate values of $n$ and $d$. Most algorithmic methods are of the exchange type, see Sect. 7. In order to remain in the class of Lh designs, one exchange step corresponds to swapping the $j$-th coordinates of two points, which gives $(d-1)n(n-1)/2$ possibilities at each step (the first coordinates being fixed). Another approach that takes projectional properties into account but is not restricted to the class of Lh designs will be presented in Sect. 3.3. Note that McKay et al (1979) originally introduced Lh designs as random sampling procedures rather than as candidates for providing fixed designs, and those random designs are not guaranteed to have good space-filling properties. Tang (1993) introduced orthogonal-array-based Latin hypercubes to improve projections on higher dimensional subspaces, the space-filling properties of which were improved by Leary et al (2003). The usefulness of Lh designs in model-based (as discussed in Sect. 4) examples was demonstrated in (Pebesma and Heuvelink, 1999). The algorithmic construction of Lh designs that optimize a discrepancy criterion (see Sect. 2.3) or an entropy-based criterion (see Sect. 3.3) is considered respectively in (Iooss et al, 2010) and (Jourdan and Franco, 2010); the algebraic construction of Lh designs that minimize the integrated kriging variance for the particular correlation structure $C(u, v; \nu) = \exp(-\nu\|u - v\|_1)$ (see Sect. 4.1) is considered in (Pistone and Vicario, 2010).

2.3 Other approaches to space-filling

There appear a number of alternative approaches to space-filling in the literature, most of which can be similarly distinguished by the stochastic nature of the inputs, i.e. whether $\xi$ is to be considered random or fixed.

For the latter, natural simple designs are regular grids. Such designs are well suited for determining appropriate model responses and for checking whether assumptions about the errors are reasonably well satisfied. There seems little by which to choose between e.g. a square grid or a triangular grid; it is worth noting, however, that the former may be slightly more convenient from a practical standpoint (easier input determination) but that the latter seems marginally more efficient for purposes of model-based prediction (cf. Yfantis et al (1987)).

Bellhouse and Herzberg (1984) have compared optimum designs and uniform grids (in a one-dimensional model-based setup) and they come to the conclusion that (depending upon the model) predictions for certain output regions can actually be improved by regular grids. A comparison in a multi-dimensional setup including correlations can be found in (Herzberg and Huda, 1981).

For higher dimensional problems, Bates et al (1996) recommend the use of (non-rectangular) lattices rather than grids (they also reveal connections to model-based approaches). In the two-dimensional setup (on the square $[-1, 1]^2$) the Fibonacci lattice (see Koehler and Owen (1996)) proved to be useful. The advantage of lattices is that their projection on lower dimensions covers the design region more or less uniformly. Adaptations to irregular design regions may not be straightforward, but good enough approximations will suffice. This is not the case for many other systematic designs that are frequently proposed in the literature, such as central composite designs, the construction of which relies on the symmetry of the design region.

It is evident that randomization can be helpful for making designs more robust. On a finite grid $\mathcal{X}$ with $N$ candidate points we can think of randomization as drawing a single design $\xi$ according to a pre-specified probability distribution $\pi(\cdot)$. The uniform distribution then corresponds to simple random sampling, and more refined schemes (e.g., stratified random sampling, see Fedorov and Hackl (1997)) can be devised by altering $\pi(\cdot)$. A comparison between deterministic selection and random sampling is hard to make, since for a finite sample it is evident that for any single purpose it is possible to find a deterministic design that outperforms random sampling. Performance benchmarking for various space-filling designs can be found in (Johnson et al, 2008) and (Bursztyn and Steinberg, 2006).

All of the methods presented above seem to ensure a reasonable degree of overall coverage of the study area. However, there have been claims (see e.g. Fang and Wang (1993)) that the efficiency (with respect to coverage) of these methods may be poor when the number of design points is small. To allow for comparisons between designs in the above respect, Fang (1980) (see also Fang et al (2000)) introduced some formal criteria, amongst them the so-called discrepancy
$$D(\xi) = \max_{x \in \mathcal{X}} |F_n(x) - U(x)|. \qquad (1)$$
Here $U(\cdot)$ is the c.d.f. of the uniform distribution on $\mathcal{X}$ and $F_n(\cdot)$ denotes the empirical c.d.f. for $\xi$. The discrepancy by this definition is just the Kolmogorov-Smirnov test statistic for the goodness-of-fit test for a uniform distribution. Based upon this definition, Fang and Wang (1993) suggest finding 'optimum' designs of given size $n$ that minimize $D(\xi)$, which they term the U-criterion. For $d = 1$ and $\mathcal{X} = [0, 1]$, the minimax-optimal design $\xi^*_{mM}$ with $x_i = (2i-1)/(2n)$ is optimal for (1), with $D(\xi^*_{mM}) = 1/(2n)$. Note, however, that $D(\xi) \geq 0.06 \log(n)/n$ for any sequence of $n$ points, see Niederreiter (1992, p. 24). It turns out that for certain choices of $n$ lattice designs are U-optimum. Those lattice designs are also D-optimum for some specific Fourier regressions, and this and other connections are explored by Riccomagno et al (1997). An example in (Santner et al, 2003, Chap. 5) shows that the measure of uniformity expressed by $D(\xi)$ is not always in agreement with common intuition.

Niederreiter (1992) has used similar concepts for the generation of so-called low-discrepancy sequences. Originally devised for use in quasi-Monte Carlo sampling, due to the Koksma-Hlawka inequality in numerical integration, their elaborate versions, like Faure, Halton and Sobol sequences, are increasingly used in computer experiments (see, e.g., Fang and Li (2006)). Santner et al (2003, Chap. 5) and Fang et al (2005, Chap. 3) provide a good overview of the various types of the above discussed designs and their relations. Other norms than $\|\cdot\|_\infty$ can be used in the definition of discrepancy, yielding $D_p(\xi) = \left(\int_{\mathcal{X}} |F_n(x) - U(x)|^p \, dx\right)^{1/p}$, and other types of discrepancy (centered, wrap-around) may also be considered. Low-discrepancy sequences present the advantage that they can be constructed sequentially (which is not the case for Lh designs), although one should take care of the irregularity of distributions, see Niederreiter (1992, Chap. 3) and Fang et al (2000) (a conjecture in number theory states that $D(\xi_n) \geq c_d [\log(n)]^{d-1}/n$ for any sequence $\xi_n$, with $c_d$ a constant depending on $d$). It seems, however, that designs obtained by optimizing a geometric space-filling criterion are preferable for moderate values of $n$ and that, for $n$ large and $d > 1$, the space-filling properties of designs corresponding to low-discrepancy sequences may not be satisfactory (the points sometimes presenting alignments along subspaces). Note that (Bischoff and Miller, 2006) and related work reveal (in the one-dimensional setup) relationships between uniform designs and designs that reserve a portion of the observations for detecting lack-of-fit for various classical design criteria.

2.4 Some properties of maximin and minimax optimal designs

Notice that for any design $\xi$, $\mathcal{X} \subset \cup_{i=1}^n \mathcal{B}(x_i, \phi_{mM}(\xi))$, with $\mathcal{B}(x, R)$ the ball with center $x$ and radius $R$. Therefore, $\phi_{mM}(\xi) > [\mathrm{vol}(\mathcal{X})/(nV_d)]^{1/d} = (nV_d)^{-1/d}$, with $V_d = \pi^{d/2}/\Gamma(d/2 + 1)$ the volume of the $d$-dimensional unit ball. One may also notice that for any $\xi$, $n \geq 2$, $\phi_{mM}(\xi) > \phi_{Mm}(\xi)/2$, since $\mathcal{X}$ cannot be covered with non-overlapping balls. A sort of reverse inequality holds for maximin-optimal designs. Indeed, take a maximin-optimal design $\xi^*_{Mm}$ and suppose that $\phi_{mM}(\xi^*_{Mm}) > \phi_{Mm}(\xi^*_{Mm})$. It means that there exists an $x^* \in \mathcal{X}$ such that $\min_i \|x^* - x_i\| > \phi_{Mm}(\xi^*_{Mm})$. By substituting $x^*$ for an $x_i$ in $\xi^*_{Mm}$ such that $d_{ij} = \phi_{Mm}(\xi^*_{Mm})$ for some $j$, one can then either increase the value of $\phi_{Mm}(\cdot)$, or decrease the number of pairs of design points at distance $\phi_{Mm}(\xi^*_{Mm})$, which contradicts the optimality of $\xi^*_{Mm}$. Therefore, $\phi_{mM}(\xi^*_{Mm}) \leq \phi_{Mm}(\xi^*_{Mm})$.
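To make the Lh exchange step of Sect. 2.2 concrete, here is a minimal Python sketch, assuming NumPy; all function names are hypothetical. It draws a random Lh design with levels $(i-1)/(n-1)$ and keeps the first coordinate fixed. It then repeatedly tries the $(d-1)n(n-1)/2$ swaps of the $j$-th coordinates of two points and keeps a swap only if it strictly increases $\phi_{Mm}$. This naive first-improvement search is an illustrative assumption, not the algorithm discussed in Sect. 7.

```python
import itertools

import numpy as np


def phi_Mm(x):
    """Maximin-distance criterion: smallest Euclidean distance between two rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(x), k=1)
    return dist[i, j].min()


def random_lh(n, d, rng):
    """Random Lh design on [0, 1]^d with levels (i-1)/(n-1).

    The first coordinate is kept in natural order; the other d - 1 columns
    are independent random permutations of the levels."""
    levels = np.arange(n) / (n - 1)
    columns = [levels] + [rng.permutation(levels) for _ in range(d - 1)]
    return np.column_stack(columns)


def lh_swaps(n, d):
    """All exchange moves (j, a, b): swap coordinate j >= 1 (0-based) of points a and b.

    There are (d - 1) * n * (n - 1) / 2 such moves."""
    return [(j, a, b) for j in range(1, d)
            for a, b in itertools.combinations(range(n), 2)]


def lh_maximin_exchange(n, d, seed=0, max_sweeps=50):
    """Naive first-improvement exchange search for a maximin Lh design."""
    rng = np.random.default_rng(seed)
    x = random_lh(n, d, rng)
    best = phi_Mm(x)
    moves = lh_swaps(n, d)
    for _ in range(max_sweeps):
        improved = False
        for j, a, b in moves:
            x[[a, b], j] = x[[b, a], j]      # the swap keeps x an Lh design
            value = phi_Mm(x)
            if value > best + 1e-12:
                best, improved = value, True  # keep the improving swap
            else:
                x[[a, b], j] = x[[b, a], j]  # undo it
        if not improved:                     # no improving swap left
            break
    return x, best


if __name__ == "__main__":
    design, value = lh_maximin_exchange(n=7, d=2, seed=1)
    print(design)
    print("phi_Mm =", value)
```

Because $\phi_{Mm}$ is a minimum, many swaps leave it unchanged, and a search like this one can stall on plateaus. This is plausibly one reason why smoother surrogates, such as the $L_q$-regularized criteria of Sect. 3.1, are reported by Morris and Mitchell (1995) to be easier to optimize over Lh designs.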
Both $\phi_{Mm}(\xi^*_{Mm})$ and $\phi_{mM}(\xi^*_{mM})$ are non-increasing functions of $n$ when $\mathcal{X} = [0, 1]^d$ (there may be equality for different values of $n$; for instance, $\phi_{Mm}(\xi^*_{Mm}) = \sqrt{2}$ for $n = 3, 4$ and $d = 3$, see Gensane (2004)). This is no longer true, however, when working in the class of Lh designs (see e.g. van Dam (2007), who shows that $\phi_{mM}(\xi^*_{mM})$ is larger for $n = 11$ than for $n = 12$ when $d = 2$ for Lh designs).

The value of $\phi_{Mm}(\cdot)$ is easily computed for any design $\xi$, even when $n$ and the dimension $d$ get large, since we only need to calculate the $n(n-1)/2$ distances between pairs of points in $\mathbb{R}^d$.

The evaluation of the criterion $\phi_{mM}(\cdot)$ is more difficult, which explains why a discretization of $\mathcal{X}$ is often used in the literature. It amounts to approximating $\phi_{mM}(\xi)$ by $\tilde\phi_{mM,N}(\xi) = \max_{x \in \mathcal{X}_N} \min_i \|x - x_i\|$, with $\mathcal{X}_N$ a finite grid of $N$ points in $\mathcal{X}$. Even so, the calculation of $\tilde\phi_{mM,N}(\xi)$ quickly becomes cumbersome when $N$ increases (and $N$ should increase fast with $d$ to have a fine enough grid). It happens, however, that basic tools from computational geometry make it possible to reduce the calculation of $\max_{x \in \mathcal{X}} \min_i \|x - x_i\|$ to the evaluation of $\min_i \|z_j - x_i\|$ for a finite collection of points $z_j \in \mathcal{X}$, provided that $\mathcal{X}$ is the $d$-dimensional cube $[0, 1]^d$. This does not seem to be much used, and we detail the idea hereafter.

Consider the Delaunay tessellation of the points of $\xi$, see, e.g., Okabe et al (1992); Boissonnat and Yvinec (1998). Each simplex has its $d+1$ vertices at design points in the tessellation and has the property that its circumscribed sphere does not contain any design point in its interior. We shall call those circumscribed spheres Delaunay spheres. When a solution $x^*$ of the problem $\max_{x \in \mathcal{X}} \min_i \|x - x_i\|$ is in the interior of $[0, 1]^d$, it must be the center of some Delaunay sphere.

There is a slight difficulty when $x^*$ is on the boundary of $\mathcal{X}$, since the tessellation directly constructed from the $x_i$ does not suffice. However, $x^*$ is still the center of a Delaunay sphere if we construct the tessellation not only from the points in $\xi$ but also from their symmetric images with respect to all $(d-1)$-dimensional faces of $\mathcal{X}$, see Appendix A.

The Delaunay tessellation is thus constructed on a set of $(2d+1)n$ points. (One may notice that $\mathcal{X}$ is not necessarily included in the convex hull of these points for $d \geq 3$, but this is not an issue.) Once the tessellation is calculated, we collect the radii of Delaunay spheres having their center in $\mathcal{X}$ (boundary included); the value of $\phi_{mM}(\xi)$ is given by the maximum of these radii (see Appendix A for the computation of the radius of the sphere circumscribed to a simplex).

Efficient algorithms exist for the computation of Delaunay tessellations, see Okabe et al (1992); Boissonnat and Yvinec (1998); Cignoni et al (1998) and the references therein, which make the computation of $\phi_{mM}(\xi)$ affordable for reasonable values of $d$ and $n$ (the number of simplices in the Delaunay tessellation of $M$ points in dimension $d$ is bounded by $O(M^{\lceil d/2 \rceil})$). Clearly, not all $2dn$ symmetric points are useful in the construction, leaving open the possibility to reduce the complexity of calculations by using fewer than $(2d+1)n$ points.

Fig. 3 presents the construction obtained for a 5-point Latin-hypercube design in dimension 2: 33 triangles are constructed, 11 centers of circumscribed circles belong to $\mathcal{X}$, with some redundancy, so that only 8 distinct points are candidates for being solution of the maximization problem $\max_{x \in \mathcal{X}} \min_i \|x - x_i\|$. The solution is at the origin and gives $\phi_{mM}(\xi) = \min_i \|x_i\| \simeq 0.5590$.

Fig. 3 Delaunay triangulation for a 5-point Lh design (squares); the 8 candidate points for being solution of $\max_{x \in \mathcal{X}} \min_i \|x - x_i\|$ are indicated by dots.

3 Model-free design

We continue for the moment to consider the situation when we are not able, or do not want, to make an assumption about a suitable model for the emulator. We investigate the properties of some geometric and other model-free design criteria more closely and make connections between them.

3.1 Lq-regularization of the maximin-distance criterion

Following the approach in Appendix B, one can define regularized forms of the maximin-distance criterion, valid when $q > 0$ for any $\xi$ such that $\phi_{Mm}(\xi) > 0$:
$$\phi_{[q]}(\xi) = \Big(\sum_{i<j} d_{ij}^{-q}\Big)^{-1/q}, \qquad \phi^{\mu}_{[q]}(\xi) = \Big(\sum_{i<j} \mu_{ij}\, d_{ij}^{-q}\Big)^{-1/q},$$
with weights $\mu_{ij}$ […]. One has
$$\phi_{[q]}(\xi) \leq \phi_{Mm}(\xi) \leq \binom{n}{2}^{1/q} \phi_{[q]}(\xi). \qquad (2)$$
It also yields the best lower bound on the maximin efficiency of an optimal design $\xi^*_{[q]}$ for $\phi_{[q]}(\cdot)$,
$$\frac{\phi_{Mm}(\xi^*_{[q]})}{\phi_{Mm}(\xi^*_{Mm})} \geq \binom{n}{2}^{-1/q}, \qquad (3)$$
where $\xi^*_{Mm}$ denotes any maximin-distance design, see Appendix B. One may define $\phi_{[0]}(\xi)$ as
$$\phi_{[0]}(\xi) = \exp\Big\{\binom{n}{2}^{-1} \sum_{i<j} \log(d_{ij})\Big\}, \qquad (4)$$
and $\phi_{[2]}(\cdot)$ corresponds to a criterion initially proposed by Audze and Eglais (1977). Morris and Mitchell (1995) use $\phi_{[q]}(\cdot)$ with different values of $q$ and make the observation that for moderate values of $q$ (say, $q \lesssim 5$) the criterion is easier to optimize than $\phi_{Mm}(\cdot)$ in the class of Lh designs. They also note that, depending on the problem, one needs to take $q$ in the range 20 to 50 to make the two criteria $\phi_{[q]}(\cdot)$ and $\phi_{Mm}(\cdot)$ agree about the designs considered best. Their observation is consistent with the efficiency bounds given above. According to inequality (3), to ensure that the maximin efficiency of an optimal design for $\phi_{[q]}(\cdot)$ is larger than $1 - \epsilon$ one should take approximately $q > 2\log(n)/\epsilon$ (independently of the dimension $d$). Note that the use of $[\sum_{i \neq j} d_{ij}^{-q}]^{-1/q}$ instead of $\phi_{[q]}(\xi)$ would worsen the maximin efficiency bounds by a factor $2^{-1/q} < 1$ (but leaves $\phi^{\mu}_{[q]}(\cdot)$ unchanged when the uniform measure $\mu_{ij} = [n(n-1)]^{-1}$ is used).

We may alternatively write $\phi_{Mm}(\xi)$ as
$$\phi_{Mm}(\xi) = \min_i d^*_i, \qquad (5)$$
where $d^*_i = \min_{j \neq i} d_{ij}$ denotes the nearest-neighbor (NN) distance from $x_i$ to another design point in $\xi$. Following the same technique as above, an $L_q$-regularization applied to the min function in (5) then gives
$$\phi_{[NN,q]}(\xi) = \Big[\sum_{i=1}^n (d^*_i)^{-q}\Big]^{-1/q}$$
[…] $d_{ij}$ is that the resulting criterion $[\sum_{i=1}^n (\min_{j>i} d_{ij})^{-q}]^{-1/q}$ depends on the ordering of the design points. One may also define $\phi_{[NN,0]}(\xi)$ as
$$\phi_{[NN,0]}(\xi) = \exp\Big\{\frac{1}{n}\sum_{i=1}^n \log(d^*_i)\Big\}, \qquad (8)$$
see Appendix B. One can readily check that using the generalization (36) with $\phi(t) = \log(t)$ and $q = -1$ also gives $\phi_{[NN,-1,\log]}(\xi) = \phi_{[NN,0]}(\xi)$. Not surprisingly, $\phi_{[NN,q]}(\cdot)$ gives a better approximation of $\phi_{Mm}(\cdot)$ than $\phi_{[q]}(\cdot)$: an optimal design $\xi^*_{[NN,q]}$ for $\phi_{[NN,q]}(\cdot)$ satisfies
$$\frac{\phi_{Mm}(\xi^*_{[NN,q]})}{\phi_{Mm}(\xi^*_{Mm})} \geq n^{-1/q}$$
[…] $q > \log(n)/\epsilon$, compare with (3). Exploiting the property that, for a given $i$,
$$\Big(\sum_{j \neq i} d_{ij}^{-q}\Big)^{-1/q} \leq d^*_i \leq (n-1)^{1/q}\Big(\sum_{j \neq i} d_{ij}^{-q}\Big)^{-1/q},$$
see (35), we obtain that
$$2^{-1/q}\phi_{[q]}(\xi) \leq \phi_{[NN,q]}(\xi) \leq \phi_{Mm}(\xi) \leq n^{1/q}\phi_{[NN,q]}(\xi) \leq \binom{n}{2}^{1/q}\phi_{[q]}(\xi).$$
Note that the upper bounds on $\phi_{Mm}(\cdot)$ are sharp (think of a design with $n = d+1$ points, all at equal distance from each other, i.e., such that $d_{ij} = d^*_i$ is constant).

Fig. 4 presents the bounds (2) (dashed lines, top) and (6) (dashed lines, bottom) on the value $\phi_{Mm}(\xi)$ (solid line) for the 7-point maximin-distance design of Fig. 1–left. Notice the accuracy of the upper bound $n^{1/q}\phi_{[NN,q]}(\xi)$ (note the different scales between the top and bottom panels); the situation is similar for other maximin-distance designs since $d^*_i = \phi_{Mm}(\xi^*_{Mm})$ for many $i$.

Fig. 4 Upper and lower bounds (dashed lines) on the value $\phi_{Mm}(\xi)$ for the 7-point maximin-distance design of Fig. 1–left, as functions of $q$: (2) on the top, (6) on the bottom; the value of $\phi_{Mm}(\xi)$ is indicated by a solid line, $\phi_{[0]}(\xi)$ (4) and $\phi_{[NN,0]}(\xi)$ (8) are in dotted lines, respectively on the top and bottom panels.

Oler (1961) indicates that for $d = 2$, $\phi_{Mm}(\xi^*_{Mm}) \leq [1 + \sqrt{1 + 2(n-1)/\sqrt{3}}]/(n-1)$. The equivalence with sphere packing gives $\phi_{Mm}(\xi^*_{Mm}) < [(nV_d)^{1/d}/2 - 1]^{-1}$, with $V_d$ the volume of the $d$-dimensional unit ball; this bound becomes quite loose for large $d$ and can be improved by using results on packing densities of densest known packings (which may be irregular for some $d > 3$), yielding $\phi_{Mm}(\xi^*_{Mm}) \leq (3^{1/4}\sqrt{n/2} - 1)^{-1}$ for $d = 2$ and $\phi_{Mm}(\xi^*_{Mm}) \leq [(n/\sqrt{2})^{1/3} - 1]^{-1}$ for $d = 3$. Bounds for maximin Lh designs in dimension $d$ can be found in (van Dam et al, 2009).

3.2 Lq-regularization of the minimax-distance criterion

The same type of relaxation can be applied to the criterion $\phi_{mM}(\xi)$. First, $\phi(x) = \min_{x_i}\|x - x_i\|$ is approximated by $\phi_q(x) = (\sum_i \|x - x_i\|^{-q})^{-1/q}$ with $q > 0$. Second, when $\mathcal{X}$ is discretized into a finite grid $\mathcal{X}_N = \{x^{(1)}, \ldots, x^{(N)}\}$, $\max_{x \in \mathcal{X}_N}\phi_q(x) = [\min_{x \in \mathcal{X}_N}\phi_q^{-1}(x)]^{-1}$ can be approximated by $[\sum_{j=1}^N \phi_q^p(x^{(j)})]^{1/p}$ with $p > 0$. This gives the following substitute for $\phi_{mM}(\xi)$,
$$\phi_{[p,q]}(\xi) = \Big\{\sum_{j=1}^N \Big[\sum_{i=1}^n \|x^{(j)} - x_i\|^{-q}\Big]^{-p/q}\Big\}^{1/p},$$
with $p, q > 0$, see Royle and Nychka (1998). Note that the $x_i$ are usually elements of $\mathcal{X}_N$. When $\mathcal{X}$ is not discretized, the sum over $x^{(j)} \in \mathcal{X}_N$ should be replaced by an integral over $\mathcal{X}$, which makes the evaluation of $\phi_{[p,q]}(\xi)$ rather cumbersome.

3.3 From maximin-distance to entropy maximization

Suppose that the $n$ points $x_i$ in $\xi$ form $n$ i.i.d. samples of a probability measure with density $\varphi$ with respect to the Lebesgue measure on $\mathcal{X}$. A natural statistical approach to measuring the quality of $\xi$ in terms of its space-filling properties is to compare it in some way with samples from the uniform measure on $\mathcal{X}$. Using discrepancy is a possibility, see Sect. 2.3. Another one relies on the property that the uniform distribution has maximum entropy among all distributions with finite support. This is the approach followed in this section.

The Rényi (1961) entropy of a random vector of $\mathbb{R}^d$ having the p.d.f. $\varphi$ (that we shall call the Rényi entropy of $\varphi$) is defined by
$$H_\alpha(\varphi) = \frac{1}{1-\alpha}\log\int_{\mathbb{R}^d}\varphi^\alpha(x)\,dx, \quad \alpha \neq 1. \qquad (9)$$
The Havrda-Charvát (1967) entropy (also called Tsallis (1988) entropy) of $\varphi$ is defined by
$$H^*_\alpha(\varphi) = \frac{1}{\alpha-1}\Big(1 - \int_{\mathbb{R}^d}\varphi^\alpha(x)\,dx\Big), \quad \alpha \neq 1. \qquad (10)$$
When $\alpha$ tends to 1, both $H_\alpha$ and $H^*_\alpha$ tend to the (Boltzmann-Gibbs-) Shannon entropy
$$H_1(\varphi) = -\int_{\mathbb{R}^d}\varphi(x)\log[\varphi(x)]\,dx. \qquad (11)$$
Note that $H_\alpha = \log[1 - (\alpha-1)H^*_\alpha]/(1-\alpha)$, so that, for any $\alpha$, $d(H_\alpha)/d(H^*_\alpha) > 0$ and the maximizations of $H_\alpha$ and $H^*_\alpha$ are equivalent; we can thus speak indifferently of $\alpha$-entropy maximizing distributions.

The entropy $H^*_\alpha$ is a concave function of the density $\varphi$ for $\alpha > 0$ (and convex for $\alpha < 0$). Hence, $\alpha$-entropy maximizing distributions, under some specific constraints, are uniquely defined for $\alpha > 0$. In particular, the $\alpha$-entropy maximizing distribution is uniform under the constraint that the distribution is finitely supported. The idea, suggested by Franco (2008), is thus to construct an estimator of the entropy of the
Hence, α- entropy maximizing distributions, under some specific The same type of relaxation can be applied to the cri- constraints, are uniquely defined for α > 0. In particu- terion φmM (ξ). First, φ(x) = minxi kx − xik is approx- lar, the α-entropy maximizing distribution is uniform P −q −1/q imated by φq(x) = ( i kx − xik ) with q > 0. under the constraint that the distribution is finitely Second, when X is discretized into a finite grid XN = supported. The idea, suggested by Franco (2008), is (1) (N) −1 −1 {x ,..., x }, maxx∈XN φq(x) = [minx∈XN φq (x)] thus to construct an estimator of the entropy of the 8 design points xi in ξ, considering them as if indepen- N (0, I), then dently drawn with some probability distribution, and Z · ¸ 1 X kx − x k2 use this entropy estimator as a design criterion to be ϕˆ2 (x) dx = exp − i j ; n d d/2 2 d 2 d 2 π n h 4h maximized. Note that this use of entropy (for a dis- R n i,j n tribution in the space of input factors) is not directly that is, a Monte-Carlo evaluation gives the exact value connected Maximum-Entropy Sampling of Sect. 4.3 (for of the integral in (9, 10) for ϕ =ϕ ˆ when α = 2. a distribution in the space of responses). n This is exploited in (Bettinger et al, 2008, 2009) for Many methods exist for the estimation of the en- the sequential construction of an experiment with the tropy of a distribution from i.i.d. samples, and one may objective of inverting an unknown system. refer for instance to the survey papers (Hall and Mor- ton, 1993; Beirlant et al, 1997) for an overview. We Nearest-neighbor (NN) distances The following estima- shall consider three, because they have either already tor of H (ϕ) is considered in (Leonenko et al, 2008) been used in the context of experimental design or are α 1−α P directly connected with other space-filling criteria. 
In [(n−1) Ck Vd] n ∗ d(1−α) 1 − (dk,i) a fourth paragraph, entropy decomposition is used to Hˆ = n i=1 (14) n,k,α α − 1 avoid the collapsing of design points when considering d/2 lower dimensional subspaces. where Vd = π /Γ (d/2 + 1) is the volume of the unit d 1/(1−α) ball B(0, 1) in R , Ck = [Γ (k)/Γ (k + 1 − α)] ∗ Plug-in method based on kernel density estimation The and dk,i is the k-th nearest-neighbor distance from xi approach is in two steps. First, one construct an esti- to some other xj in the sample (that is, from the n − 1 ∗ mator of the p.d.f. ϕ by a kernel method as distances dij , j 6= i, we form the order statistics d1,i = ∗ ∗ ∗ µ ¶ di ≤ d2,i ≤ · · · ≤ dn−1,i). The L2-consistency of this es- 1 Xn x − x ϕˆ (x) = K i , (12) timator is proved in (Leonenko et al, 2008) for any α ∈ n n hd h n i=1 n (1, (k+1)/2) when k ≥ 2 (respectively α ∈ (1, 1+1/[2d]) when k = 1) if f is bounded. For α < 1, one may refer to where K(·) denotes the kernel and hn the window width. (Penrose and Yukich, 2011) for the a.s. and L2 conver- The choices of K(·) and hn are important issues when gence of Hˆn,k,α to Hα(ϕ); see also the results of Yukich the objective is to obtain an accurate estimation of ϕ (1998) on the subadditivity of Euclidean functionals. and there exists a vast literature on that topic. How- For α = 1 (Shannon entropy), the following estima- ever, this should not be too critical here since we only tor is considered in (Kozachenko and Leonenko, 1987; need to get an entropy estimator that yields a reason- Leonenko et al, 2008) able space-filling criterion. A common practice in den- sity estimation is to take h decreasing with n, e.g. as XN n d ∗ −1/(d+4) Hˆ = log d + log(n − 1) + log(V ) − Ψ(k) , n , see Scott (1992, p. 152), and to use a p.d.f. N,k,1 n k,i d for K(·), e.g. that of the standard normal distribution i=1 in Rd. A kernel with bounded support could be more where Ψ(z) = Γ 0(z)/Γ (z) is the digamma function. 
indicated since X is bounded, but the choice of the win- Maximizing Hˆn,1,α for α > 1 thus corresponds to dow width might then gain importance. When a kernel- maximizing φ (ξ) with q = d(α − 1), see (7). For [NN,q] based prediction method is to be used, it seems natural 1 − 1/d ≤ α ≤ 1, the criterion Hˆn,1,α, is still eligible to relate K(·) and hn to the kernel used for prediction for space-filling, its maximization is equivalent to that (to the correlation function in the case of kriging); this of φ (ξ) with q ∈ [−1, 0]; for instance, the maxi- will be considered in Sect. 4.3. [NN,q] ∗ mization of HˆN,1,1 is equivalent to the maximization of In a second step, the entropy Hα or Hα is estimated φ (ξ), see (8). by replacing the unknown ϕ by the estimateϕ ˆn in the [NN,0] definition. In order to avoid the evaluation of multi- Several comments should be made, however, that dimensional integrals, a Monte-Carlo estimator can be will temper the feeling that Lq-regularization of maxi- P ˆ n n min-distance design and maximization of NN-estimates used, namely H1 = − i=1 log[ϕ ˆn(xi)] for Shannon entropy, and of entropy are equivalent. " # First, these estimators rely on the assumption that n 1 X the xi are i.i.d. with some p.d.f. ϕ. However, optimiz- Hˆ n = 1 − ϕˆα−1(x ) (13) α α − 1 n i ing the locations of points with respect to some de- i=1 sign criterion makes the corresponding sample com- for Hα with α 6= 1. A surprising result about normal pletely atypical. The associated value of the estima- densities is that when K(·) is the p.d.f. of the normal tor is therefore atypical too. Consider for instance the 9

maximin-distance design $\xi^*_{Mm}$ on $[0, 1]$, defined by $x_i = (i-1)/(n-1)$, $i = 1, \ldots, n$. Direct calculation gives $\hat H_{n,1,\alpha}(\xi^*_{Mm}) = [1 - 2^{1-\alpha}/\Gamma(2-\alpha)]/(\alpha - 1)$, which is greater than 1 for $0 < \alpha < 2$, with a maximum $\gamma + \log(2) \simeq 1.2704$ when $\alpha$ tends to 1 (here $\gamma \simeq 0.5772$ is Euler's constant). On the other hand, the maximum value of $H(\varphi)$ for $\varphi$ a p.d.f. on $[0, 1]$ is obtained for the uniform distribution $\varphi^*(x) = 1$ for all $x$, with $H(\varphi^*) = 0$.

Second, even if the design points in $\xi$ are generated randomly, using $k$-th NN distances with $k > 1$ does not make much sense in terms of measuring the space-filling performance. Indeed, when using $\hat H_{n,k,\alpha}$ with $k > 1$, a design obtained by fusing sets of $k$ points will show a higher entropy than a design with all points separated. This is illustrated by the simple example of a maximin-distance design on the real line. For the design $\xi^*_{Mm}$ with $n$ points we have

$$\hat H_{n,2,\alpha}(\xi^*_{Mm}) = \frac{1 - \frac{2^{1-\alpha}}{\Gamma(3-\alpha)}\left[1 + \frac{2(2^{1-\alpha} - 1)}{n}\right]}{\alpha - 1}.$$

Suppose that $n = 2m$ and consider the design $\tilde\xi^*_{Mm}$ obtained by duplicating the maximin-distance design with $m$ points; that is, $x_i = (i-1)/(m-1)$, $i = 1, \ldots, m$, and $x_i = (i-m-1)/(m-1)$, $i = m+1, \ldots, 2m$. We get

$$\hat H_{n,2,\alpha}(\tilde\xi^*_{Mm}) = \frac{1 - \frac{2^{1-\alpha}}{\Gamma(3-\alpha)}\left(2 + \frac{1}{m-1}\right)^{1-\alpha}}{\alpha - 1}$$

and $\hat H_{n,2,\alpha}(\tilde\xi^*_{Mm}) > \hat H_{n,2,\alpha}(\xi^*_{Mm})$ for $\alpha \in (0, 3)$. We should thus restrict our attention to $\hat H_{n,k,\alpha}$ with $k = 1$. The range of values of $\alpha$ for which the strong consistency of the estimator is ensured is then restricted to $\alpha < 1 + 1/[2d]$. Strictly speaking, this means that the maximization of $\phi_{[NN,q]}(\xi)$ can be considered as the maximization of a NN entropy estimator for $q < 1/2$ only.

Minimum-spanning-tree. Redmond and Yukich (1996) and Yukich (1998) use the subadditivity of some Euclidean functionals on graphs to construct strongly consistent estimators of $H^*_\alpha(\varphi)$ (9) for $0 < q < 1$, up to some bias term independent of $\varphi$ and related to the graph properties. Their approach covers the case of the graph of $k$-th NN (where the bias constant depends on the value of $k$ through $C_k$, see (14)), but also the graphs corresponding to the solution of a travelling salesman problem, or the minimum spanning tree (MST). In each case, the entropy estimate is based on $\sum_{i=1}^M d_i^{d(1-\alpha)}$, where the $d_i$ denote the lengths of the $M$ edges of the graph, with $M = n - 1$ for the MST and $M = n$ for the travelling-salesman tour and NN graphs.

The MST constructed from the $x_i$ has already been advocated as a useful tool to assess the quality of designs in terms of their space-filling properties. In (Franco et al, 2009), the empirical mean and variance of the lengths $d_i$ of the edges of the MST are used to characterize classes of designs (such as random, low-discrepancy sequences, maximin-distance and minimax-distance designs); designs with large empirical means are considered preferable. With the same precautions as above for NN entropy estimation, the maximization of the function $(\sum_{i=1}^{n-1} d_i^{-q})^{-1/q}$ in the MST constructed from the $x_i$ is related to the maximization of an entropy estimator of the distribution of the $x_i$. In particular, the maximization of the empirical mean of the edge lengths ($q = -1$) forms a reasonable objective.

Entropy decomposition to avoid collapsing on projections. Let $u$ and $v$ be two independent random vectors respectively in $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$. Define $x = (u^\top, v^\top)^\top$ and let $\varphi(u, v)$ denote the joint density for $x$. Let $\varphi_1(u)$ and $\varphi_2(v)$ be the marginal densities for $u$ and $v$ respectively, so that $\varphi(u, v) = \varphi_1(u)\varphi_2(v)$. It is well known that the Shannon and Rényi entropies (11) and (9) satisfy the additive property $H^*_\alpha(\varphi) = H^*_\alpha(\varphi_1) + H^*_\alpha(\varphi_2)$, $\alpha \in \mathbb{R}$ (extensivity property of Shannon and Rényi entropies), while for the Tsallis entropy (10) one has $H_\alpha(\varphi) = H_\alpha(\varphi_1) + H_\alpha(\varphi_2) + (1 - \alpha) H_\alpha(\varphi_1) H_\alpha(\varphi_2)$ (non-extensivity of Tsallis entropy, with $\alpha$ the parameter of non-extensivity).

Now, when $\varphi$ is the p.d.f. of the uniform distribution on the unit cube $\mathcal{X} = [0, 1]^d$, one can consider all one-dimensional projections $\{x\}_i$, $i = 1, \ldots, d$, and $H^*_\alpha(\varphi) = \sum_{i=1}^d H^*_\alpha(\varphi_i)$ with $\varphi_i$ the density of the $i$-th projection $\{x\}_i$. This can be used to combine a criterion related to space-filling in $\mathcal{X}$ with criteria related to space-filling along one-dimensional projections. Consider for instance the NN estimator of $H^*_\alpha(\varphi)$ of Leonenko et al (2008) (for $\alpha \neq 1$),

$$\hat H^*_{n,k,\alpha} = \frac{\log\left\{\frac{[(n-1) C_k V_d]^{1-\alpha}}{n} \sum_{i=1}^n (d^*_{k,i})^{d(1-\alpha)}\right\}}{1 - \alpha}. \qquad (15)$$

For $k = 1$ ($k > 1$ does not fit with the space-filling requirement, see the discussion above), we have

$$\hat H^*_{n,1,\alpha} = \frac{1}{1-\alpha} \log\left[\sum_{i=1}^n (d^*_i)^{d(1-\alpha)}\right] + A(\alpha, d, n),$$

where $A(\alpha, d, n)$ is a constant that does not depend on $\xi$. A suitable criterion (to be maximized) that simultaneously takes into account the space-filling objectives in $\mathcal{X}$ and along all one-dimensional projections is thus

$$\frac{1}{1-\alpha}\left\{(1 - \gamma) \log\left[\sum_{i=1}^n (d^*_i)^{d(1-\alpha)}\right] + \gamma \sum_{j=1}^d \log\left[\sum_{i=1}^n (d^*_{ji})^{1-\alpha}\right]\right\}$$

with $\gamma \in (0, 1)$ and $d^*_{ji} = \min_{k \neq i} |\{x_i\}_j - \{x_k\}_j|$, or equivalently, setting $q = d(\alpha - 1)$,

$$\phi_{q,1P}(\xi) = (1 - \gamma) \log[\phi_{[NN,q]}(\xi)] + \frac{\gamma}{d} \sum_{j=1}^d \log[\phi_{[NN,q/d,j]}(\xi)],$$

where $\phi_{[NN,q]}(\xi)$ is given by (7) and $\phi_{[NN,q,j]}(\xi) = [\sum_{i=1}^n (d^*_{ji})^{-q}]^{-1/q}$. Letting $q$ tend to infinity, we get the following compromise between maximin-distance designs on $\mathcal{X}$ and on its one-dimensional projections:

$$\phi_{\infty,1P}(\xi) = (1 - \gamma) \log[\phi_{Mm}(\xi)] + \frac{\gamma}{d} \sum_{j=1}^d \log[\phi_{Mmj}(\xi)],$$

with $\phi_{Mmj}(\xi) = \min_i d^*_{ji} = \min_{k \neq i} |\{x_i\}_j - \{x_k\}_j|$. One should note that there exists a threshold $\gamma^* = \gamma^*(d, n)$ such that the optimal design associated with any $\gamma \ge \gamma^*$ is a maximin Lh design.

When $\alpha = 1$ (Shannon entropy), identical developments lead to the same criterion $\phi_{q,1P}(\xi)$ as above with $q$ set to zero, $\phi_{[NN,0]}(\xi)$ defined by (8) and $\phi_{[NN,0,j]}(\xi) = \exp\{[\sum_{i=1}^n \log(d^*_{ji})]/n\}$.

Other combinations of criteria are possible; one may for instance maximize a space-filling criterion in $\mathcal{X}$ under constraints on the space-filling properties along one-dimensional projections. Also, projections on higher-dimensional subspaces can be taken into account in a similar way, using the appropriate decomposition of the entropy of joint densities.

4 Model-based design: the case of kriging

In the following we assume that we have a reasonable simplified model (the so-called emulator) for the unknown function $f(\cdot)$, whose evaluation at a given point $x$ relies on a computer code (evaluations at the design points in $\xi$ form a computer experiment).

4.1 Gaussian-process model and kriging

In particular, consider the following spatial random field

$$Y(x) = f(x) = \eta(x, \beta) + Z(x), \qquad (16)$$

where $\beta$ is an unknown vector of parameters in $\mathbb{R}^p$ and the random term $Z(x)$ has zero mean, (unknown) variance $\sigma_Z^2$ and a parameterized spatial error correlation structure such that $\mathrm{E}\{Z(u)Z(v)\} = \sigma_Z^2 C(u, v; \nu)$. It is often assumed that the deterministic term has a linear structure, that is, $\eta(x, \beta) = r^\top(x)\beta$, and that the random field $Z(x)$ is Gaussian, allowing the estimation of $\beta$, $\sigma_Z$ and $\nu$ by Maximum Likelihood. This setup is used in such diverse areas of spatial data analysis (see Cressie (1993)) as mining, hydrogeology, natural resource monitoring and environmental science, etc., and has become the standard modeling paradigm in computer simulation experiments, following the seminal paper of Sacks et al (1989). Here, $\lim_{v \to u} C(u, v; \nu) = C(u, u; \nu) = 1$ for all $u \in \mathcal{X}$.

Denote by $\hat Y(x|\xi)$ the Best Linear Unbiased Predictor (BLUP) of $Y(x)$ based on the design points in $\xi$ and associated observations $y(\xi) = [Y(x_1), \ldots, Y(x_n)]^\top$. Optimal design in this context is usually performed by minimizing a functional of $\mathrm{var}[\hat Y(x|\xi)] = \mathrm{E}[(\hat Y(x|\xi) - Y(x))^2]$ at $x$, the unconditional Mean-Squared Prediction Error (MSPE), also called the kriging variance. Keeping $\nu$ fixed, in the linear setting (universal kriging, with $\eta(x, \beta) = r^\top(x)\beta$, generally a polynomial in $x$), the BLUP takes the form

$$\hat Y(x|\xi) = r^\top(x)\hat\beta + c_\nu^\top(x) C_\nu^{-1} [y(\xi) - R\hat\beta], \qquad (17)$$

where $\{c_\nu(x)\}_i = C(x, x_i; \nu)$, $\{C_\nu\}_{ij} = C(x_i, x_j; \nu)$, $i, j = 1, \ldots, n$, and $\hat\beta = \hat\beta_\nu$ is the weighted least-squares estimator of $\beta$ in the linear regression model, that is,

$$\hat\beta_\nu = [R^\top C_\nu^{-1} R]^{-1} R^\top C_\nu^{-1} y(\xi),$$

with $R = [r(x_1), \ldots, r(x_n)]^\top$. Notice that $\hat Y(x|\xi)$ does not depend on $\sigma_Z$ and that $\hat Y(x_i|\xi) = Y(x_i)$ for all $i$ (the predictor is a perfect interpolator). We can write $\hat Y(x|\xi) = v_\nu^\top(x) y(\xi)$, where

$$v_\nu(x) = C_\nu^{-1}\left[I_n - R(R^\top C_\nu^{-1} R)^{-1} R^\top C_\nu^{-1}\right] c_\nu(x) + C_\nu^{-1} R (R^\top C_\nu^{-1} R)^{-1} r(x) \qquad (18)$$

with $I_n$ the $n$-dimensional identity matrix. The MSPE is given by

$$\mathrm{MSPE}_\xi(x, \sigma_Z^2, \nu) = \sigma_Z^2\left\{1 - c_\nu^\top(x) C_\nu^{-1} c_\nu(x) + g_\nu^\top(x)[R^\top C_\nu^{-1} R]^{-1} g_\nu(x)\right\}$$

with $g_\nu(x) = r(x) - R^\top C_\nu^{-1} c_\nu(x)$. Note that the MSPE depends on $(\sigma_Z^2, \nu)$, with $\sigma_Z^2$ intervening only as a multiplicative factor. We shall denote by $\rho^2(x) = \rho^2_\xi(x, \nu)$ the normalized kriging variance,

$$\rho^2_\xi(x, \nu) = \mathrm{MSPE}_\xi(x, \sigma_Z^2, \nu)/\sigma_Z^2, \qquad (19)$$

and omit the dependence on $\xi$ and $\nu$ when it does not lead to ambiguities. Note that $\rho^2_\xi(x_i, \nu) = 0$ for all $i$.

We suppose for the moment that $\nu$ is known (the investigation of the more realistic situation where $\nu$ is unknown is postponed to Sect. 5) and omit the dependence on $\nu$ in the notation. It is sufficient in many circumstances to take $\eta(x, \beta) = \beta$, that is, to model the unknown function as the realization of a stochastic process with unknown mean value. In that case, the normalized kriging variance is simply

$$\rho^2(x) = 1 - c^\top(x) C^{-1} c(x) + \frac{[1 - c^\top(x) C^{-1} \mathbf{1}]^2}{\mathbf{1}^\top C^{-1} \mathbf{1}}, \qquad (20)$$

with $\mathbf{1}$ the $n$-dimensional vector of ones.

A natural approach for designing an experiment is to choose $\xi$ that minimizes a functional of the kriging variance. One example is its integrated value $\phi_A(\xi) = \int_{\mathcal{X}} \rho^2(x)\, dx$, generally evaluated by a discrete sum over a finite grid. Another is the G-optimality criterion (by analogy with G-optimal design for regression models, see Kiefer and Wolfowitz (1960))

$$\phi_G(\xi) = \max_{x \in \mathcal{X}} \rho^2(x). \qquad (21)$$

Johnson et al (1990) show that a minimax-optimal design is asymptotically G-optimal when the correlation function has the form $C^k(\cdot)$ with $k$ tending to infinity (i.e., it tends to be G-optimal for weak correlations). See also Joseph (2006), who motivates the use of minimax-optimal designs for his limit-kriging approach. The evaluation of $\phi_G(\xi)$ at any given $\xi$ requires the solution of a maximization problem over $\mathcal{X}$, which makes the optimization of $\phi_G(\cdot)$ a rather exhausting task. Replacing the optimization over $\mathcal{X}$ by a grid search over a finite subset $\mathcal{X}_N \subset \mathcal{X}$ is often used. Another option is to perform a Delaunay tessellation of the points in $\xi$ plus the vertices of $\mathcal{X} = [0, 1]^d$ and initialize a local search for the maximum of $\rho^2(x)$ at the center of each Delaunay simplex (see Sect. 2.4). A third option, considered below, consists in using an upper bound on $\phi_G(\xi)$.

4.2 Upper bounds on the kriging variance

We only consider isotropic processes, with correlation depending on the Euclidean distance between points, i.e. satisfying $\mathrm{E}\{Z(u)Z(v)\} = \sigma_Z^2 C(\|u - v\|; \nu)$, $(u, v) \in \mathcal{X}^2$. The extension to the non-isotropic case should not raise major difficulties through an appropriate change of metric in $\mathcal{X}$. We suppose that the radial correlation function $C(\cdot; \nu)$ is non-increasing and non-negative on $\mathbb{R}^+$. Denote $C_{Mm} = C(\phi_{Mm})$ and $C_{mM} = C(\phi_{mM})$ (we omit the dependence on $\xi$ where there is no ambiguity), and

$$\rho_0^2(x) = 1 - c^\top(x) C^{-1} c(x),$$

the (normalized) kriging variance when $\beta$ is known. The objective of this section is to construct upper bounds on $\max_{x \in \mathcal{X}} \rho_0^2(x)$ and $\max_{x \in \mathcal{X}} \rho^2(x)$, see (20).

From the developments given in Appendix C, we obtain the bound

$$\rho_0^2(x) \le 1 - \frac{C_{mM}^2}{\lambda_{\max}(C)}$$

for the case where $\beta$ is known and, for a weak enough correlation, the approximate bound

$$\rho^2(x) \le 1 - \frac{C_{mM}^2}{\lambda_{\max}(C)} + \frac{(1 - C_{mM} u)^2}{\mathbf{1}^\top C^{-1} \mathbf{1}},$$

where $u = \min_i \{C^{-1}\mathbf{1}\}_i$, when $\beta$ is unknown. Using further approximations, one can obtain bounds that depend on $C_{mM}$ and $C_{Mm}$ but not on $C$, see Appendix C. We obtain

$$\rho_0^2(x) \le \bar\rho_0^2(x) = 1 - \frac{\bar c^2(x)}{1 + (n-1) C_{Mm}}, \qquad (22)$$

where $\bar c(x) = \max_i \{c(x)\}_i$, and thus

$$\max_{x \in \mathcal{X}} \rho_0^2(x) \le \bar\rho_0^2 = 1 - \frac{C_{mM}^2}{1 + (n-1) C_{Mm}}. \qquad (23)$$

Also, when the correlation is weak enough,

$$\rho^2(x) \le \bar\rho^2(x) = \bar\rho_0^2(x) + R^2(x)\, \frac{1 + (n-1) C_{Mm}}{n} \qquad (24)$$

with $\bar\rho_0^2(x)$ given by (22) and

$$R^2(x) = \left[1 - \bar c(x)\, \frac{1 - (n-1) C_{Mm}}{1 - (n-1) C_{Mm}^2}\right]^2,$$

which gives

$$\max_{x \in \mathcal{X}} \rho^2(x) \le \bar\rho^2 = \bar\rho_0^2 + \left[1 - C_{mM}\, \frac{1 - (n-1) C_{Mm}}{1 - (n-1) C_{Mm}^2}\right]^2 \frac{1 + (n-1) C_{Mm}}{n} \qquad (25)$$

with $\bar\rho_0^2$ given by (23). More accurate bounds are given in (Griffith, 2003) when the points in $\xi$ follow a regular pattern. Similar ideas could be applied to the limit kriging predictor of Joseph (2006).

Example 1. We consider a two-dimensional example with four design points, three at the corners $(1, 0)$, $(1, 1)$, $(0, 1)$ and one in the center $(1/2, 1/2)$ of $\mathcal{X} = [0, 1]^2$. Prediction is considered along the diagonal going from the origin to the corner $(1, 1)$, with $x = (0, 0)^\top + \gamma(1, 1)^\top$, $\gamma \in [0, 1]$. The correlation function is $C(t) = (1 - t)^4 (1 + 4t)$ with $C(t) = 0$ for $t \ge 1$, see Wendland (2005). Notice that $C$ has the form (41) with $C_{Mm} = C(\sqrt{2}/2) \simeq$
0.0282. Fig. 5 presents the (normalized) kriging variances $\rho^2(x)$ and $\rho_0^2(x)$ together with the bounds constructed above. We have $\rho^2(x) = \rho_0^2(x) = 0$ at the design points $(1/2, 1/2)$ and $(1, 1)$. Note that the bounds $\bar\rho^2(x)$ and $\bar\rho_0^2(x)$, although not tight everywhere (in particular, they are pessimistic at the design points), give a reasonable approximation of the behavior of $\rho^2(x)$ and $\rho_0^2(x)$ respectively. Also note that the global bounds (23) and (25) (reached at $x = (0, 0)$) are rather tight.

Example 2. We consider a one-dimensional example with the 5-point minimax-optimal design $\xi^*_{mM} = (0.1, 0.3, 0.5, 0.7, 0.9)$ in $\mathcal{X} = [0, 1]$ for the correlation $C(t) = \exp(-10\, t)$. Fig. 6 presents the (normalized) kriging variances $\rho_0^2(x)$ and $\rho^2(x)$ together with the bounds constructed above as $x$ varies in $\mathcal{X}$. The bounds $\bar\rho_0^2(x)$ given by (22) and $\bar\rho^2(x)$ given by (24) are nowhere tight (neither are the global bounds $\bar\rho_0^2$ and $\bar\rho^2$ given by (23) and (25)), but the behavior of the kriging variance as a function of $x$ is satisfactorily reproduced. Fig. 7 presents the same information for the 5-point maximin-optimal design $\xi^*_{Mm} = (0, 0.25, 0.5, 0.75, 1)$.

Fig. 5 Kriging variance (normalized) and bounds with, in solid lines from top to bottom, $\bar\rho^2(x)$ given by (24) and the exact (normalized) kriging variance $\rho^2(x)$; the values of $\rho_0^2(x)$ and of its upper bound $\bar\rho_0^2(x)$ (22) are indicated in dotted lines; $x = (0, 0)^\top + \gamma(1, 1)^\top$ (horizontal axis: $\gamma$; vertical axis: $\rho^2(x)$).

Fig. 6 Kriging variance (normalized) and bounds for the 5-point minimax-optimal design with, in solid lines from top to bottom, $\bar\rho^2(x)$ given by (24) and the exact (normalized) kriging variance $\rho^2(x)$; the values of $\rho_0^2(x)$ and of its upper bound $\bar\rho_0^2(x)$ (22) are indicated in dotted lines (horizontal axis: $x$; vertical axis: $\rho^2(x)$).

Fig. 7 Kriging variance (normalized) and bounds for the 5-point maximin-optimal design with, in solid lines from top to bottom, $\bar\rho^2(x)$ given by (24) and the exact (normalized) kriging variance $\rho^2(x)$; the values of $\rho_0^2(x)$ and of its upper bound $\bar\rho_0^2(x)$ (22) are indicated in dotted lines (horizontal axis: $x$; vertical axis: $\rho^2(x)$).

For a small enough correlation, a minimax-optimal design ensures a smaller value for $\max_{x \in \mathcal{X}} \rho^2(x)$ than a maximin-optimal design, see Johnson et al (1990). One might hope that this tendency will also be observed when using the upper bound $\bar\rho^2$ given by (25). This seems to be the case, as the following continuation of Example 2 illustrates.

Example 2 (continued). We consider the following family of 5-point designs: $\xi(\alpha) = (\alpha, \alpha + (1-2\alpha)/4, \alpha + (1-2\alpha)/2, \alpha + 3(1-2\alpha)/4, 1 - \alpha)$, which includes $\xi^*_{Mm}$ (for $\alpha = 0$) and $\xi^*_{mM}$ (for $\alpha = 0.1$). The correlation function is $C(t) = \exp(-\nu t)$. Fig. 8 presents $\max_{x \in \mathcal{X}} \rho^2(x)$ and $\bar\rho^2$ given by (25) as functions of $\alpha$ in the strong (left, $\nu = 7$) and weak (right, $\nu = 40$) correlation cases. Although the curves do not reach their minimum value for the same $\alpha$, they indicate the same preference between $\xi^*_{Mm}$ and $\xi^*_{mM}$.
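The following is a minimal sketch, using only the Python standard library, for checking the quantities in Examples 1 and 2 numerically. It evaluates the normalized kriging variances $\rho_0^2(x)$ and $\rho^2(x)$ of (20), together with the bounds (22) and (23), in the one-dimensional setting of Example 2.

It rests on these assumptions:
- the correlation is exponential, $C(t) = \exp(-\nu t)$;
- the mean is constant and unknown, $\eta(x, \beta) = \beta$;
- the minimax distance $\phi_{mM}$ is approximated by a maximum over a finite grid.

The helper names (`corr`, `solve`, `kriging_variances`, `bound_22_23`, `xi_alpha`) are hypothetical and are not taken from the paper. The weak-correlation bounds (24) and (25) are deliberately left out.

```python
import math


def corr(t, nu):
    """Exponential correlation C(t) = exp(-nu t) (assumed, as in Example 2)."""
    return math.exp(-nu * t)


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def kriging_variances(x, design, nu):
    """Return (rho2_0, rho2) at x for a 1-D design: rho2_0 = 1 - c'C^{-1}c
    (known mean) and rho2 from eq. (20) (unknown constant mean)."""
    n = len(design)
    cmat = [[corr(abs(u - v), nu) for v in design] for u in design]
    c = [corr(abs(x - u), nu) for u in design]
    cinv_c = solve(cmat, c)            # C^{-1} c(x)
    cinv_1 = solve(cmat, [1.0] * n)    # C^{-1} 1
    rho2_0 = 1.0 - sum(ci * zi for ci, zi in zip(c, cinv_c))
    c_cinv_1 = sum(ci * zi for ci, zi in zip(c, cinv_1))  # c' C^{-1} 1
    return rho2_0, rho2_0 + (1.0 - c_cinv_1) ** 2 / sum(cinv_1)


def bound_22_23(design, nu, grid):
    """Bound (22) as a function of x, and the global bound (23); the
    minimax distance phi_mM is approximated by a maximum over `grid`."""
    n = len(design)
    phi_Mm = min(abs(u - v) for i, u in enumerate(design) for v in design[i + 1:])
    phi_mM = max(min(abs(x - u) for u in design) for x in grid)
    denom = 1.0 + (n - 1) * corr(phi_Mm, nu)   # 1 + (n - 1) C_Mm

    def rho2_0_bar(x):                          # eq. (22)
        c_bar = max(corr(abs(x - u), nu) for u in design)
        return 1.0 - c_bar ** 2 / denom

    return rho2_0_bar, 1.0 - corr(phi_mM, nu) ** 2 / denom  # eq. (23)


def xi_alpha(a):
    """The 5-point family xi(alpha) of Example 2 (continued)."""
    return [a + k * (1.0 - 2.0 * a) / 4.0 for k in range(5)]


grid = [i / 1000.0 for i in range(1001)]
xi_mM = [0.1, 0.3, 0.5, 0.7, 0.9]  # minimax-optimal design of Example 2
rho2_0_bar, rho2_0_glob = bound_22_23(xi_mM, 10.0, grid)
print("max rho2_0 on grid:", max(kriging_variances(x, xi_mM, 10.0)[0] for x in grid))
print("global bound (23):", rho2_0_glob)

# Grid approximation of phi_G(xi(alpha)) = max_x rho2(x), eq. (21), with nu = 7
for a in (0.0, 0.05, 0.1, 0.15):
    phi_G = max(kriging_variances(x, xi_alpha(a), 7.0)[1] for x in grid)
    print(f"alpha={a:.2f}  max rho2 ~ {phi_G:.4f}")
```

The sketch can verify (22) pointwise because that bound needs only two facts. The first is $c^\top(x) C^{-1} c(x) \ge \bar c^2(x)/\lambda_{\max}(C)$. The second is the Gershgorin-type bound $\lambda_{\max}(C) \le 1 + (n-1) C_{Mm}$, which holds for a non-negative, non-increasing correlation. The printed numbers are grid approximations and are not claimed to reproduce Figs. 6 to 8 exactly.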

Fig. 8 $\max_{x \in \mathcal{X}} \rho^2(x)$ (solid line) and $\bar\rho^2$ (25) (dashed line) as functions of $\alpha$ for the design $\xi(\alpha)$ when $\nu = 7$ (left) and $\nu = 40$ (right); the maximin-optimal design corresponds to $\alpha = 0$, the minimax-optimal design to $\alpha = 0.1$ (dotted vertical line).

4.3 Maximum-Entropy Sampling

Suppose that $\mathcal{X}$ is discretized into the finite set $\mathcal{X}_N$ with $N$ points. Consider $y_N$, the vector formed by $Y(x)$ for $x \in \mathcal{X}_N$, and $y(\xi)$, the vector obtained for $x \in \xi$. For any random $y$ with p.d.f. $\varphi(\cdot)$ denote by $\mathrm{ent}(y)$ the (Shannon) entropy of $\varphi$, see (11). Then, from a classical theorem in information theory (see, e.g., Ash (1965, p. 239)),

$$\mathrm{ent}(y_N) = \mathrm{ent}(y_\xi) + \mathrm{ent}(y_{N \setminus \xi} | y_\xi), \qquad (26)$$

where $y_{N \setminus \xi}$ denotes the vector formed by $Y(x)$ for $x \in \mathcal{X}_N \setminus \xi$, and $\mathrm{ent}(y|w)$, the conditional entropy, is the expectation with respect to $w$ of the entropy of the conditional p.d.f. $\varphi(y|w)$, that is,

$$\mathrm{ent}(y|w) = -\int \varphi(w) \left(\int \varphi(y|w) \log[\varphi(y|w)]\, dy\right) dw.$$

The argumentation in (Shewry and Wynn, 1987) is as follows: since $\mathrm{ent}(y_N)$ in (26) is fixed, the natural objective of minimizing $\mathrm{ent}(y_{N \setminus \xi} | y_\xi)$ can be fulfilled by maximizing $\mathrm{ent}(y_\xi)$. When $Z(x)$ in (16) is Gaussian, $\mathrm{ent}(y_\xi) = (1/2) \log \det(C) + (n/2)[1 + \log(2\pi)]$, and Maximum-Entropy Sampling corresponds to maximizing $\det(C)$, which is called D-optimal design (by analogy with optimum design in a parametric setting). One can refer to (Wynn, 2004) for further developments. Johnson et al (1990) show that a maximin-optimal design is asymptotically D-optimal when the correlation function has the form $C^k(\cdot)$ with $k$ tending to infinity (i.e., it tends to be D-optimal for weak correlations). We have considered in Sect. 3.3 the design criterion (to be maximized) given by a plug-in kernel estimator $\hat H^n_\alpha$ of the distribution of the $x_i$, see (13) and (12). When $\alpha = 2$, the maximization of $\hat H^n_\alpha$ is equivalent to the minimization of

$$\phi(\xi) = \sum_{i,j} K\left(\frac{x_i - x_j}{h_n}\right).$$

A natural choice in the case of prediction by kriging is $K[(u - v)/h_n] = C(\|u - v\|)$, which yields $\phi(\xi) = \sum_{i,j} \{C\}_{ij}$. Since $C$ has all its diagonal elements equal to 1, its determinant is maximum when the off-diagonal elements are zero, that is, when $\phi(\xi) = n$. Also note that

$$1 - (n-1) C_{Mm} \le \lambda_{\min}(C) \le \frac{\phi(\xi)}{n} = \frac{\mathbf{1}^\top C \mathbf{1}}{n} \le \lambda_{\max}(C) \le 1 + (n-1) C_{Mm}.$$

The upper bound on $\lambda_{\max}(C)$ is derived in Appendix C. The lower bound is obtained from $\lambda_{\min}(C) \ge t - s\sqrt{n-1}$ with $t = \mathrm{tr}(C)/n$ and $s^2 = \mathrm{tr}(C^2)/n - t^2$, see Wolkowicz and Styan (1980). Since $\{C\}_{ij} = \{C\}_{ji} \le C_{Mm}$ for all $i \neq j$, we get $\mathrm{tr}(C) = n$ and $\mathrm{tr}(C^2) \le n[1 + (n-1) C_{Mm}^2]$, which gives the lower bound above. Note that bounds on $\lambda_{\min}(C)$ have been derived in the framework of interpolation with radial basis functions, see Narcowich (1991); Ball (1992); Sun (1992) for lower bounds and Schaback (1994) for upper bounds. A maximin-distance design minimizes $C_{Mm}$ and thus minimizes the upper bound above on $\phi(\xi)$.

5 Design for estimating covariance parameters

We now consider the case where the covariance $C$ used for kriging (Sect. 4.1) depends upon unknown parameters $\nu$ that need to be estimated (by Maximum Likelihood) from the dataset $y(\xi)$.

5.1 The Fisher Information matrix

Under this assumption, a first step towards good prediction of the spatial random field may be the precise estimation of both sets of parameters $\beta$ and $\nu$. The information on them is contained in the so-called Fisher information matrix, which can be derived explicitly when the process $Z(\cdot)$ is Gaussian. In this case the (un-normalized) information matrix for $\beta$ and $\theta = (\sigma_Z^2, \nu^\top)^\top$ is block diagonal. Denoting $C_\theta = \sigma_Z^2 C_\nu$, we get

$$M_{\beta,\theta}(\xi; \beta, \theta) = \begin{pmatrix} M_\beta(\xi; \theta) & O \\ O & M_\theta(\xi; \theta) \end{pmatrix}, \qquad (27)$$

where, for the model (16) with $\eta(x, \beta) = r^\top(x)\beta$,

$$M_\beta(\xi; \theta) = \frac{1}{\sigma_Z^2} R^\top C_\nu^{-1} R$$
with $R = [r(x_1), \ldots, r(x_n)]^\top$ and

$$\{M_\theta(\xi; \theta)\}_{ij} = \frac{1}{2} \mathrm{tr}\left\{C_\theta^{-1} \frac{\partial C_\theta}{\partial \theta_i} C_\theta^{-1} \frac{\partial C_\theta}{\partial \theta_j}\right\}.$$

Since $\hat Y(x|\xi)$ does not depend on $\sigma_Z$, and $\sigma_Z^2$ only intervenes as a multiplicative factor in the MSPE (see Sect. 4.1), we are only interested in the precision of the estimation of $\beta$ and $\nu$. Note that

$$M_\theta(\xi; \theta) = \begin{pmatrix} n/(2\sigma_Z^4) & z_\nu^\top(\xi; \theta) \\ z_\nu(\xi; \theta) & M_\nu(\xi; \nu) \end{pmatrix}$$

with

$$\{z_\nu(\xi; \theta)\}_i = \frac{1}{2\sigma_Z^2} \mathrm{tr}\left(C_\nu^{-1} \frac{\partial C_\nu}{\partial \nu_i}\right), \qquad \{M_\nu(\xi; \nu)\}_{ij} = \frac{1}{2} \mathrm{tr}\left\{C_\nu^{-1} \frac{\partial C_\nu}{\partial \nu_i} C_\nu^{-1} \frac{\partial C_\nu}{\partial \nu_j}\right\}.$$

Denote

$$M_\theta^{-1}(\xi; \theta) = \begin{pmatrix} a(\xi; \theta) & b_\nu^\top(\xi; \theta) \\ b_\nu(\xi; \theta) & A_\nu(\xi; \nu) \end{pmatrix}.$$

The block $A_\nu(\xi; \nu)$ then characterizes the precision of the estimation of $\nu$. Note that $A_\nu(\xi; \nu) = [M_\nu(\xi; \nu) - 2\sigma_Z^4 z_\nu(\xi; \theta) z_\nu^\top(\xi; \theta)/n]^{-1}$ does not depend on $\sigma_Z$. The matrix $A_\nu(\xi; \nu)$ is often replaced by $M_\nu^{-1}(\xi; \nu)$, and $M_{\beta,\theta}(\xi; \beta, \theta)$ by

$$M_{\beta,\nu}(\xi; \beta, \theta) = \begin{pmatrix} M_\beta(\xi; \theta) & O \\ O & M_\nu(\xi; \nu) \end{pmatrix},$$

which corresponds to the case when $\sigma_Z$ is known. This can sometimes be justified from estimability considerations concerning the random-field parameters $\sigma_Z^2$ and $\nu$. Indeed, under the infill design framework (i.e., when the design space is compact) typically not all parameters are estimable: only some of them, or suitable functions of them, are micro-ergodic, see e.g. Stein (1999); Zhang and Zimmerman (2005). In that case, a reparametrization can be used, see e.g. Zhu and Zhang (2006), and one may sometimes set $\sigma_Z$ to an arbitrary value. When both $\sigma_Z$ and $\nu$ are estimable, there is usually no big difference between $A_\nu(\xi; \nu)$ and $M_\nu^{-1}(\xi; \nu)$.

Following traditional optimal design theory, see, e.g., Fedorov (1972), it is common to choose designs that maximize a scalar function of $M_{\beta,\nu}(\xi; \beta, \theta)$, such as its determinant (D-optimality). Müller and Stehlík (2010) have suggested to maximize a compound criterion with weighting factor $\alpha$,

$$\Phi_D[\xi|\alpha] = (\det[M_\beta(\xi; \theta)])^\alpha (\det[M_\nu(\xi; \nu)])^{1-\alpha}. \qquad (28)$$

Some theoretical results for special situations showing that $\alpha \to 1$ leads to space-filling have been recently given in (Kiseľák and Stehlík, 2008), (Zagoraiou and Antognini, 2009) and (Dette et al, 2008); Irvine et al (2007) motivate the use of designs with clusters of points.

5.2 The modified kriging variance

G-optimal designs based on the (normalized) kriging variance (19) are space filling (see, e.g., van Groenigen (2000)); however, they do not reflect the resulting additional uncertainty due to the estimation of the covariance parameters. We thus require an updated design criterion that takes that uncertainty into account. Even if this effect is asymptotically negligible, see Putter and Young (2001), its impact in finite samples may be decisive, see Müller et al (2010).

Various proposals have been made to correct the kriging variance for the additional uncertainty due to the estimation of $\nu$. One approach, based on Monte-Carlo sampling from the asymptotic distribution of the estimated parameters $\hat\nu^n$, is proposed in (Nagy et al, 2007). Similarly, Sjöstedt-De-Luna and Young (2003) and den Hertog et al (2006) have employed bootstrapping techniques for assessing the effect. Harville and Jeske (1992) use a first-order expansion of the kriging variance for $\hat\nu^n$ around its true value; see also Abt (1999) for more precise developments and Zimmerman and Cressie (1992) for a discussion and examples. This has the advantage that we can obtain an explicit correction term to augment the (normalized) kriging variance, which gives the approximation

$$\tilde\rho^2_\xi(x, \nu) = \rho^2_\xi(x, \nu) + \mathrm{tr}\left\{M_\nu^{-1}(\xi; \nu)\, \frac{\partial v_\nu^\top(x)}{\partial \nu}\, C_\nu(\nu)\, \frac{\partial v_\nu(x)}{\partial \nu^\top}\right\}, \qquad (29)$$

with $v_\nu(x)$ given by (18) (note that $\tilde\rho^2_\xi(x_i, \nu) = 0$ for all $i$). Consequently, Zimmerman (2006) constructs designs by minimizing

$$\tilde\phi_G(\xi) = \max_{x \in \mathcal{X}} \tilde\rho^2_\xi(x, \nu) \qquad (30)$$

for some nominal $\nu$, which he terms EK- (empirical kriging-) optimality (see also Zhu and Stein (2005) for a similar criterion). The objective here is to take the dual effect of the design into account through the formulation of a single criterion. The two effects are obtaining accurate predictions at unsampled sites and improving the accuracy of the estimation of the covariance parameters, and these objectives are generally conflicting. One should notice that $\tilde\rho^2_\xi(x, \nu)$ may seriously overestimate the MSPE at $x$ when the correlation is excessively weak. Indeed, for very weak correlation the BLUP (17) approximately equals $r^\top(x)\hat\beta$ except in the neighborhood of the $x_i$, due to the interpolating property $\hat Y(x_i|\xi) = Y(x_i)$ for all $i$. Then $v(x)$ shows rapid variations in the neighborhood of the $x_i$, and $\|\partial v(x)/\partial \nu\|$ may become very large. In that case, one may add a nugget effect to the model and replace (16) by $Y(x) = \eta(x, \beta) +$
1 1 Z(x) + ε(x) where the ε(xi) are i.i.d. errors, also inde- 0.9 0.9 pendent from the random process Z(x), with zero mean 0.8 0.8 2 and constant variance σε . The BLUP then no longer 0.7 0.7 interpolates the data which renders v(x) more stable; 0.6 0.6 see e.g. Gramacy and Lee (2010) for other motivations 0.5 0.5 concerning the introduction of a nugget effect. 0.4 0.4 The minimization of (30) is a difficult task, even for 0.3 0.3 0.2 0.2 moderate d, due to the required maximizations ofρ ˜2(x). 0.1 0.1

Similarly to Sect. 4.2, the derivation of an upper bound 0 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 onρ ˜ (x) could be used to form a simpler criterion. ˜ Fig. 9 Contour plot for φG(ξ) on the 7-point minimax and max- (Notice that in the case considered in Sect. 4.2 where imin Lh-design of Fig. 2 (left) and with the central point shifted −1 (right). η(x, β) = β we have vν (x) = Cν Pν cν (x)+wν with Pν > −1 > −1 the projector Pν = In − 11 Cν /(1 Cν 1) and wν = −1 > −1 Cν 1/(1 Cν 1) not depending on x.) An alternative ˜ ∗ ˜ ∗ – 4) If φG(ξn0,n1 ) ≥ φG(ξn0+1,n1−1), select the design approach that also takes the effect of the unknown ν on ∗ ξ , stop; otherwise, increment n1 by 1 and predictions into account would be to place Maximum- n0+1,n1−1 return to step 3). Entropy-Sampling of Sect. 4.3 into a Bayesian frame- work, setting a prior distribution on β and ν. Also, the Asymptotically, EK-optimal designs approach the typi- relationship between the criteria (28) and (30) is ex- cally space-filling G-optimal designs since the correcting plored by M¨ulleret al (2010) who show that, although term in (29) vanishes, see Putter and Young (2001). For a complete equivalence can not be reached by a mere n large and φ0(·) in good agreement with G-optimality selection of α in general, respective efficiencies are usu- (e.g., φ0 = φmM ) we can thus expect the value of n1 ∗ ally quite high. in ξn0+1,n1−1 selected by the strategy above to be rela- The strategy proposed in the next section tries to tively small. combine space-filling model-free design, to be used in a first stage, with estimation-oriented design, based on a Example 3 That step 3) of the above procedure makes model, to be used in a second stage. The objective is to sense can be demonstrated on a simple example. Take reach good performance in terms of the modified kriging the 7 point Lh design from Fig. 
2, which is simultane- variance (29) but keep the computational burden as low ously minimax and maximin optimal. Setting ν = 7, we ˜ as possible. get φG(ξ) ' 1.913, obtained for x ' (0.049, 0.951) or −3 (0.951, 0.049), see Fig. 9-left, with Mν (ξ; ν) ' 2.41 10 −1 −3 (and Aν (ξ; ν) ' 2.40 10 ). If we now shift the central 6 Combining space-filling and estimation point away from the center towards the upper right (or designs lower left) corner, say to the coordinate (3/4, 3/4) as in the right panel of Fig. 9, the criterion is improved ˜ A rather straightforward method for combining esti- to a value of φG(ξ) ' 1.511, attained for x at the op- −3 mation and prediction-based designs is suggested in posite corner, and Mν (ξ; ν) is increased to 4.79 10 −1 −3 (M¨uller,2007). First a design consisting of a certain (and Aν (ξ; ν) to 4.71 10 ). This effect is enhanced number of points n0 is selected to maximize a criterion as we move the central point closer to one of the non- for the estimation of β, e.g. det[Mβ(ξ, θ)]; it is then central points and as we increase the value of ν and optimally augmented by n1 points for a criterion re- clearly shows the need to go beyond space-filling in this lated to the estimation of ν, e.g. det[Mν (ξ, ν)], to yield scenario. a complete design with n = n0 + n1 points. A similar idea can be exploited here and we suggest the following strategy. 7 Algorithms

– 1) Choose a space-filling, model-free, criterion φ0(·), Sequential and adaptive design In the sequential con- e.g., φmM (·), φMm(·) or a criterion from Sect. 3. struction of a design, points can be introduced one-at- ∗ – 2) Determine a n-point optimal design ξn,0 for φ0(·), a-time (full sequential design) or by batches of given ˜ ∗ compute φG(ξn,0). Set n1 = 1. size m > 1. At each stage, all points introduced previ- ∗ – 3) Determine a n0-point optimal design ξn0 for φ0(·), ously are fixed, which means in particular that the as- ∗ with n0 = n − n1; augment ξn0 to a n-point design sociated observations can be used to better select new ∗ ξn0,n1 by choosing n1 points optimally for the crite- points. The design is called adaptive when this infor- ˜ ∗ rion det[Mν (ξn0,n1 ; ν)], compute φG(ξn0,n1 ) (30). mation is used. This is especially useful in model-based 16 design, when the criterion depends on some unknown 2006) for a recent survey on optimization methods, in- parameters, for instance ν in the kriging variance (19) cluding numerical constructions of Lh maximin designs or (29). Suppose that φ(·) is a criterion to be minimized. (up to d = 10 and n = 300). Recent methods suggested At stage k of a full-sequential design one constructs for purely geometric problems, see, e.g., Cort´esand ξk+1 = (ξk, xk+1), with ξk = (x1,..., xk) already de- Bullo (2009), could be transferred to more statistically- termined, by choosing based space-filling criteria. Software can be obtained for instance from (Royle and Nychka, 1998; Walvoort xk+1 = arg min φ[(ξk, x)] . (31) x∈X et al, 2010), see also the packages DiceDesign (http: //www.dice-consortium.fr/) by Franco, Dupuy and It should be noticed that such a construction is not Roustant or lhs by Carnell (2009). We simply indicate always suitable. In particular, for G-optimality (21) a below a prototype algorithm and mention its main in- xk+1 chosen in this way will usually be in the close gredients. 
vicinity of a point already present in ξ , due to the k One of the simplest, but not very efficient, algorithm fact that φ (ξ) only depends on a local characteristic G is as follows: generate a random sequence of designs ξ , (the value of ρ2(x) at its maximizer). The much simpler k select the best one among them in terms of φ(·). Note construction that one does not need to store all designs, only the best 2 one found so far, say ξ∗ after k designs have been gen- xk+1 = arg max ρξ (x) (32) k x∈X k ∗ erated, and its associated criterion value φ(ξk) have to is often used for that reason. Note that when the cor- be remembered. This procedure is often used for Lh de- signs, for instance to generate good designs in terms of relation tends to zero, this xk+1 tends to be as far as possible from the points already present, similarly minimax or maximin distance. Note that it may help to to the greedy algorithm for maximin-optimal design generate the random sequence {ξk} according to a par- ticular process, see Franco (2008); Franco et al (2008) for which xk+1 = arg maxx∈X minxi∈ξk kx − xik. The design obtained by (32) will thus tend to be of the who use the Strauss point process, accounting for re- maximin-distance rather than minimax-distance type, pulsion between points and thus favorising maximin- and thus different from a G-optimal design (in partic- distance designs. Also note that each ξk generated can ular, (32) tends to push points to the boundary of X). be used as starting point for a local search algorithm When φ(ξ) = φ(ξ; ν) depends on some unknown pa- (using generalized gradients when φ(·) is not differen- rameters ν, the construction (31) is easily made adap- tiable). A more sophisticated version of this idea is as tive by using a forced-certainty-equivalence adaptation follows. 

Non-sequential design. The direct minimization of a function $\phi(\cdot)$ with respect to $\xi = (x_1,\ldots,x_n)$ is a rather formidable task even for moderate values of $n$ and $d$ when $\phi(\cdot)$ is not convex and local minimizers exist, which is always the case for the criteria considered here. Instead of performing a direct optimization with respect to $\xi \in \mathbb{R}^{nd}$ (or over a finite class in the case of Lh designs, see Sect. 2.2), most approaches combine heuristics with an exchange algorithm. The methods are abundant, ranging from genetic algorithms and tabu search (see e.g., Glover et al (1995)) to simulated annealing (Morris and Mitchell, 1995). Some are more adapted to combinatorial search (and thus useful when working in the class of Lh designs, see van Dam et al (2007) for a branch-and-bound algorithm for maximin Lh designs and Jin et al (2005) for a stochastic evolutionary method). One may refer to (Husslage et al, 2006) for a recent survey on optimization methods, including numerical constructions of Lh maximin designs (up to $d = 10$ and $n = 300$). Recent methods suggested for purely geometric problems, see, e.g., Cortés and Bullo (2009), could be transferred to more statistically-based space-filling criteria. Software can be obtained for instance from (Royle and Nychka, 1998; Walvoort et al, 2010), see also the packages DiceDesign (http://www.dice-consortium.fr/) by Franco, Dupuy and Roustant or lhs by Carnell (2009). We simply indicate below a prototype algorithm and mention its main ingredients.

One of the simplest, but not very efficient, algorithms is as follows: generate a random sequence of designs $\xi_k$ and select the best one among them in terms of $\phi(\cdot)$. Note that one does not need to store all designs: only the best one found so far, say $\xi^*_k$ after $k$ designs have been generated, and its associated criterion value $\phi(\xi^*_k)$ need to be remembered. This procedure is often used for Lh designs, for instance to generate good designs in terms of minimax or maximin distance. Note that it may help to generate the random sequence $\{\xi_k\}$ according to a particular process; see Franco (2008); Franco et al (2008), who use the Strauss point process, accounting for repulsion between points and thus favoring maximin-distance designs. Also note that each $\xi_k$ generated can be used as starting point for a local search algorithm (using generalized gradients when $\phi(\cdot)$ is not differentiable).
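
The random-search procedure can be sketched as follows for Lh designs and the maximin-distance criterion $\phi_{Mm}$ (maximized here, which is the same as minimizing $-\phi_{Mm}$). This is a minimal illustration under assumptions of our own: designs live in $[0,1]^d$, each Lh design puts its points at the centres of the $n$ cells of every coordinate, and plain uniform random permutations are used rather than a Strauss point process. All names are hypothetical.

```python
import numpy as np


def phi_Mm(design):
    """Maximin-distance criterion: smallest inter-point distance."""
    dist = np.linalg.norm(design[:, None, :] - design[None, :, :], axis=2)
    return dist[np.triu_indices(len(design), k=1)].min()


def random_lh(n, d, rng):
    """Random 'midpoint' Latin hypercube in [0, 1]^d: every column is a
    random permutation of the cell centres (j + 1/2)/n, j = 0, ..., n-1."""
    return np.column_stack([(rng.permutation(n) + 0.5) / n for _ in range(d)])


def best_random_lh(n, d, n_designs, seed=0):
    """Draw n_designs random Lh designs, remembering only the best one so far
    and its criterion value."""
    rng = np.random.default_rng(seed)
    best, best_val = None, -np.inf
    for _ in range(n_designs):
        xi = random_lh(n, d, rng)
        val = phi_Mm(xi)
        if val > best_val:
            best, best_val = xi, val
    return best, best_val


xi_best, val = best_random_lh(n=10, d=2, n_designs=500, seed=1)
print(f"best maximin distance over 500 random Lh designs: {val:.4f}")
```

Only `best` and `best_val` are kept across iterations, which is the memory saving pointed out in the text; each returned design could also seed a local search.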

A more sophisticated version of this idea is as follows. Let $\xi_k$ denote the design at iteration $k$ and $\phi(\cdot)$ be a criterion to be minimized. Consider an algorithm for which at iteration $k$ one exchanges one point $x_i$ of $\xi_k$ with a new one $x^*$; see Fedorov (1972); Mitchell (1974) for exchange algorithms originally proposed for optimal design in a parametric setting. Three elements must then be defined: (i) how to select $x_i$? (ii) how to construct $x^*$? (iii) what to do if the substitution of $x^*$ for $x_i$ in $\xi_k$, possibly followed by a local search, does not improve the value of $\phi(\cdot)$?

Typically, the answer to (i) involves some randomness, possibly combined with some heuristics (for instance, for maximin-optimal design, it seems reasonable to select $x_i$ among the pairs of points at respective distance $\phi_{Mm}(\xi)$). For (ii), the choice of $x^*$ can be purely random (e.g., a random walk in $\mathcal{X}$ originated at $x_i$), or based on a deterministic construction, or a mixture of both. Finally, for (iii), a simulated-annealing method is appropriate: denote by $\xi^+_k$ the design obtained by substituting $x^*$ for $x_i$ in $\xi_k$, set $\xi_{k+1} = \xi^+_k$ when $\phi(\xi^+_k) < \phi(\xi_k)$ (improvement), but also accept this move with probability $P_k = \exp\{-[\phi(\xi^+_k) - \phi(\xi_k)]/T_k\}$ when $\phi(\xi^+_k) > \phi(\xi_k)$. Here $T_k$ denotes a 'temperature' that should decrease with $k$. Note that keeping track of the best design encountered, $\xi^*_k$ at step $k$, facilitates the proof of convergence to an optimal design: indeed, $\limsup_{k\to\infty}\phi(\xi_k)$ may differ from the minimal value $\phi^* = \min_\xi \phi(\xi)$ (and there might not be a reversible measure for the transition kernel from $\xi_k$ to $\xi_{k+1}$), but it is easy to prove that $\lim_{k\to\infty}\phi(\xi^*_k) = \phi^*$ when there is enough randomness in (i) and (ii). One may refer to (Morris and Mitchell, 1995) for indications on how to choose the initial temperature $T_0$ and make it decrease with $k$. See e.g. Schilling (1992) and Angelis et al (2001) for the use of a similar algorithm in a related framework.
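
A minimal sketch of the exchange algorithm with simulated-annealing acceptance, for $\phi(\xi) = -\phi_{Mm}(\xi)$ on $[0,1]^d$. The concrete answers to (i), (ii) and (iii) are assumptions picked among the options mentioned above: $x_i$ is drawn among the points of a closest pair, $x^*$ is one Gaussian random-walk step from $x_i$ clipped to $[0,1]^d$, and the temperature follows a geometric schedule $T_k = T_0\alpha^k$ with arbitrary parameter values (not those recommended by Morris and Mitchell (1995)). The best design encountered, $\xi^*_k$, is tracked separately.

```python
import numpy as np


def phi(design):
    """Criterion to be minimized: minus the maximin distance phi_Mm."""
    dist = np.linalg.norm(design[:, None, :] - design[None, :, :], axis=2)
    return -dist[np.triu_indices(len(design), k=1)].min()


def anneal(xi0, n_iter=2000, T0=0.05, alpha=0.998, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    xi, f = xi0.copy(), phi(xi0)
    best, f_best = xi.copy(), f              # xi*_k and phi(xi*_k)
    T = T0
    for _ in range(n_iter):
        # (i) choose x_i at random among the points of a closest pair
        dist = np.linalg.norm(xi[:, None, :] - xi[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        pairs = np.argwhere(dist == dist.min())
        i = pairs[rng.integers(len(pairs))][0]
        # (ii) x* from one Gaussian random-walk step at x_i, clipped to [0, 1]^d
        xi_plus = xi.copy()
        xi_plus[i] = np.clip(xi[i] + step * rng.standard_normal(xi.shape[1]), 0.0, 1.0)
        f_plus = phi(xi_plus)
        # (iii) accept improvements, and deteriorations with prob. exp(-(f+ - f)/T)
        if f_plus < f or rng.random() < np.exp(-(f_plus - f) / T):
            xi, f = xi_plus, f_plus
            if f < f_best:
                best, f_best = xi.copy(), f
        T *= alpha                           # geometric cooling (assumed schedule)
    return best, -f_best


rng = np.random.default_rng(3)
xi0 = rng.random((8, 2))
xi_best, d_best = anneal(xi0)
print(f"phi_Mm: {-phi(xi0):.3f} (start) -> {d_best:.3f} (best found)")
```

Because the best design is stored apart from the current one, the returned value can never be worse than the starting design, even though the chain itself may accept deteriorating moves.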

8 Concluding remarks

The design of computer experiments has been a rapidly growing field in the last few years, with special emphasis put on the construction of criteria quantifying how spread out a design is: geometric measures, related to sphere covering and sphere packing (minimax and maximin distance designs), and statistical measures of uniformity (discrepancy and, more recently, entropy). Much work remains to be done to determine which approaches are more suitable for computer experiments and to construct efficient algorithms tailored to specific criteria (some being easier to optimize than others).

The paper has also drawn attention to the importance of going beyond space filling. The estimation of parameters in a model-based approach calls for designs that are not uniformly spread out. A simple procedure has been proposed (Sect. 6), but here also much remains to be done. We conclude the presentation by mentioning some recent results that might help reconciling the space-filling and non-space-filling points of view. When a model selection strategy using a spatial information criterion is employed, Hoeting et al (2006) note that clustered designs perform the best. When different levels of accuracy are required, nested space-filling designs have been shown to be useful (cf. Qian et al (2009); Rennen et al (2010)), and non-space fillingness arises naturally in some sequentially designed experiments (see Gramacy and Lee (2009)). Picheny et al (2010) induce it by focusing their attention on particular target regions.

More generally, Dette and Pepelyshev (2010) observe that the kriging variance for a uniform design on $[0,1]$ is (in general) larger near the boundaries than near the center, so that a G-optimal design tends to put more points near the boundaries (note, however, that this is not true for the exponential covariance function, see Figs. 6 and 7, due to the particular Markovian structure of the Ornstein-Uhlenbeck process on the real line). Similarly, an optimal experiment for polynomial regression (D-optimal for instance) puts more points near the boundary of the domain as the degree of the polynomial increases; in the limit, the design points are distributed with the arc-sine distribution. These observations speak for space-filling designs that do not fill the space uniformly, but rather put more points near its boundaries. Since such designs will place some points close together, they may help the estimation of covariance parameters in kriging, and thus perhaps kill two birds with one stone.
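
As a small illustration of the arc-sine limit mentioned above (not of a D-optimal design itself), the following sketch places $n$ points of $[0,1]$ at the quantiles $(i-1/2)/n$ of the arc-sine distribution, whose CDF is $F(x) = (2/\pi)\arcsin\sqrt{x}$, i.e. $x_i = \sin^2[\pi(i-1/2)/(2n)]$. These are Chebyshev nodes mapped to $[0,1]$; using quantiles in this way is our own choice, and the function name is hypothetical.

```python
import math


def arcsine_design(n):
    """n points in [0, 1] at the quantiles (i - 1/2)/n, i = 1..n, of the
    arc-sine distribution: F(x) = (2/pi) asin(sqrt(x)), so that
    F^{-1}(u) = sin^2(pi u / 2)."""
    return [math.sin(math.pi * (i - 0.5) / (2 * n)) ** 2 for i in range(1, n + 1)]


x = arcsine_design(9)
gaps = [b - a for a, b in zip(x, x[1:])]
print([round(v, 3) for v in x])
print("end gap / central gap:", round(gaps[0] / gaps[4], 3))
```

The gaps between consecutive points are smallest next to 0 and 1 and largest in the middle, which is the kind of non-uniform filling the remarks above argue for.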

Acknowledgements We wish to thank Joao Rendas, Milan Stehlík and Helmut Waldl for their valuable comments and suggestions. We also thank the editors and referee for their careful reading and encouraging comments that helped us to improve the readability of the paper.

Appendix A: Computation of the minimax distance criterion via Delaunay tessellation

Let $x^*$ be a point of $\mathcal{X}$ satisfying $\min_i \|x^* - x_i\| = \phi_{mM}(\xi)$. If $x^*$ is in the interior of $\mathcal{X}$, it is at equal distance from $d+1$ points of $\xi$ that form a non-degenerate simplex; it is therefore the center of a circumscribed sphere to a simplex in the Delaunay tessellation of the points of $\xi$ (which we call a Delaunay sphere).

Suppose now that $x^*$ lies on the boundary of $\mathcal{X}$. It then belongs to some $(d-q)$-dimensional face $\mathcal{H}_q$ of $\mathcal{X}$, $1 \le q \le d$ (a 0-dimensional face being a vertex of $\mathcal{X}$, a 1-dimensional face an edge, etc., and a $(d-1)$-dimensional face a $(d-1)$-dimensional hyperplane). Also, it must be at equal distance $D^*$ from $m = d+1-q$ points of $\xi$, and no other point from $\xi$ can be closer. Consider now the symmetric points of those $m$ points with respect to the $q$ different $(d-1)$-dimensional faces of $\mathcal{X}$ that define $\mathcal{H}_q$; we obtain in this way $m(q+1)$ points that are all at distance $D^*$ from $x^*$. No other point from $\xi$, or any symmetric of a point of $\xi$ with respect to a $(d-1)$-face of $\mathcal{X}$, is at distance less than $D^*$ from $x^*$. Since $m(q+1) = (d+1-q)(q+1) \ge d+1$ (with equality when $q = d$, that is, when $x^*$ is one of the $2^d$ vertices of $\mathcal{X}$), $x^*$ is always at the center of a Delaunay sphere obtained from the tessellation of the points in $\xi$ together with their symmetric points with respect to the $(d-1)$-dimensional faces of $\mathcal{X}$. (The tessellation obtained is not unique in general, due to the enforced symmetry among the set of points constructed, but this is not an issue.)

The center $z^*$ and radius $R$ of the circumscribed sphere to a simplex defined by $d+1$ vectors $z_1,\ldots,z_{d+1}$ of $\mathbb{R}^d$ are easily computed as follows. Since $(z^* - z_i)^\top(z^* - z_i) = R^2$ for all $i$, the quantity $2z_i^\top z^* - z_i^\top z_i = z^{*\top}z^* - R^2$ is a constant. Denote this constant by $\gamma$ and the vector formed by the squared norms of the $z_i$ by $w$, so that $\{w\}_i = z_i^\top z_i$ for all $i$. We thus have, in matrix form,

$$\begin{bmatrix} 2Z^\top & -\mathbf{1} \end{bmatrix} \begin{pmatrix} z^* \\ \gamma \end{pmatrix} = w,$$

with $Z$ the $d\times(d+1)$ matrix $(z_1,\ldots,z_{d+1})$ and $\mathbf{1}$ the $(d+1)$-dimensional vector of ones. Note that the singularity of the matrix $[2Z^\top\ \ -\mathbf{1}]$ would imply that all $z_i$ lie in a $(d-1)$-dimensional hyperplane; the matrix is thus of full rank when the $z_i$ form a non-degenerate simplex. The values of $z^*$ and $\gamma$ are directly obtained from the equation above, and $R$ is then given by $\sqrt{z^{*\top}z^* - \gamma}$.
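
The linear system above translates directly into code. A minimal NumPy sketch, assuming the $d+1$ vertices are given as the columns of a $d\times(d+1)$ array; `numpy.linalg.solve` raises an error when $[2Z^\top\ \ -\mathbf{1}]$ is singular, i.e. when the simplex is degenerate. The function name is hypothetical.

```python
import numpy as np


def circumsphere(Z):
    """Centre z* and radius R of the sphere through the d + 1 columns of the
    d x (d+1) matrix Z, obtained by solving [2 Z^T, -1] (z*, gamma) = w."""
    Z = np.asarray(Z, dtype=float)
    d = Z.shape[0]
    A = np.hstack([2.0 * Z.T, -np.ones((d + 1, 1))])   # (d+1) x (d+1) matrix
    w = np.sum(Z * Z, axis=0)                           # {w}_i = z_i^T z_i
    sol = np.linalg.solve(A, w)                         # fails if the simplex is degenerate
    z_star, gamma = sol[:d], sol[d]
    return z_star, float(np.sqrt(z_star @ z_star - gamma))


z_star, R = circumsphere([[0.0, 1.0, 0.0],
                          [0.0, 0.0, 1.0]])   # triangle (0,0), (1,0), (0,1)
print(z_star, R)
```

As we read the construction above, one would apply this to every simplex of the Delaunay tessellation of the points of $\xi$ and their symmetric points, and keep the largest radius among the centres lying in $\mathcal{X}$; the tessellation itself would come from a computational-geometry library and is not shown here.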
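
Appendix B: regularization through $L_q$ norms

Consider a design criterion $\phi(\cdot)$ which can be written as the minimum of a set of criteria, $\phi(\xi) = \min_i \phi_i(\xi)$. Suppose that this set of criteria $\phi_i(\cdot)$ is finite (extensions to infinite sets and generalized classes of criteria indexed by a continuous parameter are possible but not needed for our purpose), so that $i \in \{1,\ldots,M\}$ for some finite $M$. The min function makes $\phi(\cdot)$ nonsmooth even when the $\phi_i(\cdot)$ are smooth. Different regularization methods can be used in that case to construct a smooth approximation of $\phi(\cdot)$. Suppose that $\phi(\xi) > 0$ and define

$$\phi_{[q]}(\xi) = \Big[\sum_{i=1}^M \phi_i^{-q}(\xi)\Big]^{-1/q}. \qquad (33)$$

From a property of $L_q$ norms, $\phi_{[q_2]}(\xi) \le \phi_{[q_1]}(\xi)$ for any $q_1 > q_2 > 0$, so that $\phi_{[q]}(\xi)$ with $q > 0$ forms a lower bound on $\phi(\xi)$ which tends to $\phi(\xi)$ as $q \to \infty$. $\phi_{[q]}(\xi)$ is also an increasing function of $q$ for $q < 0$, but is not defined at $q = 0$ (with $\lim_{q\to 0^-}\phi_{[q]}(\xi) = +\infty$ and $\lim_{q\to 0^+}\phi_{[q]}(\xi) = 0$). Note that $\phi_{[-1]}(\xi) = \sum_{i=1}^M \phi_i(\xi) \ge \phi(\xi)$.

Consider now the criterion

$$\phi^{[q]}(\xi) = \Big[\sum_{i=1}^M \mu_i\,\phi_i^{-q}(\xi)\Big]^{-1/q}, \qquad (34)$$

where $\mu_i > 0$ for all $i$ and $\sum_{i=1}^M \mu_i = 1$ (the $\mu_i$ define a probability measure on the index set $\{1,\ldots,M\}$). Again, for any $\xi$ such that $\phi(\xi) > 0$, $\phi^{[q]}(\xi) \to \phi(\xi)$ as $q$ tends to $\infty$. The computation of the derivative $\partial\phi^{[q]}(\xi)/\partial q$ gives

$$\frac{\partial\phi^{[q]}(\xi)}{\partial q} = \frac{\phi^{[q]}(\xi)}{q^2\sum_{i=1}^M \mu_i\phi_i^{-q}(\xi)} \Big\{ \Big[\sum_{i=1}^M \mu_i\phi_i^{-q}(\xi)\Big] \log\Big[\sum_{i=1}^M \mu_i\phi_i^{-q}(\xi)\Big] - \sum_{i=1}^M \mu_i\phi_i^{-q}(\xi)\log[\phi_i^{-q}(\xi)] \Big\} \le 0 \quad \text{for any } q,$$

where the inequality follows from Jensen's inequality (the function $x \mapsto x\log x$ being convex). The inequality is strict when the $\phi_i(\xi)$ take at least two different values, and $\phi^{[q]}(\xi)$ then decreases monotonically to $\phi(\xi)$ as $q \to \infty$. Similarly to the case of $\phi_{[q]}(\cdot)$, we have $\phi^{[-1]}(\xi) = \sum_{i=1}^M \mu_i\phi_i(\xi) \ge \phi(\xi)$. Moreover,

$$\lim_{q\to 0}\phi^{[q]}(\xi) = \exp\Big\{\sum_{i=1}^M \mu_i\log[\phi_i(\xi)]\Big\},$$

which, by continuity, can be defined as being $\phi^{[0]}(\xi)$. Define $\underline{\mu} = \min\{\mu_i,\ i = 1,\ldots,M\}$. We have, for any $\xi$ such that $\phi(\xi) > 0$ and any $q > 0$,

$$\phi_{[q]}(\xi) \le \phi(\xi) \le \phi^{[q]}(\xi) \le \underline{\mu}^{-1/q}\,\phi_{[q]}(\xi), \qquad (35)$$

so that

$$0 \le \phi(\xi) - \phi_{[q]}(\xi) \le (\underline{\mu}^{-1/q} - 1)\,\phi(\xi^*), \qquad 0 \le \phi^{[q]}(\xi) - \phi(\xi) \le (\underline{\mu}^{-1/q} - 1)\,\phi(\xi^*),$$

where $\xi^*$ is optimal for $\phi(\cdot)$ and $\underline{\mu}^{-1/q}$ tends to 1 as $q \to \infty$. The convergence of $\phi_{[q]}(\cdot)$ and $\phi^{[q]}(\cdot)$ to $\phi(\cdot)$, respectively from below and from above, is thus uniform over any set of designs such that $\phi(\xi)$ is bounded away from zero. Moreover, we can directly deduce from (35) that the $\phi$-efficiency of designs optimal for $\phi_{[q]}(\cdot)$ or $\phi^{[q]}(\cdot)$ is at least $\underline{\mu}^{1/q}$. Indeed, let $\xi^*$, $\xi^*_{[q]}$ and $\xi^{*[q]}$ respectively denote an optimal design for $\phi(\cdot)$, $\phi_{[q]}(\cdot)$ and $\phi^{[q]}(\cdot)$; (35) implies that

$$\phi(\xi^*_{[q]}) \ge \phi_{[q]}(\xi^*_{[q]}) \ge \phi_{[q]}(\xi^*) \ge \underline{\mu}^{1/q}\,\phi(\xi^*),$$
$$\phi(\xi^{*[q]}) \ge \underline{\mu}^{1/q}\,\phi^{[q]}(\xi^{*[q]}) \ge \underline{\mu}^{1/q}\,\phi^{[q]}(\xi^*) \ge \underline{\mu}^{1/q}\,\phi(\xi^*).$$

The best efficiency bounds are obtained when $\underline{\mu}$ is maximal, that is, when $\mu$ is the uniform measure and $\underline{\mu} = 1/M$. In that case, $\phi^{[q]}(\xi) = M^{1/q}\phi_{[q]}(\xi)$ and $\phi^{[0]}(\xi) = [\prod_{i=1}^M \phi_i(\xi)]^{1/M}$.

An obvious generalization of the regularization by $L_q$ norm is as follows. Let $\psi(\cdot)$ be a strictly increasing function and $\psi^{\leftarrow}(\cdot)$ denote its inverse. Then $\phi(\xi) = \psi^{\leftarrow}\{\min_i \psi[\phi_i(\xi)]\}$, and, applying the $L_q$ regularizations above to the min function, we can define

$$\phi^{[q,\psi]}(\xi) = \psi^{\leftarrow}\Big\{\Big[\sum_{i=1}^M \mu_i\,\{\psi[\phi_i(\xi)]\}^{-q}\Big]^{-1/q}\Big\}, \qquad (36)$$
$$\phi_{[q,\psi]}(\xi) = \psi^{\leftarrow}\Big\{\Big[\sum_{i=1}^M \{\psi[\phi_i(\xi)]\}^{-q}\Big]^{-1/q}\Big\}. \qquad (37)$$

A case of special interest is $\psi(t) = \exp(t)$, which gives

$$\phi_{[q,\exp]}(\xi) = -\frac{1}{q}\log\Big\{\sum_{i=1}^M \exp[-q\,\phi_i(\xi)]\Big\},$$

and is appealing in situations where one may have $\phi_i(\xi) \le 0$, see Li and Fang (1997).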
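
A minimal NumPy sketch of the regularized criteria (33), (34) and $\phi_{[q,\exp]}$, taking the values $\phi_i(\xi)$ as a plain vector (in the maximin case these could be, for instance, the pairwise inter-point distances, but that choice is ours). Factoring out $\min_i\phi_i$ (respectively subtracting it in the exponential case) is a numerical precaution we added, not part of the appendix: it leaves the values unchanged while avoiding overflow for large $q$. Function names are hypothetical.

```python
import numpy as np


def phi_lower(phis, q):
    """phi_[q] of (33), [sum_i phi_i^{-q}]^{-1/q} (all phi_i > 0, q != 0);
    factoring out m = min_i phi_i avoids overflow when q is large."""
    phis = np.asarray(phis, dtype=float)
    m = phis.min()
    return m * np.sum((phis / m) ** (-q)) ** (-1.0 / q)


def phi_upper(phis, q, mu):
    """phi^[q] of (34), [sum_i mu_i phi_i^{-q}]^{-1/q}, mu a probability vector."""
    phis, mu = np.asarray(phis, dtype=float), np.asarray(mu, dtype=float)
    m = phis.min()
    return m * np.sum(mu * (phis / m) ** (-q)) ** (-1.0 / q)


def phi_exp(phis, q):
    """phi_[q,exp] = -(1/q) log sum_i exp(-q phi_i), also defined when some
    phi_i <= 0; computed as a shifted (stable) log-sum-exp."""
    phis = np.asarray(phis, dtype=float)
    m = phis.min()
    return m - np.log(np.sum(np.exp(-q * (phis - m)))) / q


phis = [0.3, 0.5, 0.8, 1.2]      # the phi_i(xi); here phi(xi) = min_i phi_i = 0.3
mu = np.full(4, 0.25)            # uniform measure, mu_i = 1/M
for q in (5, 20, 100):
    print(q, phi_lower(phis, q), phi_upper(phis, q, mu))
```

With the uniform measure the printed pairs bracket $\phi(\xi) = 0.3$ from below and above and satisfy $\phi^{[q]} = M^{1/q}\phi_{[q]}$, as stated after (35).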

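Appendix C: derivation of bounds on the kriging variance

$\beta$ is known. We have $\rho_0^2(x) \le 1 - \max_i\{c(x)\}_i^2/\lambda_{\max}(C)$, with $\lambda_{\max}(C)$ the maximum eigenvalue of $C$. Since $C(\cdot)$ is non-increasing, $\{C\}_{ij} \le C_{Mm}$ for all $i \ne j$. We denote by $\mathcal{C}$ the set of matrices $C$ satisfying $0 \le \{C\}_{ij} = \{C\}_{ji} \le C_{Mm}$ for $i \ne j$. A classical inequality on matrix norms gives $\lambda_{\max}(C) = \|C\|_2 \le (\|C\|_1\|C\|_\infty)^{1/2}$, where $\|C\|_1 = \max_j \sum_i |\{C\}_{ij}| = \|C\|_\infty$. Therefore, any $C \in \mathcal{C}$ satisfies $\lambda_{\max}(C) \le 1 + (n-1)C_{Mm}$ and

$$\rho_0^2(x) \le \bar\rho_0^2(x) = 1 - \frac{\bar c^2(x)}{1 + (n-1)C_{Mm}}, \qquad (38)$$

where $\bar c(x) = \max_i\{c(x)\}_i$. Since $\min_i\|x - x_i\| \le \phi_{mM}$ for all $x \in \mathcal{X}$, we have $\bar c(x) \ge C_{mM}$ for all $x$ and

$$\max_{x\in\mathcal{X}}\rho_0^2(x) \le \bar\rho_0^2 = 1 - \frac{C_{mM}^2}{1 + (n-1)C_{Mm}}. \qquad (39)$$

Note that the bound (39) will become worse as $n$ increases, since the bound $1 + (n-1)C_{Mm}$ on $\lambda_{\max}(C)$ becomes more and more pessimistic. Also, (38) can be tight only for those $x$ such that $c(x)$ corresponds to the direction of an eigenvector associated with $\lambda_{\max}(C)$.

$\beta$ is unknown. We need to bound the second term in $\rho^2(x)$ given by (20). Our first step is to enclose the feasible set for $c(x)$ into a set $\mathcal{C}$ of simple description. Notice that $\|x - x_i\| < \phi_{Mm}/2$ for some $i$ implies that $\|x - x_j\| > \phi_{Mm}/2$ for all $j \ne i$. Therefore,

$$\mathcal{C} \subset [0,1]^n \setminus P(\phi_{Mm})$$

with $P(\phi_{Mm}) = \{c \in [0,1]^n : \exists\, i \ne j \text{ with } \{c\}_i > \bar C_{Mm} \text{ and } \{c\}_j > \bar C_{Mm}\}$, where $\bar C_{Mm} = C(\phi_{Mm}/2)$; see Fig. 10 for an illustration when $n = 3$.

Fig. 10 The set $[0,1]^3 \setminus P(\phi_{Mm})$ (axes $c_1$, $c_2$, $c_3$).

Notice that $\phi_{mM} > \phi_{Mm}/2$ implies that $\bar C_{Mm} = C(\phi_{Mm}/2) \ge C_{mM}$ (with also $\bar C_{Mm} \ge C_{Mm}$). Next, since $\bar c(x) = \max_i\{c(x)\}_i \ge C_{mM}$, we have

$$\mathcal{C} \subset [0,1]^n \setminus [0, \bar c(x)]^n \subset [0,1]^n \setminus [0, C_{mM}]^n. \qquad (40)$$

Consider $T(x) = c^\top(x)C^{-1}\mathbf{1}$. Notice that $\{c(x)\}_i = 1$ for some $i$ implies that $x = x_i$, and that $C^{-1}c(x_i) = e_i$, the $i$-th basis vector, so that $T(x_i) = 1$. Also, if $\{c(x)\}_i = 1$ for some $i$, then $\|x - x_j\| \ge \phi_{Mm}$ and thus $\{c(x)\}_j \le C_{Mm}$ for all $j \ne i$. When the correlation is weak enough, $0 < \{C^{-1}\mathbf{1}\}_i \le 1$ for all $i$ (which is true for some processes whatever the importance of the correlation; it is the case for instance for the one-dimensional Ornstein-Uhlenbeck process). This gives $T(x) \le c^\top(x)\mathbf{1} \le 1 + (n-1)\bar C_{Mm}$. Also, (40) implies that the minimum of $T(x)$ is larger than $\bar c(x)\,e_1^\top C_*^{-1}\mathbf{1}$ with

$$C_* = \begin{pmatrix} 1 & C_{Mm}\mathbf{1}_{n-1}^\top \\ C_{Mm}\mathbf{1}_{n-1} & I_{n-1} \end{pmatrix}, \qquad (41)$$

where $I_{n-1}$ and $\mathbf{1}_{n-1}$ respectively denote the $(n-1)$-dimensional identity matrix and vector of ones, which gives

$$T(x) \ge \bar c(x)\,\frac{1 - (n-1)C_{Mm}}{1 - (n-1)C_{Mm}^2}.$$

Since $\mathbf{1}^\top C^{-1}\mathbf{1} \ge n/\lambda_{\max}(C) \ge n/[1 + (n-1)C_{Mm}]$, we finally obtain

$$\rho^2(x) \le \bar\rho^2(x) = \bar\rho_0^2(x) + \frac{1 + (n-1)C_{Mm}}{n}\,R^2(x) \qquad (42)$$

with $\bar\rho_0^2(x)$ given by (38) and $R^2(x) = \max[R_a^2(x), R_b^2]$, where

$$R_a^2(x) = \Big[1 - \bar c(x)\,\frac{1 - (n-1)C_{Mm}}{1 - (n-1)C_{Mm}^2}\Big]^2$$

and $R_b^2 = (n-1)^2\bar C_{Mm}^2$. It should be noticed that the upper bound $R_b^2$ is very pessimistic. In fact, $\max_x T(x)$ seldom exceeds one (it may do so marginally when $C(t)$ is concave at $t = 0$), see for instance Joseph (2006), and for that reason it is sufficient to use $R^2(x) = R_a^2(x)$.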
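
The bound (38) can be checked numerically. The sketch below assumes a unit-variance process with exponential correlation $C(t) = \exp(-\theta t)$ on $[0,1]^2$, computes $C_{Mm} = C(\phi_{Mm})$ from the design, and compares the exact $\rho_0^2(x) = 1 - c^\top(x)C^{-1}c(x)$ (case $\beta$ known) with $\bar\rho_0^2(x)$ on a grid. The design, the grid, `theta` and the function name are arbitrary illustrative choices.

```python
import numpy as np


def bound_38(design, x, theta):
    """Return rho_0^2(x) = 1 - c^T C^{-1} c and the upper bound (38),
    1 - cbar(x)^2 / (1 + (n-1) C_Mm), for the assumed C(t) = exp(-theta t)."""
    n = len(design)
    dist = np.linalg.norm(design[:, None, :] - design[None, :, :], axis=2)
    C = np.exp(-theta * dist)
    C_Mm = np.exp(-theta * dist[np.triu_indices(n, k=1)].min())   # C(phi_Mm)
    c = np.exp(-theta * np.linalg.norm(x[:, None, :] - design[None, :, :], axis=2))
    rho0_sq = 1.0 - np.sum(c.T * np.linalg.solve(C, c.T), axis=0)
    cbar = c.max(axis=1)                                           # max_i {c(x)}_i
    return rho0_sq, 1.0 - cbar ** 2 / (1.0 + (n - 1) * C_Mm)


rng = np.random.default_rng(0)
design = rng.random((6, 2))
g = np.linspace(0.0, 1.0, 21)
x = np.array([[a, b] for a in g for b in g])
rho0_sq, rho0_bar = bound_38(design, x, theta=5.0)
print(f"max exact rho_0^2: {rho0_sq.max():.3f}, max of bound (38): {rho0_bar.max():.3f}")
```

The check is simply that $\rho_0^2(x) \le \bar\rho_0^2(x)$ at every grid point, as (38) guarantees for any correlation matrix whose off-diagonal entries do not exceed $C_{Mm}$.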

References

Abt M (1999) Estimating the prediction mean squared error in Gaussian stochastic processes with exponential correlation structure. Scandinavian Journal of Statistics 26(4):563–578
Angelis L, Senta EB, Moyssiadis C (2001) Optimal exact experimental designs with correlated errors through a simulated annealing algorithm. Comput Stat Data Anal 37(3):275–296
Ash R (1965) Information Theory. Wiley, New York (Republished by Dover, New York, 1990)
Audze P, Eglais V (1977) New approach for planning out experiments. Problems of Dynamics and Strengths 35:104–107
Ball K (1992) Eigenvalues of Euclidean distance matrices. Journal of Approximation Theory 68:74–82
Bates RA, Buck RJ, Riccomagno E, Wynn HP (1996) Experimental design and observation for large systems. Journal of the Royal Statistical Society Series B (Methodological) 58(1):77–94
Beirlant J, Dudewicz E, Györfi L, van der Meulen E (1997) Nonparametric entropy estimation; an overview. International Journal of Mathematical and Statistical Sciences 6(1):17–39
Bellhouse DR, Herzberg AM (1984) Equally spaced design points in polynomial regression: A comparison of systematic sampling methods with the optimal design of experiments. Canadian Journal of Statistics 12(2):77–90
Bettinger R, Duchêne P, Pronzato L, Thierry E (2008) Design of experiments for response diversity. In: Proc. 6th International Conference on Inverse Problems in Engineering (ICIPE), Journal of Physics: Conference Series, Dourdan (Paris)
Bettinger R, Duchêne P, Pronzato L (2009) A sequential design method for the inversion of an unknown system. In: Proc. 15th IFAC Symposium on System Identification, Saint-Malo, France, pp 1298–1303
Bischoff W, Miller F (2006) Optimal designs which are efficient for lack of fit tests. Annals of Statistics 34(4):2015–2025
Boissonnat JD, Yvinec M (1998) Algorithmic Geometry. Cambridge University Press
Bursztyn D, Steinberg D (2006) Comparison of designs for computer experiments. Journal of Statistical Planning and Inference 136(3):1103–1119
Carnell R (2009) lhs: Latin Hypercube Samples. R package version 0.5
Chen VCP, Tsui KL, Barton RR, Meckesheimer M (2006) A review on design, modeling and applications of computer experiments. IIE Transactions 38(4):273–291
Cignoni P, Montani C, Scopigno R (1998) DeWall: A fast divide and conquer Delaunay triangulation algorithm in $E^d$. Computer-Aided Design 30(5):333–341
Cortés J, Bullo F (2009) Nonsmooth coordination and geometric optimization via distributed dynamical systems. SIAM Review 51(1):163–189
Cressie N (1993) Statistics for Spatial Data. Wiley-Interscience, New York, Wiley Series in Probability and Statistics, rev. edn
den Hertog D, Kleijnen JPC, Siem AYD (2006) The correct kriging variance estimated by bootstrapping. Journal of the Operational Research Society 57(4):400–409
Dette H, Pepelyshev A (2010) Generalized latin hypercube design for computer experiments. Technometrics 52(4):421–429
Dette H, Kunert J, Pepelyshev A (2008) Exact optimal designs for weighted least squares analysis with correlated errors. Statistica Sinica 18(1):135–154
Fang KT (1980) The uniform design: application of number theoretic methods in experimental design. Acta Mathematicae Applicatae Sinica 3:363–372
Fang KT, Li R (2006) Uniform design for computer experiments and its optimal properties. International Journal of Materials and Product Technology 25(1):198–210
Fang KT, Wang Y (1993) Number-Theoretic Methods in Statistics (Chapman & Hall/CRC Monographs on Statistics & Applied Probability), 1st edn. Chapman and Hall/CRC
Fang KT, Lin DKJ, Winker P, Zhang Y (2000) Uniform design: Theory and application. Technometrics 42(3):237–248
Fang KT, Li R, Sudjianto A (2005) Design and Modeling for Computer Experiments. Chapman and Hall/CRC
Fedorov V (1972) Theory of Optimal Experiments. Academic Press, New York
Fedorov VV, Hackl P (1997) Model-Oriented Design of Experiments (Lecture Notes in Statistics), vol 125. Springer
Franco J (2008) Planification d'expériences numériques en phase exploratoire pour la simulation de phénomènes complexes. Ph.D. Thesis, École Nationale Supérieure des Mines de Saint Etienne
Franco J, Bay X, Corre B, Dupuy D (2008) Planification d'expériences numériques à partir du processus ponctuel de Strauss. Preprint, Département 3MI, École Nationale Supérieure des Mines de Saint-Etienne, http://hal.archives-ouvertes.fr/hal-00260701/fr/
Franco J, Vasseur O, Corre B, Sergent M (2009) Minimum Spanning Tree: a new approach to assess the quality of the design of computer experiments. Chemometrics and Intelligent Laboratory Systems 97:164–169
Gensane T (2004) Dense packings of equal spheres in a cube. Electronic J Combinatorics 11
Glover F, Kelly J, Laguna M (1995) Genetic algorithms and tabu search: hybrids for optimization. Computers and Operations Research 22(1):111–134
Gramacy R, Lee H (2010) Cases for the nugget in modeling computer experiments. Tech. rep., URL http://arxiv.org/abs/1007.4580
Gramacy RB, Lee HK (2009) Adaptive design and analysis of supercomputer experiments. Technometrics 51(2):130–144
Griffith D (2003) Spatial Autocorrelation and Spatial Filtering: Gaining Understanding through Theory and Scientific Visualization. Springer-Verlag, Berlin
Hall P, Morton S (1993) On the estimation of entropy. Ann Inst Statist Math 45(1):69–88
Harville DA, Jeske DR (1992) Mean squared error of estimation or prediction under a general linear model. Journal of the American Statistical Association 87(419):724–731
Havrda M, Charvát F (1967) Quantification method of classification processes: concept of structural α-entropy. Kybernetika 3:30–35
Herzberg AM, Huda S (1981) A comparison of equally spaced designs with different correlation structures in one and more dimensions. Canadian Journal of Statistics 9(2):203–208
Hoeting JA, Davis RA, Merton AA, Thompson SE (2006) Model selection for geostatistical models. Ecological Applications 16(1):87–98
Husslage B, Rennen G, van Dam E, den Hertog D (2006) Space-filling latin hypercube designs for computer experiments. Discussion Paper 2006-18, Tilburg University, Center for Economic Research
Iooss B, Boussouf L, Feuillard V, Marrel A (2010) Numerical studies of the metamodel fitting and validation processes. International Journal on Advances in Systems and Measurements 3(1 & 2):11–21
Irvine K, Gitelman A, Hoeting J (2007) Spatial designs and properties of spatial correlation: Effects on covariance estimation. Journal of Agricultural, Biological, and Environmental Statistics 12(4):450–469
Jin R, Chen W, Sudjianto A (2005) An efficient algorithm for constructing optimal design of computer experiments. Journal of Statistical Planning and Inference 134(1):268–287
Johnson M, Moore L, Ylvisaker D (1990) Minimax and maximin distance designs. Journal of Statistical Planning and Inference 26:131–148
Johnson RT, Montgomery DC, Jones B, Fowler JW (2008) Comparing designs for experiments. In: WSC '08: Proceedings of the 40th Conference on Winter Simulation, pp 463–470
Joseph V (2006) Limit kriging. Technometrics 48(4):458–466
Jourdan A, Franco J (2010) Optimal Latin hypercube designs for the Kullback-Leibler criterion. Advances in Statistical Analysis 94:341–351
Kiefer J, Wolfowitz J (1960) The equivalence of two extremum problems. Canadian Journal of Mathematics 12:363–366
Kiseľák J, Stehlík M (2008) Equidistant and D-optimal designs for parameters of Ornstein–Uhlenbeck process. Statistics & Probability Letters 78(12):1388–1396
Kleijnen JPC (2009) Design and Analysis of Simulation Experiments. Springer US
Koehler J, Owen A (1996) Computer experiments. In: Ghosh S, Rao CR (eds) Handbook of Statistics, 13: Design and Analysis of Experiments, North-Holland, pp 261–308
Kozachenko L, Leonenko N (1987) On statistical estimation of entropy of random vector. Problems Infor Transmiss 23(2):95–101 (translated from Problemy Peredachi Informatsii, in Russian, vol. 23, No. 2, pp. 9-16, 1987)
Leary S, Bhaskar A, Keane A (2003) Optimal orthogonal-array-based latin hypercubes. Journal of Applied Statistics 30(5):585–598
Leonenko N, Pronzato L, Savani V (2008) A class of Rényi information estimators for multidimensional densities. Annals of Statistics 36(5):2153–2182 (Correction in AS, 38(6):3837–3838, 2010)
Li XS, Fang SC (1997) On the entropic regularization method for solving min-max problems with applications. Mathematical Methods of Operations Research 46:119–130
McKay M, Beckman R, Conover W (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2):239–245
Melissen H (1997) Packing and covering with circles. Ph.D. Thesis, University of Utrecht
Mitchell T (1974) An algorithm for the construction of "D-optimal" experimental designs. Technometrics 16:203–210
Morris M, Mitchell T (1995) Exploratory designs for computational experiments. Journal of Statistical Planning and Inference 43:381–402
Müller WG (2007) Collecting Spatial Data: Optimum Design of Experiments for Random Fields, 3rd edn. Springer, Heidelberg
Müller WG, Stehlík M (2010) Compound optimal spatial designs. Environmetrics 21(3-4):354–364
Müller WG, Pronzato L, Waldl H (2010) Relations between designs for prediction and estimation in random fields: an illustrative case. Submitted
Nagy B, Loeppky JL, Welch WJ (2007) Fast bayesian inference for gaussian process models. Tech. rep., The University of British Columbia, Department of Statistics
Narcowich F (1991) Norms of inverses and condition numbers for matrices associated with scattered data. Journal of Approximation Theory 64:69–94
Niederreiter H (1992) Random Number Generation and Quasi-Monte Carlo Methods (CBMS-NSF Regional Conference Series in Applied Mathematics). SIAM
Okabe A, Boots B, Sugihara K (1992) Spatial Tessellations. Concepts and Applications of Voronoi Diagrams. Wiley, New York
Oler N (1961) A finite packing problem. Canadian Mathematical Bulletin 4:153–155
Pebesma EJ, Heuvelink GBM (1999) Latin hypercube sampling of gaussian random fields. Technometrics 41(4):303–312
Penrose M, Yukich J (2011) Laws of large numbers and nearest neighbor distances. In: Wells M, Sengupta A (eds) Advances in Directional and Linear Statistics. A Festschrift for Sreenivasa Rao Jammalamadaka, arXiv:0911.0331v1, to appear
Petelet M, Iooss B, Asserin O, Loredo A (2010) Latin hypercube sampling with inequality constraints. Advances in Statistical Analysis 94:325–339
Picheny V, Ginsbourger D, Roustant O, Haftka RT, Kim NH (2010) Adaptive designs of experiments for accurate approximation of a target region. Journal of Mechanical Design 132(7):071008
Pistone G, Vicario G (2010) Comparing and generating Latin Hypercube designs in Kriging models. Advances in Statistical Analysis 94:353–366
Pronzato L (2008) Optimal experimental design and some related control problems. Automatica 44(2):303–325
Putter H, Young A (2001) On the effect of covariance function estimation on the accuracy of kriging predictors. Bernoulli 7(3):421–438
Qian PZG, Ai M, Wu CFJ (2009) Construction of nested space-filling designs. The Annals of Statistics 37(6A):3616–3643
Redmond C, Yukich J (1996) Asymptotics for Euclidian functionals with power-weighted edges. Stochastic Processes and their Applications 61:289–304
Rennen G, Husslage B, van Dam E, den Hertog D (2010) Nested maximin Latin hypercube designs. Struct Multidisc Optim 41:371–395
Rényi A (1961) On measures of entropy and information. In: Proc. 4th Berkeley Symp. on Math. Statist. and Prob., pp 547–561
Riccomagno E, Schwabe R, Wynn HP (1997) Lattice-based D-optimum design for fourier regression. The Annals of Statistics 25(6):2313–2327
Royle J, Nychka D (1998) An algorithm for the construction of spatial coverage designs with implementation in SPLUS. Computers & Geosciences 24(5):479–488
Sacks J, Welch W, Mitchell T, Wynn H (1989) Design and analysis of computer experiments. Statistical Science 4(4):409–435
Santner T, Williams B, Notz W (2003) The Design and Analysis of Computer Experiments. Springer, Heidelberg
Schaback R (1994) Lower bounds for norms of inverses of interpolation matrices for radial basis functions. Journal of Approximation Theory 79:287–306
Schilling MF (1992) Spatial designs when the observations are correlated. Communications in Statistics - Simulation and Computation 21(1):243–267
Scott D (1992) Multivariate Density Estimation. Wiley, New York
Shewry M, Wynn H (1987) Maximum entropy sampling. Applied Statistics 14:165–170
Sjöstedt-De-Luna S, Young A (2003) The bootstrap and kriging prediction intervals. Scandinavian Journal of Statistics 30(1):175–192
Stein M (1999) Interpolation of Spatial Data. Some Theory for Kriging. Springer, Heidelberg
Stinstra E, den Hertog D, Stehouwer P, Vestjens A (2003) Constrained maximin designs for computer experiments. Technometrics 45(4):340–346
Sun X (1992) Norm estimates for inverses of Euclidean distance matrices. Journal of Approximation Theory 70:339–347
Tang B (1993) Orthogonal array-based latin hypercubes. Journal of the American Statistical Association 88(424)
Tsallis C (1988) Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics 52(1/2):479–487
van Dam E (2007) Two-dimensional minimax Latin hypercube designs. Discrete Applied Math 156(18):3483–3493
van Dam E, Husslage B, den Hertog D, Melissen H (2007) Maximin Latin hypercube designs in two dimensions. Operations Research 55(1):158–169
van Dam E, Rennen G, Husslage B (2009) Bounds for Maximin Latin hypercube designs. Operations Research 57(3):595–608
van Groenigen J (2000) The influence of variogram parameters on optimal sampling schemes for mapping by kriging. Geoderma 97(3-4):223–236
Walvoort DJJ, Brus DJ, de Gruijter JJ (2010) An R package for spatial coverage sampling and random sampling from compact geographical strata by k-means. Computers & Geosciences 36:1261–1267
Wendland H (2005) Scattered Data Approximation. Cambridge University Press
Wolkowicz H, Styan G (1980) Bounds for eigenvalues using traces. Linear Algebra and its Applications 29:471–506
Wynn H (2004) Maximum entropy sampling and general equivalence theory. In: Di Bucchianico A, Läuter H, Wynn H (eds) mODa'7 – Advances in Model–Oriented Design and Analysis, Proceedings of the 7th Int. Workshop, Heeze (Netherlands), Physica Verlag, Heidelberg, pp 211–218
Yfantis E, Flatman G, Behar J (1987) Efficiency of kriging estimation for square, triangular, and hexagonal grids. Mathematical Geology 19(3):183–205
Yukich J (1998) Probability Theory of Classical Euclidean Optimization Problems. Springer, Berlin
Zagoraiou M, Antognini AB (2009) Optimal designs for parameter estimation of the Ornstein-Uhlenbeck process. Applied Stochastic Models in Business and Industry 25(5):583–600
Zhang H, Zimmerman D (2005) Towards reconciling two asymptotic frameworks in spatial statistics. Biometrika 92(4):921–936
Zhu Z, Stein M (2005) Spatial sampling design for parameter estimation of the covariance function. Journal of Statistical Planning and Inference 134(2):583–603
Zhu Z, Zhang H (2006) Spatial sampling design under the infill asymptotic framework. Environmetrics 17(4):323–337
Zimmerman D, Cressie N (1992) Mean squared prediction error in the spatial linear model with estimated covariance parameters. Ann Inst Statist Math 44(1):27–43
Zimmerman DL (2006) Optimal network design for spatial prediction, covariance parameter estimation, and empirical prediction. Environmetrics 17(6):635–652