
Advances in Engineering Software 137 (2019) 102709


Research paper

Distance-based optimal sampling in a hypercube: Analogies to N-body systems

Miroslav Vořechovský⁎, Jan Mašek, Jan Eliáš

Institute of Structural Mechanics, Brno University of Technology, Veveří 331/95, Brno 602 00, Czech Republic

ARTICLE INFO

Keywords:
ϕp criterion
Audze-Eglajs criterion
Maximin criterion
MiniMax criterion
Periodic space
Space-filling design
Uniform design
Robust design
Low-discrepancy
Design of experiments
Latin hypercube sampling
Monte Carlo integration

ABSTRACT

A method is proposed for the construction of uniformly distributed point sets within a design domain using an analogy to a dynamical system of interacting particles. The possibility of viewing various distance-based optimality criteria as formulas representing the potential energy of a system of charged particles is discussed. The potential energy is employed in deriving the equations of motion of the particles. The particles are either attracted or repelled and dissipative dynamical systems can be simulated to achieve optimal and near-optimal arrangements of points. The design domain is set up as an Nvar-dimensional unit hypercube, with Nvar being the number of variables (factors). The number of points is equal to the number of simulations (levels). The periodicity assumption of the design domain is shown to be an elegant way to obtain statistically uniform coverage of the design domain.

The ϕp criterion, which is a generalization of the Maximin criterion, is selected in order to demonstrate its analogy with an N-body system. This criterion guarantees that the points are spread uniformly within the design domain. The solution to such an N-body system is presented. The obtained designs are shown to outperform the existing optimal designs in various types of applications: multidimensional numerical integration, statistical exploration of computer models, reliability analyses of engineering systems, and screenings or exploratory designs for the global optimization/minimization of functions.

1. Introduction

The problem of selecting “uniformly distributed points” in a rectangular domain is an old one that finds applications in many fields of science, mathematics and physics, engineering, biology, etc. [65]. The growing demand for the replacement of physical experiments by simulations based on sophisticated mathematical models consecutively generates a strong demand for the development of point sets with somewhat uniformly distributed points within the design domain. One of the possible applications of these sets is the estimation of multidimensional integrals using the Monte Carlo method. In such an analysis of transformations of random variables/vectors featured in statistical, sensitivity and reliability analyses of engineering systems, a set of points which are uniformly distributed with respect to probabilities is needed. Another application of uniformly distributed points is in the field of the optimization [62] (maximization or minimization of functions) or screenings [66] of computer models of physical experiments. Also, uniform designs obtained using methods from the field of Design of Experiments (DoE) for computer models are now being routinely used to build response surfaces or meta-models (surrogate models) that may be capable of replacing complex computational models, which are expensive to evaluate [37], see e.g. least squares-based polynomial chaos (PC) expansion [5,33] or least-squares support vector regression [73].

There are many types of design (i.e. a set of points selected from a design domain) that can be thought of as uniform in some sense. For example, space-filling designs [30], which can be loosely interpreted as designs that are representative of a design region, are widely used in computer experiments. Designs with good space-filling properties are mainly constructed when designing one-stage computer experiments. When seeking a design suitable for building a response surface using Kriging [8], distance-based criteria have been found to be optimal [29] (e.g. the Maximin criterion [29], its relaxation to the ϕp criterion [44] and its special case, the Audze-Eglajs criterion [2], and the miniMax criterion [29] and its relaxed variant [61]).

The objective of observing the design domain “everywhere” is achieved by designs that minimize the “distance” between the distribution of observation sites and the uniform distribution. The notion of discrepancy captures this intuitive objective well (see [48, Chapter 2]) and discrepancy in a way expresses the sample non-uniformity.

⁎ Corresponding author.
E-mail address: [email protected] (M. Vořechovský).
https://doi.org/10.1016/j.advengsoft.2019.102709
Received 4 February 2019; Received in revised form 3 July 2019; Accepted 19 August 2019
0965-9978/ © 2019 Elsevier Ltd. All rights reserved.

Moreover, the Koksma-Hlawka inequality [32] says that decreasing the “discrepancy measure” of a design decreases the bound on the error of the integral estimated with a discrete sample set. Conceptually, the discrepancy of a sample design is an appropriate norm of the difference between the number of points per sub-volume and a uniform smearing of points. Various versions can be considered to evaluate discrepancy; they have received an analytical expression and can therefore be minimized, as with e.g. Modified L2 discrepancy [14], Wrap-Around L2 discrepancy [25] or Centered L2 discrepancy [16,25], and other versions [74] (see also Santner et al. [65] for a review). As noted in [34], these general measures of discrepancy are unsatisfactory as soon as the number of dimensions exceeds a few units because they are too hard to compute.

The search for multivariate quadrature rules of minimal size with a specified polynomial accuracy has been the topic of many years of research. When dealing with multidimensional integration, Quasi Monte Carlo sequences (QMC), which are sometimes named low-discrepancy sequences or digital nets, have become popular alternatives and can be considered standard methods within this field. The use of low-discrepancy sequences provides a cheap alternative to algorithmic discrepancy minimization [48, Chapters 3 & 4]. They have become, together with their randomized versions, the standard and benchmark for numerical integration.

Another frequent requirement placed on designs is orthogonality. The orthogonality of a design ensures that all specified parameters may be estimated independently of each other (estimation of main effects) and interactions can also be estimated in a straightforward way. There are various design types focused on orthogonality, such as various factorial designs in DoE, “orthogonal arrays” known from combinatorial designs, “mutually orthogonal latin squares”, etc. Some authors simply tend to decrease the statistical correlations among the vectors of samples of individual variables in the design [44,79,80,82]. Maintaining pairwise orthogonality helps in achieving accuracy when performing integration via, e.g., the Monte Carlo type of method.

In some analyses it is beneficial when the design is “non-collapsible”, i.e. lower-dimensional projections of the design points do not lead to replications. Noncollapsibility is important for situations when a design parameter (or input variable) has no influence. In such a case, two design sites that differ only in that parameter collapse; that is, the same design site is evaluated twice. This is clearly not desirable because the sparsity-of-effects principle is common in almost all systems. Sometimes, the goal of the design is to ensure that projections of the design are evenly distributed over selected subspaces. By doing this, one can obtain a small integration error if the chosen subset matches the leading terms in the ANOVA functional decomposition. Experience has shown that functions defined over many variables but with a low “effective dimension” in ANOVA decomposition often arise in practical applications [36,68].

Noncollapsibility can be achieved by, e.g., restricting designs to the class of Latin Hypercube (LH) designs [7]. LH sampling (LHS) is also considered a variance reduction technique as it often decreases the variance of the estimator in Monte Carlo integration compared to simple random sampling.

1.1. Motivation and problem description

The ability to select uniformly distributed point sets/layouts from a design domain that satisfy most of the above-mentioned criteria has the potential to increase the efficiency of many engineering and scientific computations, as well as to improve the quality of engineering products and processes. In this paper, it is assumed that the domain within which either the integration, optimization, sensitivity study or other types of analyses are to be performed is normalized to a design domain U = [0, 1]^Nvar, i.e. a Nvar-dimensional unit hypercube. The task is to select a design (a sampling plan) D ⊂ U, which is a table of Nsim points with Nvar coordinates. The table consists of distinct input sites D = {u_1, u_2, …, u_Nsim} that are uniformly distributed. The notion of uniformity, although intuitively acceptable, does not have a single mathematical definition. In this paper, designs are meant to be uniform in two different ways, i.e. they should exhibit:

• statistical uniformity, meaning that any location (point) from the design domain must have the same probability of being selected. This is absolutely necessary for the Monte Carlo integration of a function featuring random variables sampled from D by inverse transformation of the cumulative distribution function,
• sample uniformity, meaning that each single point layout is spread uniformly throughout the design domain, i.e. the points must have a kind of equal spacing among them. Designs that enjoy sample uniformity are often called uniform designs, i.e. low discrepancy designs [48].

The simultaneous fulfillment of statistical uniformity and sample uniformity with maximum projection regularity (noncollapsibility) is assumed to be a sufficient condition for an ideal design. We show that sample uniformity can be effectively achieved by an appropriate distance-based criterion, such as Maximin, miniMax, ϕp or Audze-Eglajs. However, statistical uniformity is known to be violated when using many standard criteria for sample optimization (either correlation or distance-based) [12,83]. The problem arises from the presence of the boundaries of the design domain. An elegant remedy that removes the problem by redefining the metric featured in the distance-based criteria has recently been suggested and shown by the authors to be an effective solution [12,83]. The present paper will employ this metric, which effectively makes the design domain periodic.

It was reported earlier in [21] that using a single criterion for selecting experimental designs for surrogate construction may lead to small gains in that criterion at the expense of large deteriorations in other criteria. Constructing designs by combining multiple criteria reduces the risk of using a poor design. The authors believe that using the proposed combination of a proper distance-based criterion and the assumption of the periodicity of the design space leads to robust and universal designs that perform well with respect to various other optimization criteria.

The point sets, D, are obtained here by exploiting the analogy between an N-body system and a distance-based criterion of design optimality. A robust dynamical model capable of simulating Nsim points interacting within the periodic space of arbitrary dimension Nvar is derived and presented. After the kinetic energy of the system dissipates due to damping, the positions of the particles in a steady-state equilibrium are treated as experimental points (the design). The performance of such designs is demonstrated in various contexts and applications and is compared to other existing strategies.

2. Distance-based criteria

Geometric criteria based on distance either measure only the distances amongst the points in the design D, or may consider the distances to all points in the domain U. In any case, one has to define a proper norm (metric) d(x_i, x_j) providing the “distance” between any points x_i, x_j ∈ U.

First, one can consider a design that aims to achieve a high spread solely amongst its support points D ⊂ U. When seeking such a design, one must attempt to make the smallest distance between the neighboring points in D as large as possible. This is ensured by the Maximin criterion

Mm(D) = min_{1 ≤ i, j ≤ Nsim, i ≠ j} d(x_i, x_j)    (1)

A design that maximizes this measure is said to be a Maximin (Mm) distance design [29], denoted as D_Mm. In an intuitive sense, Mm designs guarantee that no two points in the design are “too close”, and hence that the design points are spread over U.


Fig. 1. Illustrations of various design criteria and the analogies with N-body systems in [0, 1]²: repulsive point sets (a,b) and attractive empty regions (c,d). (a) Maximin criterion (Eq. (1)), (b) generalization of the Maximin criterion (ϕp and Audze-Eglajs criteria, Eqs. (2) and (3)), (c) miniMax criterion (Eq. (A.1)), (d) generalization of the miniMax criterion (Eq. (A.2)).

The Maximin objective for the choice of points was first shown to be useful by Niederreiter [45]. Johnson et al. [29] illustrated the meaning of maximizing Eq. (1) using an analogy to the problem of spreading franchises (or the maximum facility dispersion problem [13]). Every franchise wishes to operate over its largest possible “exclusive territory”, and therefore each of them can be thought to be active in repelling the other points (franchises), see Fig. 1a. The Maximin design emphasizes the non-redundancy in the choice of sites within the domain D and prefers designs suitable for Kriging as these have a sort of D-optimality property under certain conditions [29,31].

The Maximin criterion can be viewed as a limiting case of a more general criterion proposed by Morris and Mitchell [44], the ϕp criterion

ϕ_p = [ Σ_{i<j}^{Nsim} d(x_i, x_j)^{−p} ]^{1/p}    (2)

where p is a positive integer. For a fixed metric, d, and a power, p, an Nsim-point design D is optimal if it minimizes the criterion in Eq. (2). This relaxed version of the Maximin criterion considers all interpoint interactions, as illustrated in Fig. 1b. The ϕp family of distance-based criteria has become very popular for comparison and selection between existing designs and is frequently used in research and in industry. It is also implemented in various software packages (e.g. MATLAB and R).

Note that taking the power p = 2 and the Euclidean intersite distance between all pairs of points leads to the Audze-Eglajs (ϕ2) criterion [2]. Roshan and Hung use the power p = 15 in [31]. Some authors advise using powers as high as p = 50 combined with the Manhattan distance [28,59,78]. Taking p = ∞ renders the ϕp criterion equivalent to the Maximin criterion, Eq. (1). According to a recent paper [40], the power p should be dependent on the dimension of the problem, Nvar. A simple analysis yields that the influence of long distances is sufficiently suppressed in favor of short distances when the power, p, exceeds the critical value, which is the domain dimension, Nvar. Therefore, we recommend taking p = Nvar + 1.

Now let us consider designs that attempt to minimize the maximum distance from all possible points in U to their closest point in D. This is achieved by minimizing the miniMax criterion; see the illustration sketched in Fig. 1c. This is another class of distance-based criteria which was introduced in [29], and was also motivated by the need to select an optimal design for Kriging. The miniMax criterion is described in more detail in this paper in Appendix A, together with the possibility of relaxing it (see Fig. 1d) in the same manner as the Maximin was relaxed into the ϕp criterion. Appendix A also discusses the possibility of making an analogy between the relaxed miniMax criterion and an N-body system.

Due to the reasons described in Appendix A, the ϕp criterion has been selected for exploitation as a template for the energy potential of an N-body system. For the derivation presented below, the original definition of the criterion in Eq. (2) is simplified by omitting the root ‘1/p’, as it is a monotonous transformation that unnecessarily complicates the derivations and slows down computations. These two simplifications do not alter the inequalities which arise when comparing two designs. The redefined criterion reads (p remains a positive integer exponent, and x_i, x_j ∈ D)

ϕ_p = Σ_{i<j}^{Nsim} 1 / d^p(x_i, x_j)    (3)

2.1. Periodicity of the design space

The distance-based criteria presented in [29] were motivated by metamodeling rather than the need to select designs suitable for Monte Carlo integration. Therefore, they are not guaranteed to be statistically uniform. This paper uses the periodic metric proposed in [12,83] to maintain the statistical uniformity of the designs.

First of all, we limit our attention to the Euclidean distance metric. This metric is arguably much desired for its property of directional independence, or isotropy. The Euclidean distance between points i and j in Nvar-dimensional space, L_ij, can be expressed as a function of their coordinates

L_ij = √( Σ_{v=1}^{Nvar} (x_{i,v} − x_{j,v})² ) = √( Σ_{v=1}^{Nvar} (Δ_{ij,v})² )    (4)

where Δ_{ij,v} = |x_{i,v} − x_{j,v}| is the difference in their positions projected onto the axis v.

As argued in [12,83], the natural choice of intersite Euclidean distance (a distance measured within U) leads to a problem related to the boundaries of U. To remedy this problem, a periodic extension of the design space has been proposed [12,83]. It is obtained simply by replacing Δ_{ij,v} in Eq. (4) with its periodic variant Δ̄_{ij,v}. The obtained metric, called the periodic length L̄_ij, becomes the actual shortest linear path between point i and the nearest image of point j [12,83], also see Fig. 2.

L̄_ij = √( Σ_{v=1}^{Nvar} (Δ̄_{ij,v})² ),  Δ̄_{ij,v} = min(Δ_{ij,v}, 1 − Δ_{ij,v})    (5)

We note that using the nearest image of point j with respect to point i does not cover a true periodic repetition of the design domain. It only satisfies the minimum image convention. In a complete periodic repetition, an infinite number of images of point j would interact with point i. The presented approach is a simplification that has been shown in [12] to yield identical results to the fully repeated system in cases when the number of points, Nsim, is sufficient. If not stated otherwise, we take the periodic metric for the ϕp criterion in Eq. (3): d^p(x_i, x_j) ≡ L̄^p_ij.

3. The analogy to a physical N-body system

With the given distance-based criterion in Eq. (3) and the periodic metric in Eq. (5), one can proceed with optimizing the design layouts.
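For concreteness, the following minimal NumPy sketch evaluates the periodic minimum-image distances of Eq. (5) and the simplified ϕp criterion of Eq. (3). This is our own illustration, not the authors' implementation; the function names and the example at the end are ours.

```python
import numpy as np

def periodic_distances(X):
    """Pairwise periodic (minimum-image) Euclidean distances, Eq. (5).

    X : (Nsim, Nvar) array of points in the unit hypercube.
    Returns the condensed vector of distances for all pairs i < j.
    """
    diff = np.abs(X[:, None, :] - X[None, :, :])   # Δ_ij,v of Eq. (4)
    diff = np.minimum(diff, 1.0 - diff)            # Δ̄_ij,v = min(Δ, 1 − Δ)
    L = np.sqrt((diff ** 2).sum(axis=-1))          # periodic lengths L̄_ij
    iu = np.triu_indices(len(X), k=1)              # each pair counted once
    return L[iu]

def phi_p(X, p=None):
    """Simplified ϕ_p criterion of Eq. (3) (root 1/p omitted); lower is better."""
    n_var = X.shape[1]
    if p is None:
        p = n_var + 1                              # recommended p = Nvar + 1
    return np.sum(periodic_distances(X) ** (-p))

# Example: a random design versus a regular 5x5 grid in 2D; the grid
# typically yields a lower (better) value of the criterion.
rng = np.random.default_rng(0)
X_rand = rng.random((25, 2))
g = (np.arange(5) + 0.5) / 5
X_grid = np.array([(a, b) for a in g for b in g])
print(phi_p(X_rand), phi_p(X_grid))
```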


Fig. 2. Illustration of a periodically repeated planar domain. (a) the original two-dimensional design domain with pale-colored distances L_ij (Eq. (4)). (b) a periodically repeated design domain with eight additional images of each particle. The periodic distances L̄_ij (Eq. (5)) are dark-colored. (c) folding of the periodic domain into a torus.

One possible use of the ϕp criterion is for the comparison and selection of candidate designs, e.g. in some form of combinatorial or heuristic optimization of mutual ordering for a fixed set of coordinates [12,82]. This work proposes to interpret the distance-based ϕp criterion as the amount of potential energy stored within all interactions of a system of interacting (mutually repelling) particles; see Fig. 1b for an illustration. This physically analogical problem is simulated using a discrete dynamical system of mutually repelling particles. The coordinates of the particles of the dynamical system after reaching static equilibrium (after the minimization of potential and kinetic energy) may then be directly understood as the coordinates of design points within the unit hypercube.

We follow with the derivation of equations of motion of the dynamical (N-body) system. The derivation is conducted using Lagrangian mechanics. The Lagrangian of the system can be described as follows

ℒ = E_k − E_p    (6)

with the kinetic energy of the particle system, E_k, being a simple sum of the kinetic energies of all particles of equal mass m

E_k = (1/2) m Σ_{i=1}^{Nsim} Σ_{v=1}^{Nvar} ẋ²_{i,v}    (7)

where ẋ_{i,v} = d x_{i,v}/dt is the velocity of the ith particle in dimension v. Once damping is introduced in the system, the kinetic energy decays with time.

Eq. (3) is considered to represent the total potential energy of the system of Nsim interacting particles. It sums the energy contributions, E_ij, of all Nsim(Nsim − 1)/2 pairs of particles and thus can be rewritten as the sum of potential energies of individual pairs

E_p = Σ_{i=1}^{Nsim−1} Σ_{j=i+1}^{Nsim} E_ij = Σ_{i=1}^{Nsim−1} Σ_{j=i+1}^{Nsim} 1 / L̄^p_ij    (8)

and represents the value of the ϕp criterion to be minimized.

Furthermore, it is necessary to calculate the derivatives of the Lagrangian with respect to all state variables: the coordinates x_{i,v} and velocities ẋ_{i,v} of all particles in each dimension. Obeying Lagrange's equations of the second kind

d/dt ( ∂ℒ/∂ẋ_{i,v} ) = ∂ℒ/∂x_{i,v}    (9)

one can start with the assumption that, apart from the derivatives with respect to time, t, the kinetic energy E_k is further differentiable only with respect to velocities ẋ_{i,v}, while the potential energy E_p (Eq. (8)) is differentiable only with respect to coordinates x_{i,v}. Therefore, the left-hand side of Eq. (9) is rather easily obtainable

d/dt ( ∂ℒ/∂ẋ_{i,v} ) = d/dt ( ∂E_k/∂ẋ_{i,v} ) = m ẍ_{i,v}    (10)

with ẍ_{i,v} = d ẋ_{i,v}/dt being the acceleration of the ith particle in the dimension v.

The right-hand side of Eq. (9) becomes

∂ℒ/∂x_{i,v} = −∂E_p/∂x_{i,v} = Σ_{j≠i} p (Δ̄_{ij,v} / L̄_ij) (1 / L̄^{p+1}_ij)    (11)

The resulting equation of motion of the ith particle in the vth dimension as assembled from Eqs. (9)–(11) finally reads

ẍ_{i,v} = (1/m) Σ_{j=1, j≠i}^{Nsim} p Δ̄_{ij,v} / L̄^{p+2}_ij = (1/m) Σ_{j=1, j≠i}^{Nsim} F_{ij,v}    (12)

The interaction of two particles i and j, together with the projections of distances, forces and accelerations, is illustrated in Fig. 3c.

Fig. 3. Interaction of a pair of particles. (a) Potential energy. (b) Repulsive force acting on particles. (c) Components of repulsive force and acceleration.

The motion of the dynamical system is therefore described by a system of independent equations. Awareness of this is of high importance while considering the possibilities for a solution method and its computer implementation. It means that the accelerations ẍ_{i,v} of each particle i = 1, …, Nsim, in each dimension v = 1, …, Nvar, can be solved separately without solving a system of equations. The damping of the motion of particles also depends solely on the velocity of each particle, see below. Furthermore, each distance projection Δ̄_{ij,v} as well as each absolute distance L̄_ij can be computed independently. The above-mentioned properties lead to the possibility of utilizing a parallel implementation. The parallel computer implementation of the presented dynamical system using a GPU has been thoroughly described in [41].

As soon as the new accelerations of each particle in each dimension are obtained, the equations of motion are numerically integrated using the semi-implicit Euler method, and the new velocities ẋ_{i,v} and coordinates x_{i,v} of each particle in each dimension at the new time (t + Δt) are computed

ẋ_{i,v}(t + Δt) = ẋ_{i,v}(t) + Δt · ẍ_{i,v}(t)    (13)

x_{i,v}(t + Δt) = x_{i,v}(t) + Δt · ẋ_{i,v}(t + Δt)    (14)

Note that these are equations of motion of a conservative dynamical system as defined by the energy potential Eq. (6), which does not cover any form of energy dissipation. In order for a dynamical system to reach static equilibrium, implementation of energy dissipation is desirable. Various types of damping are typically combined. To solve the problem at hand, we add the sum of velocity-dependent damping members to Eq. (14): Σ_q c_q ẋ^q_{i,v}(t), where c_q are damping coefficients and q are various powers of the velocity, ẋ_{i,v}, of the ith particle in the vth dimension. The damping part is not derived from the energy potential of the particle system but is introduced artificially.
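The following condensed sketch assembles one solver step under our own simplifying assumptions: unit mass, a single linear viscous damping term with the coefficient 0.025 reported in Section 4, and the constant-force cap below λ_cst = ℓ_char/10 introduced in the next paragraphs. The authors' production code is the GPU implementation of [41]; the structure and names here are illustrative only.

```python
import numpy as np

def dyn_step(X, V, p, dt=0.005, c1=0.025, lam_cst=None, m=1.0):
    """One semi-implicit Euler step, Eqs. (12)-(14), with viscous damping.

    X, V : (Nsim, Nvar) positions in the unit hypercube and velocities.
    Assumes all points are distinct. Forces follow the power law of
    Eq. (12); below the cut-off distance lam_cst the force magnitude is
    held constant, and accelerations are normalized by the pair count.
    """
    n_sim, n_var = X.shape
    if lam_cst is None:
        lam_cst = 0.1 / n_sim ** (1.0 / n_var)   # λ_cst = ℓ_char / 10, Eq. (15)
    d = X[:, None, :] - X[None, :, :]            # signed coordinate differences
    d -= np.rint(d)                              # minimum-image convention
    L = np.linalg.norm(d, axis=-1)               # periodic lengths L̄_ij
    np.fill_diagonal(L, np.inf)                  # no self-interaction
    Lc = np.maximum(L, lam_cst)                  # constant force below λ_cst
    # Acceleration components, Eq. (12): F_ij,v = p Δ̄_ij,v / L̄^{p+2};
    # d/L supplies the direction, Lc^(p+1) the (capped) magnitude.
    A = (p * d / (L[..., None] * Lc[..., None] ** (p + 1))).sum(axis=1) / m
    A /= n_sim * (n_sim - 1) / 2.0               # divide by number of pairs
    V = V + dt * A - dt * c1 * V                 # Eq. (13) plus viscous damping
    X = (X + dt * V) % 1.0                       # Eq. (14), wrapped periodically
    return X, V
```

Repeating `dyn_step` until the kinetic energy becomes negligible yields the steady-state point pattern that is taken as the design.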
A question may arise about the forces acting on each pair of particles. By differentiating the energy potential (E_ij = 1/L̄^p_ij, Fig. 3a) with respect to the distance, L̄_ij, the repulsive force F_ij is obtained: F_ij(L̄_ij) ∝ 1/L̄^{p+1}_ij; see Fig. 3b. The formulation of this force as a scale-independent power law is very much desired for obtaining self-similar point patterns. Numerically however, during the dynamical simulation the mutual repulsive force between two very close particles tends to infinity (or beyond the numerical precision), leading to an ill-posed problem.

The remedies for such unwanted behavior for the numerical solution of an N-body system can be found in the field of astrophysics simulations of collisionless bodies, see e.g. [1]. It is often proposed to use a softening factor, λ; this is a term that introduces a cross-over distance between two power-law asymptotes: F(L̄_ij) = 1/(L̄²_ij + λ²)^{(p+1)/2}. Here the cross-over distance λ is selected as λ ≪ ℓ_char, where ℓ_char is the characteristic length (roughly the average distance between neighboring particles for uniformly spread points, see the definition in Eq. (15)). The introduction of such a smooth softening causes the force to be bounded from above by the left asymptote, F_max = 1/λ^{p+1} for L̄_ij = 0, and to converge to the right asymptote, that is the original power law, for L̄_ij ≫ λ. However, the enforcement of a function other than the derived power law 1/L̄^{p+1}_ij is undesired as it distorts the ratio between repulsive forces in the resulting steady state. Therefore, the resulting optimized point patterns do not entirely obey the ϕp criterion.

The most convenient remedy here is to graft a constant function onto the interaction law for distances below λ_cst (λ_cst ≪ ℓ_char). That way, if the pairwise distance is greater than λ_cst, the proposed power-law interaction is used as is. Once the mutual distance between particles becomes lower than λ_cst, the repulsive force remains constant, F_cst = 1/λ^{p+1}_cst. Unlike the astrophysics systems of moving bodies, the purpose of the N-body system in this work is to find a steady-state solution that minimizes the potential energy of the system. The trajectory that leads to such a solution is considered to be of no interest. For these static-equilibrium states (point patterns) one can assume that the shortest pairwise distances within such patterns are about the length of ℓ_char. It is therefore crucial to preserve the power law interaction at least to this threshold. To be on the safe side, let us require λ_cst = ℓ_char/10. Any changes in particle interaction below this limit will not spoil the desired point patterns but will prevent the accelerations from approaching infinity.

For the purposes of the actual numerical simulation of the dynamical system, a unitary mass, m = 1, is used for all bodies. Even though the equations of motion Eq. (12) do not contain any kind of “stiffness coefficient”, for convenience we propose that all the repulsive forces be divided by the number of point pairs (interactions), Nsim(Nsim − 1)/2. This normalization of the repulsive forces helps to tame the enumerated forces when simulating increasingly more saturated systems, especially near steady-state configurations. During the optimization of the point samples used in this work, the time step of the numerical integration was set to 0.005 s.

4. Remarks on the properties of the designs obtained by the proposed DYN scheme

The designs obtained by the proposed numerical simulation of a dynamical system in a periodic domain have some specific properties.

• The first remark deals with the difference between the standard intersite metric and the periodic metric proposed recently by the authors [12,83]. This concerns not only the proposed DYN method but also distance-based criteria in general. The standard way of measuring the distance between two points is naturally to take the distance within the design domain. Some authors believe that the Maximin or ϕp criterion “spreads out its points evenly over the design region” [10,11]. We, however, argue that designs optimized with the ϕp or Mm criterion do not cover domains uniformly, see e.g. Fig. 4a. This nonuniformity was also noticed by Santner et al. [65]. However, the same authors suggest that one should “remedy this by restricting the class of available designs to only include, say, LHDs”. We argue that using LH-sampling does not help in the achievement of statistically uniform designs [12,83] (LH-Mm or, more generally, LH-ϕp designs), i.e. the attainment of equal probability for any location in U. Indeed, as shown [12] in the case of the Audze-Eglajs criterion, LH designs with an intersite distance metric have nonuniform point occurrence density in U. From here on, analyses will only be performed with the periodic metric.

• The experience with DYN designs is that they tend to form a kind of regular arrangement with preferential orientation whenever possible. These regular arrangements can only be formed for selected combinations of point counts and domain dimensions. When the point count is

Fig. 4. Illustration of the optimization process for Nsim = 24, Nvar = 2 (with part of the surrounding periodically repeated domain). The initial random arrangement is (b). The intersite ϕp criterion leads to the design (a). The periodic ϕp criterion yields a collapsible pattern (c), which is further “latinized” (d); see below for an explanation of the latinization process.


Fig. 5. Properties of DYN designs. (a) Difference between Nsim = 45 (gray circles) and 46 points (circles colored with a kind of “stress”) in 2D. (b) and (c) Global and local minimum configurations for Nsim = 560 points in 2D. (d) Characteristic length as a function of point count in various dimensions.

high enough and only slightly exceeds the point count needed for a perfect regular pattern, systems tend to locally “squeeze” an additional point into the otherwise regular arrangement; see Fig. 5a, where 45 points form a regular pattern whereas 46 points enforce an additional point by locally rearranging the pattern.

• The reason this kind of directional preference is observed (see Fig. 5a,b) is that the periodic metric depends on the directions of individual variables even though it is based on the isotropic Euclidean metric. Indeed, the min operation in Eq. (5) is performed in each individual direction. As a result, the distance between a pair of points depends on the orientation of the vector that connects them.

• In some cases (selected combinations of Nsim and Nvar), the designs form regular patterns that are aligned with one or more hypercube edges, see Fig. 4c. In this case the design is collapsible, which may not be desirable. To avoid this exhibition of collapsibility by dynamically optimized samples, we propose that the dynamical optimization be performed as described above, with the possible collapsibility being post-processed (rectified) afterwards. A convenient approach is to perform the latinization of samples, meaning the sorting and rearrangement of the coordinates of sampling points in the sense of the LHS method. In other words, each sampling point is translated along each dimension to the center of the closest LHS cell (strata, see [82]) with the coordinates x^LHS_{i,v} = (π_i − 0.5)/Nsim, where the integer rank π_i is a permutation of i = 1, 2, …, Nsim. The problem with collapsible samples is that if multiple points occupy an identical coordinate, ordering these identical numbers essentially randomizes the design. Superior results can be obtained if the sample is slightly pre-rotated along each axis. We propose an empirical value for this rotation of about arctg(ℓ_char)/4 in a random direction. By performing such a rotation about each axis, the subsequent latinization yields well distributed LH samples, see Fig. 4d. We remark that this latinization via post-processing may not necessarily provide the best possible LH sample for a given ϕp criterion. However, the latinization of a DYN design is much faster than combinatorial optimization, which operates on predefined coordinates by “shuffling” [82].

• A different option for the removal of perfect regularity is to introduce dry friction damping into the dynamical simulation. Alternatively, use can be made of any other source of random error/irregularity that makes the design pattern more “organic” and yet does not violate the uniformity too much.

• Another issue is the (in)ability to find a perfect configuration corresponding to the global minimum of the potential energy in domains highly saturated with particles. Fig. 5b and 5c show almost identical designs with Nsim = 560 points in two dimensions. The point arrangement in Fig. 5b represents the perfect global minimum configuration. The arrangement on the right represents a near-optimal layout according to potential energy. The amount of potential energy stored in both systems (b) and (c) is nearly identical, as is their performance in Monte Carlo integration. However, by comparing the amount of relative radial stress acting upon each particle it is possible to reveal inhomogeneities that divide regions with different directional alignments.

• Essentially, the problem is similar to, e.g., the annealing and controlled cooling of optical glass to ensure the uniformity of its refractive index [39]. At the beginning of a dynamical simulation with randomly distributed particles there is more than enough kinetic energy to agitate the bodies and overcome any local potential energy minimum. It is very desirable to set the overall damping low enough to provide the moving bodies with enough kinetic energy and time to arrange themselves into the optimal steady-state layout corresponding to the global potential energy minimum. In fact, there seems to be no lower bound on minimum damping. The lower the overall damping is set, the lower the probability of attaining a near-optimal, inhomogeneous design. Naturally though, lower damping induces a slower dissipation of energy, resulting in greater simulation duration. The importance of this requirement, nevertheless, only becomes pronounced for very high design domain saturations. At that point, however, the emergence of irregularities does not make much of a difference to the global minimum, relatively speaking. During the optimization of the point samples used in this work, it was found to be sufficient to use viscous linear damping set to 0.025 only.

• The last remark deals with the average density of points in the unit hypercube, U. We quantify the saturation of the design domain by the characteristic length

ℓ_char = 1 / Nsim^{1/Nvar}    (15)

The physical meaning of ℓ_char is the average distance between neighboring points when they are uniformly spread over U. Eq. (15) can be illustrated well using a regular orthogonal grid with Nsim = N^Nvar, where N is the number of equispaced 1D projections of the grid and ℓ_char = 1/N. Uniformly distributed designs in various dimensions achieve a similar saturation when ℓ_char is similar. It becomes increasingly harder to achieve the required ℓ_char as the dimension, Nvar, increases because the space to fill becomes too large. Fig. 5d shows how the characteristic length decreases with an increasing number of points for various dimensions.
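To make Eq. (15) and the latinization step concrete, a small sketch follows. It is our own illustration (the empirical pre-rotation by arctg(ℓ_char)/4 described above is omitted for brevity), and the function names are ours.

```python
import numpy as np

def char_length(n_sim, n_var):
    """Characteristic length of Eq. (15): l_char = 1 / Nsim**(1/Nvar)."""
    return 1.0 / n_sim ** (1.0 / n_var)

def latinize(X):
    """Post-process a design into a Latin Hypercube sample.

    In each dimension the points are ranked and translated to the centers
    of the LHS strata, x_i,v = (rank_i - 0.5) / Nsim, as described above.
    """
    n_sim = len(X)
    ranks = X.argsort(axis=0).argsort(axis=0)   # zero-based rank per dimension
    return (ranks + 0.5) / n_sim

# Example: saturation of the 45-point pattern of Fig. 5a
print(char_length(45, 2))                       # ~0.149
```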
5. Numerical examples

In this section, the performance of designs obtained with the proposed dynamical algorithm (DYN) is critically assessed and compared to other existing sampling schemes. Several contexts are selected to report the performance of the designs: estimation of the global extreme of a function over U, estimation of an integral of a function (estimation of the average), estimation of the standard deviation of a function of a random vector, and estimation of the failure probability of a truss using a combination with Importance Sampling (IS). The competing reference approaches are crude Monte Carlo (MC), Quasi Monte Carlo (QMC) sequences, randomized Quasi Monte Carlo (RQMC) sequences and Latin Hypercube Sampling (LHS) with randomly ordered samples.

The problems are analyzed in various dimensions, Nvar, and the study is always performed for a series of different sample sizes, Nsim. Since the designs are not unique (apart from QMC), the target estimated values are viewed as random variables. As such, the results are always obtained by running the problem Nrun times and reporting the estimated results via their averages (ave) and sample standard


deviations (ssd). This treatment enables the variance of the estimators to be reported; it is preferred that this variance be minimized in practice. Plots of performance have the target parameter as ordinate and the sample size, Nsim, as abscissa.

Before presenting the results, a brief review of the techniques under comparison is presented. Quasi Monte Carlo sequences, which are sometimes named low-discrepancy sequences or digital nets, have been developed for multidimensional integration. Examples of these are the sequences by Halton [23] (generalizations of the one-dimensional van der Corput sequence), Sobol' [70,71], Niederreiter [46,47,49], Faure [17] and the generalization of the Faure sequence by Tezuka [76], and others. QMC methods are called “number-theoretic methods” [14,15] by some authors (e.g. “good lattice points”, etc.). QMC sequences have become famous for their ability to decrease the variance of estimation of integrals in comparison with crude Monte Carlo integration, as well as to increase the speed of convergence of the integral estimate [34,57]. The central theorem of the classical part of the theory of uniform distribution, in connection with the Riemann integral criterion, is Weyl's criterion [84]. It is strongly connected to the notion of discrepancy. The use of low-discrepancy sequences forms a cheap alternative to algorithmic discrepancy minimization [48, Chapters 3 & 4].

QMC methods may be thought of as derandomized Monte Carlo samples. It has become apparent that QMC can be re-randomized (RQMC) in order to obtain sample-based error estimates. Various re-randomization methods have been developed (e.g. random shift modulo one [9], scrambling by Owen [56], and others). These randomizations are also known to have better projection properties than some digital nets (e.g. Sobol') in the first few dimensions [34], see below. Surveys of RQMC appear in [34,57].

Another method selected for comparison is Latin Hypercube sampling. LHS is especially suitable for evaluating the expectation of functions in computer experiments. LHS can be viewed as a stratified sampling method as it operates by subdividing the sample space into smaller regions and then sampling within these regions. This procedure creates samples that more effectively fill the sample space. When the analyzed transformation has suitable properties, the variance of the LHS-based estimators is drastically reduced compared to simple random sampling (LHS is a variance reduction technique). Indeed, LHS has repeatedly been reported to provide a smaller asymptotic variance for the sample mean compared with simple random sampling. Several authors have focused on the characterization of sampling efficiency for LHS. LHS has been found to be efficient especially in the statistical analysis of functions of variables that are additive, have low interactions and are monotonous transformations of individual variables [43,54,68,72]. LHS has become very popular and has been applied to nearly every type of probabilistic analysis: see, e.g., its application to the estimation of failure probability [50,53], coefficient estimation for polynomial chaos [4], neural networks [51], and other types of surrogate models [19,20]. The popularity of LHS has led to the invention of numerous variants intended to improve space-filling [3,6,12,18,29,31,44], optimize its projective properties [26,38], minimize least square error and maximize entropy [58], and reduce spurious correlations [6,27,55,75,80,82]. There are even variants providing various ways to extend the sample size [64,69,77,81].

5.1. Extremes and averages of a moving mean – comparison with QMC and RQMC

In this numerical example a function is selected and defined over U. The target is to estimate the extreme of this function and also the integral over this function (an operation related to averaging). The maximum over the design domain, E, is estimated as Ê using a single design D ⊂ U. Similarly, the exact value of the integral, I, is estimated using the average, Î, evaluated using a single design

E = max_{x∈U} f(x),  Ê = max_{x_i∈D} f(x_i)    (16)

I = ∫_U f(x) dx,  Î = (1/Nsim) Σ_{i=1}^{Nsim} f(x_i)    (17)

The following testing function has been selected for the evaluation of the performance of the designs in Nvar-dimensional space

f(x) = exp[ −( d̄(x, c)/ℓ_char )² ] = Π_{v=1}^{Nvar} exp[ −( Δ̄_{xc,v}/ℓ_char )² ]    (18)

where d̄(x, c) is the periodic distance between a point x ∈ U and the function “center”, c ∈ U. This periodic distance follows the definition in Eq. (5), i.e. the squared periodic distance reads d̄²(x, c) = Σ_{v=1}^{Nvar} (Δ̄_{xc,v})², where the periodic projections of distance between x and c are computed using the actual projections, Δ_{xc,v} = |x_v − c_v|, as Δ̄_{xc,v} = min(Δ_{xc,v}, 1 − Δ_{xc,v}). This function may be usable as the autocorrelation function of a random field: in its present form it is the squared exponential autocorrelation function, separable (multiplicative) and isotropic, with ℓ_char being the autocorrelation length. The function (a smooth bell-shaped or “Gaussian” function) is depicted in Fig. 6 bottom right. The length scale has been deliberately selected so that it is dependent on the sample size. One may think of its discretization by D as a crude and economical approximation of the function. Any defect in the design (either a cluster of points or an empty region) may cause the estimation to deteriorate.

The center of the bell-shaped function, c, is shifted randomly in all possible directions. Thanks to the periodic distance, the part of the function that “overlaps” the domain appears from the other side. In this way, both the exact peak value and the integral remain constant independently of the random shift. The exact peak value of the function is attained at the center c and equals one. The exact value of the integral can also be solved analytically

E = 1    (19)

I = ∫₀¹ ⋯ ∫₀¹ Π_{v=1}^{Nvar} exp[ −( (x_v − 1/2)/ℓ_char )² ] dx_v = [ √π ℓ_char erf( 1/(2ℓ_char) ) ]^{Nvar}    (20)

where erf(·) is the error function related to the Gaussian cumulative distribution function.
The numerical example features integration, a classical task for which QMC and RQMC have been developed. The sequences, just as with various improvements to MC such as stratification in LHS, are designed to improve the accuracy of integration for favorable functions while doing no harm for unfavorable ones. QMC methods are generally assumed to be the most efficient integration tool for smooth functions and are thought of as being the gold standard for numerical integration. However, as pointed out e.g. in [34], QMC methods are successful due to their clever choice of point sets in order to exploit the features of the functions that are likely to be encountered (such as various symmetries). They cover the main effects, diagonals, etc. We claim that such a “tuned” search must come at a price: the sequences may lose their efficiency if the function does not possess simple symmetries, etc. To test their robustness when integrating real-life functions, we evaluate the integral for Ng locations of the center, c, arranged in a dense regular orthogonal grid in an attempt to exhaust all possible mutual shifts. The relative positions of the design points with respect to f(·) change and therefore the (otherwise deterministic) results obtained with QMC can be processed as random variables.

To assess the performance of each single design in estimating either the integral I or the maximum E, the absolute difference between the exact solution and its estimation is divided by the exact solution


Fig. 6. Approximation of a bell-shaped function for Nsim = 102. Top row left: Error in approximation of the integral, ϵ(I). Bottom row left: Error in approximation of the maximum, ϵ(E). Top right: depiction of the “moving” function in two dimensions. Bottom right: illustration of the testing function in one dimension.

ϵ(I) = |I − Î| / I  and  ϵ(E) = |E − Ê| / E    (21)

In particular, the errors ϵ(E) and also ϵ(I) in Eq. (21) are reported via their averages, ‘ave’, and sample standard deviations, ‘ssd’. These statistics are computed using Nrun × Ng realizations of an error (note that in the case of QMC and RQMC sequences there is just one run: Nrun = 1).

The ave values are connected by a solid line in Fig. 7 and in Fig. 8 and plotted as functions of the sample size. Additionally, they are surrounded by a scatterband of ± one ssd in order to assess the variance of the estimation. Fig. 7 displays the ave and ssd of the absolute error value |ϵ(I)| for the error in integration. In an ideal situation, both errors in Eq. (21) are equal to zero, with no scatter. Variations in error result from either having a different shift or a different run (sample set).

The graphs in Figs. 7 and 8 show that crude MC provides sample sets that produce large errors and also exhibit large variance. The picture is similar for any of the four dimensions of the design domain. Limiting the point sets to the LHS class improves the performance by both decreasing the ave error and decreasing the variance. The QMC (Sobol') sequences and also the RQMC (scrambled Sobol') sequences deliver a major improvement, as expected. These sequences, albeit repeatable (deterministic), still produce scatter due to the various shifts of the center c with respect to the point set. Note that keeping the center of the symmetrical testing function in the center of U would help QMC methods to show themselves in a better light, but the present example aims at testing the robustness of the designs. Finally, the results obtained with the proposed DYN samples greatly outperform those from all other techniques. The point sets are very robust, meaning that the error has a small scatter and the ave error remains very small for any sample size.

To visually support the claim of robustness, we have selected one of many sample sizes for which the DYN technique delivers a perfect pattern that can be built from a small tile: Nsim = 102 for Nvar = 2, see Fig. 6. The point sets are visualized by empty circles. The pattern resembles a “lattice point set”. The three figures on the left show the dependence of both studied errors on the location of the center of the testing function (a kind of “influence line” similar to that known from structural analysis). The color at a certain location simply corresponds to the error when the function is centered at that location. One can see how much the QMC error is sensitive to the shifts while the DYN samples are insensitive.
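The robustness test of Eq. (21) can be sketched as follows. This builds on the f_test and exact_integral helpers from the previous sketch; the grid density n_grid is our choice and the function name is illustrative.

```python
import numpy as np

def error_stats(X, l_char, n_grid=32):
    """'ave' and 'ssd' of the relative errors eps(I) and eps(E), Eq. (21),
    evaluated over a regular grid of Ng = n_grid**Nvar center locations c
    (the exhaustive-shift robustness test described in the text)."""
    n_sim, n_var = X.shape
    I_exact = exact_integral(n_var, l_char)     # Eq. (20); E_exact = 1
    axes = [(np.arange(n_grid) + 0.5) / n_grid] * n_var
    centers = np.stack(np.meshgrid(*axes), axis=-1).reshape(-1, n_var)
    e_I = np.array([abs(I_exact - f_test(X, c, l_char).mean()) / I_exact
                    for c in centers])
    e_E = np.array([abs(1.0 - f_test(X, c, l_char).max()) for c in centers])
    return (e_I.mean(), e_I.std(ddof=1)), (e_E.mean(), e_E.std(ddof=1))
```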

Fig. 7. Estimation error of the integral, ϵ(I), of the testing function from Eq. (18) plotted as a function of sample size Nsim. The solid lines represent the ave estimation and are surrounded by a band of ± ssd. The vertical line at Nvar = 2 and Nsim = 102 indicates the designs studied in Fig. 6.


Fig. 8. Estimation error ϵ(E) of the extreme of the testing function from Eq. (18) plotted as a function of sample size Nsim. The solid lines represent the ave estimation and are surrounded by a band of ± ssd. The vertical line at Nvar = 2 and Nsim = 102 indicates the designs studied in Fig. 6. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)

The DYN method forms a pattern that always delivers an ϵ(I) error which is orders of magnitude smaller than that produced by QMC and also RQMC. The reason is the simultaneous statistical uniformity and sample uniformity of the DYN design.

We shall now comment on the ability to estimate the extreme of a function, which is measured via error ϵ(E), see Fig. 8. MC designs have ave errors starting for Nsim = 1 at the level marked by empty squares. These errors correspond to the random placement of a single point within a hypercube domain, or the use of a regular orthogonal grid as a point pattern (see Appendix B for the derivation for pattern “H”). As the sample size increases, the average errors of the MC samples increase towards an asymptotic result. The same asymptote also holds for errors obtained with LHS designs. QMC and RQMC designs show better performance and the errors seem to be independent of domain saturation. Finally, DYN designs have ave and ssd values which are smaller compared to those of the other designs. The DYN designs also have ave and ssd values which are constant functions independent of domain saturation. This tells us that the designs form self-similar patterns that can be characterized by a smaller number of points (a small tile that effectively repeats throughout the domain). The statistics for the error obtained with MC and LHS show a trend and stabilize only after reaching a high degree of saturation. Theoretically, the best possible pattern corresponds to centers of regular polyhedral tessellation P (the points form a regular simplex grid), and the errors corresponding to it are marked by empty diamonds. Derivations are presented in Appendix B.

5.2. Sum of exponentials of normal variables – comparison with LHS

In this numerical example the ability to estimate the standard deviation of g(X) – a deterministic function of a random vector, X – is assessed. The standard deviation is estimated using Nsim points by a point estimator – the classical sample standard deviation, i.e. the square root of the average squared deviation from the mean value. The results of the proposed DYN designs are compared to QMC and RQMC sequences, and additionally to LHS.

The selected problem has been featured in [81] and also in [12] for evaluating the performance of LHS samples optimized by combinatorial optimization via the Audze-Eglajs criterion in a periodic design domain (periodic Audze-Eglajs – PAE). The random vector is taken as a Gaussian random vector with Nvar independent standardized marginals: X_v ∼ N(0, 1). The marginal random variables can therefore be obtained by component-wise inverse transformation of the cumulative distribution function (CDF), x_v = Φ⁻¹(u_v) = √2 erf⁻¹(2u_v − 1), where u_v are coordinates of the points in U (sampling probabilities) and erf is the error function. The studied example considers the function g_exp(X)

g_exp(x) = Σ_{v=1}^{Nvar} exp(−x_v²)    (22)

The exact mean value reads μ_exp = Nvar/√3 [81]. Since the function g_exp(X) represents a sum of independent marginals, any LHS sample estimates the average with zero variance (the mutual ordering of LH-coordinates becomes irrelevant). We focus on the estimation of standard deviation

σ_exp = √( Nvar (1/√5 − 1/3) ) ≈ 0.337461 √Nvar    (23)

The convergence of this estimate is evaluated for sample sizes Nsim = 1, …, 4096 in two and five dimensions. The results are presented in Fig. 9.
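The estimator of the standard deviation of g_exp can be sketched in a few lines. This is our own illustration of Eqs. (22)–(23), with a crude MC design as the input; any design U in the unit hypercube (e.g. a DYN design) can be substituted.

```python
import numpy as np
from scipy.special import erfinv

def estimate_std_gexp(U):
    """Sample standard deviation of g_exp(X), Eq. (22), from a design U
    in the unit hypercube via component-wise inverse CDF of N(0, 1)."""
    X = np.sqrt(2.0) * erfinv(2.0 * U - 1.0)    # x_v = Phi^{-1}(u_v)
    g = np.exp(-X ** 2).sum(axis=1)             # Eq. (22)
    return g.std(ddof=0)                        # root of mean squared deviation

n_var = 2
exact = np.sqrt(n_var * (1.0 / np.sqrt(5.0) - 1.0 / 3.0))   # Eq. (23), ~0.4772
rng = np.random.default_rng(2)
print(estimate_std_gexp(rng.random((4096, n_var))), exact)
```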

Fig. 9. Estimated standard deviation as a function of sample size (and characteristic length). Left: Two dimensions. Right: Five dimensions.

Again, we plot the ave and ssd obtained using Nrun = 400 runs – realizations of the design (note that Nrun = 1 in the case of the QMC and RQMC sequences). All sampling techniques eventually tend to an exact solution. MC has the highest variance (notice the wide scatterband around the average estimation). Restricting the designs to LHS does not improve the results much. The proposed DYN samples seem to provide a major improvement compared to the other sampling schemes: they provide unbiased results for very small sample sizes and yet their variance is very small. The reason for the absence of bias is that the samples are statistically uniform, and the small scatter can be attributed to the sample uniformity of the point sets. Additionally, the results obtained with QMC sequences (namely the Halton and Sobol' sequences) are presented. They have no scatter as there is always only one realization for a given Nsim. The zigzag profile of the estimation nicely demonstrates how the sequences consecutively fill various parts of the design domain. The comparable range for the behavior of DYN samples in various dimensions begins at around ℓ_char = 0.6, which seems to be the critical particle saturation for the dynamical system to assemble meaningful patterns. We remark that the DYN results are comparable to the results obtained with LHS optimized via the PAE criterion using combinatorial optimization [12]. This is because both DYN and LHS-PAE use the same kind of distance-based criterion. However, it is much more computationally demanding to use combinatorial optimization than it is to solve the N-body dynamics. Finally, we remark that the results for latinized DYN samples are comparable to those obtained with LHS, and thus an LH-type design can be obtained relatively easily by postprocessing a DYN design.

5.3. Failure probability of a truss using a combination with IS

In this numerical example, the target is to estimate the failure probability of a truss structure (numerical model) using Importance Sampling (IS). The aim is to show whether there are differences in performance when the IS density, centered at the design point, is represented by various sampling schemes: DYN, LHS and MC.

The engineering application example is taken from [5,33,35], where it served as one of the benchmark problems for comparison of methods for estimating structural reliability. The setup of the numerical experiment is illustrated in Fig. 10.

The input random vector of the model, X, consists of five random variables: the properties of the horizontal members (Young's modulus E_h and cross-section area A_h), the properties of the diagonal members (Young's modulus E_d and cross-section area A_d), and the magnitude of the loading forces P in the top joints. The properties of the input random variables are summarized in Table 1.

Table 1
Five random variables featured in the truss example.

Variable    Distribution    Mean       CoV
E_h, E_d    Log-normal      210 GPa    0.10
A_h         Log-normal      2000 mm²   0.10
A_d         Log-normal      1000 mm²   0.10
P           Gumbel-Max      50 kN      0.15

Since the input random variables are independent, their distribution functions form a multivariate uniform distribution defined in U = [0, 1]^Nvar. Therefore, from now on we shall denote the variables of the real (physical) space by X = {X_1, …, X_Nvar} and the corresponding probabilities by U ∈ U, where U = {U_1, …, U_Nvar}. Transformation between U and X is particularly simple as it is a set of component-wise transformations U_v = F_v(X_v). The realizations and arguments of functions are denoted by lower-case letters.

The target is to estimate the failure probability of the truss structure. Failure is defined as an event when the midspan deflection exceeds a given threshold of 0.11 m.

The deflection can be computed using the method of virtual work (unit load method). This method results in a simple expression for the mid-span deflection, w(X)

w(X) = (552 / (A_h E_h)) P + (50.9117 / (A_d E_d)) P ≡ (552/Æ_h) P + (50.9117/Æ_d) P    (24)

The second expression features only three random variables instead of the five basic variables defined above. The dimensional reduction is made possible by the fact that the couples of variables E and A are always featured together in a product: one can replace both products by defining another variable, Æ. The operation is particularly simple here as the products feature two independent variables. The mean value and standard deviation of a product of two independent variables can be easily calculated from the mean values, μ, and standard deviations, σ, of the two variables [22]

μ_Æ = μ_A μ_E,  σ_Æ = √( μ_E² σ_A² + μ_A² σ_E² + σ_A² σ_E² )    (25)

The mean value of Æ_h equals 420 MN and the mean value of Æ_d equals 210 MN. The coefficients of variation are identical for both these variables: 0.14177. The product of two lognormal variables is again a lognormal variable. Using this reduction the problem of a five-dimensional input vector boils down to an entirely equivalent problem that has only three input random variables: the magnitude of forces, P, and two normal cross-sectional stiffnesses, Æ_h and Æ_d.
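A quick numerical check of Eq. (25) with the values of Table 1 reproduces the quoted mean and coefficient of variation of Æ_h (our sketch, SI units assumed):

```python
import math

def product_moments(mu_A, sigma_A, mu_E, sigma_E):
    """Mean and std of a product of two independent variables, Eq. (25)."""
    mu = mu_A * mu_E
    sigma = math.sqrt(mu_E ** 2 * sigma_A ** 2 + mu_A ** 2 * sigma_E ** 2
                      + sigma_A ** 2 * sigma_E ** 2)
    return mu, sigma

# Ae_h = A_h * E_h with CoV = 0.10 for both factors (Table 1)
mu, sigma = product_moments(2000e-6, 200e-6, 210e9, 21e9)   # m^2, Pa
print(mu, sigma / mu)     # 4.2e8 N (420 MN) and CoV ~ 0.14177, as in the text
```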

U ={,,}UU1 … Nvar . Transformation between U and X is particularly Nsim MC 1 Nf simple as it is a set of component-wise transformations UFXv= v() v . The p p = I[ g (xi )< 0] = f f N N realizations and arguments of functions are denoted by lower-case sim i=1 sim (27) letters. The Nsim realizations xi must be sampled proportionally to fX(x). The The target is to estimate the failure probability of a truss structure. realizations of individual variables can be selected completely in- Failure is defined as an event when the midspan deflection exceeds dependently, taking the marginal densities into account. The individual a given threshold of 0.11 m. realizations of sampling probabilities in the unit hypercube are illu- The deflection can be computed using the method of virtual work strated using blue crosses in Fig. 11 left. These samples are further (unit load method). This method results in a simple expression for the transformed by inverse distribution functions to the real space where mid-span deflection, w(X) the limit state function is evaluated. Eq. (27) was used to estimate pf MC using five million MC simulations and the result of pf = 0.0404 was taken as the exact solution. The difficulty of using MC sampling lies in the fact that the limit state function must be evaluated too many times. When the model is computationally intensive and the failure probability is low, different methods must be used. One possibility is to use the FORM method, which solves the problem in Gaussian space, G . The input random Fig. 10. Truss structure constructed from two types of beams of properties Eh, variables and the limit state function can be transformed into G, where Ah (top and bottom horizontal bars) and properties Ed, Ad (diagonals). one can then identify the so-called design point, g , which is the point on
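The moment algebra of Eq. (25) is easy to verify numerically. The following minimal sketch (our illustration only; it is not part of the original computation, and the function and variable names are ours) reproduces the reduced-variable moments quoted above:

import math

def product_moments(mu_a, sigma_a, mu_e, sigma_e):
    # Mean and standard deviation of a product of two independent
    # variables, Eq. (25).
    mu = mu_a * mu_e
    sigma = math.sqrt(mu_e**2 * sigma_a**2
                      + mu_a**2 * sigma_e**2
                      + sigma_a**2 * sigma_e**2)
    return mu, sigma

# Horizontal members: Ah (2000 mm^2, CoV 0.10) and Eh (210 GPa, CoV 0.10)
mu_h, sg_h = product_moments(2000e-6, 200e-6, 210e9, 21e9)
# Diagonal members: Ad (1000 mm^2, CoV 0.10) and Ed (210 GPa, CoV 0.10)
mu_d, sg_d = product_moments(1000e-6, 100e-6, 210e9, 21e9)

print(mu_h / 1e6, sg_h / mu_h)   # 420.0 (MN), CoV 0.14177
print(mu_d / 1e6, sg_d / mu_d)   # 210.0 (MN), CoV 0.14177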


Fig. 11. The process of the transformation of optimized samples (blue crosses) using the sampling density of the IS method to achieve sampling points centered at the design point (highlighted). The color scale of the failure surface corresponds to the Gaussian density in G space. Depiction of samples: (a) The design point u* and the failure surface in U (2D projection only). (b) The design point g* and the failure surface in G (2D projection only). (c) The design point and the failure surface in G in 3D. The red color of the IS points signals failure, while the yellow color signals proximity to the failure surface. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 2
Coordinates of the design point of the truss example in three spaces: X, U and G. Left: original five-dimensional space. Right: reduced three-dimensional space.

Original five-dimensional space:

Var.   Physical space   U space   G space    α sens.
Eh     195.3 GPa        0.2488    −0.6783    −0.3811
Ah     1860 mm²         0.2488    −0.6783    −0.3811
Ed     206.6 GPa        0.4555    −0.1117    −0.0628
Ad     984 mm²          0.4555    −0.1117    −0.0628
P      62.14 kN         0.9320    1.4910     0.8377

Reduced three-dimensional space:

Var.   Phys. space    U space   G space    α sens.
Æh     363 213 kN     0.1687    −0.9592    −0.5389
Æd     203 336 kN     0.4372    −0.1580    −0.0888
P      62.141 kN      0.9320    1.4910     0.8377

One can simply perform the linearization of the transformed limit state function at g*, i.e. the approximation of the actual boundary between the failure set and the safe set by a tangent hyperplane (see the straight line in the 2D projection in Fig. 11b). The estimation of pf is then made simple via the identification of the distance of g* from the origin in G. This distance is denoted β and is called the Hasofer-Lind [24] safety index. Using the coordinates, gv*, in Gaussian space listed in Table 2, the safety index reads β = √(Σv=1..Nvar (gv*)²) = 1.7799. With the safety index at hand, the coordinates of g* can also be obtained by multiplying β with the directional cosines, i.e. the α sensitivities, see Table 2. By exploiting the rotational symmetry of G, the failure probability can be approximated using the univariate Gaussian cumulative distribution function Φ as pf^FORM = Φ(−β) = 0.0375, a value that is approximately 7% less than the exact solution. The reason for the underestimation is the discrepancy between the approximating hyperplane and the actual failure surface; see the projection in Fig. 11b. The analysis was performed using the FReET probabilistic software [52].

In this paper, the coordinates of the most probable failure point are used as the center of the sampling density in Importance Sampling (IS). Instead of sampling the points from the original density, fX(x), as performed in Eq. (27), IS selects the samples xi from the sampling density, f̃X(x), in order to increase the "hit-rate". The sampling density, defined in Gaussian space, is represented by Nsim points selected by each of the compared sampling methods. The estimation of failure probability then re-weights the contribution from the "failing" samples using the ratio of the original density to the sampling density

pf ≈ p̂f^IS = (1/Nsim) Σi=1..Nsim I[g(xi) < 0] · fX(xi)/f̃X(xi)    (28)

Since the process is performed in Gaussian space, both densities are computed there simply using the products of univariate Gaussian densities N(0, 1) and N(gv*, 1). In other words, the sampling density also has unit dispersion and therefore the difference between the densities lies only in the shifting of the sampling density center from the origin (corresponding to the mean values of the physical variables) to the design point; see the solid red circles in Fig. 11b. The position of the design point in the unit hypercube is visible in Fig. 11, together with DYN samples from the hypercube and those transformed via the shifted sampling density. If the sampling density center is shifted to the design point, we expect roughly half of the points to fall into the failure domain.

Indeed, the failure surface, which forms the boundary between the safe and failure regions, contains all points that fulfill the condition w = 0.11. Fig. 11a and b show this boundary in a 2D projection and Fig. 11c shows the failure surface for the reduced 3D problem in the G space. The color of the surface corresponds to the probability density Πv=1..Nvar φ(gv), where φ is the standard univariate Gaussian density function. This density has its peak at the design point, u*. The figure also shows the importance sample set, colored red in the case of failure, green in the case of a safe structure and yellow when close to the failure surface.

We shall now compare the performance of various sampling schemes in estimating the failure probability using IS around the design point. In particular, the target is defined so as to compare the sample size needed to estimate pf within ±10% around the exact value. The probability of failure has been analyzed using Eq. (28) in the full five-dimensional space and also in the reduced three-dimensional space, using various sample sizes in both cases (and for each Nsim the analysis has been repeated Nrun = 400 times to obtain the variance in the estimations). Only the points falling into the failure region are used. However, distributing them well, i.e. proportionally with respect to the sampling density, may help in improving the accuracy of the estimator in Eq. (28) based on the failing points.

The results are plotted in Fig. 12. By bringing the sampling patterns closer to the region of failure, the samples with optimized uniformity tend to regain their advantage in comparison with estimation using plain MC or LHS samples. Indeed, the graphs show that MC needs the highest number of simulations to produce an estimation lying within the 10% margin of error (Nsim = 223 and 580, respectively).
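For completeness, the FORM numbers and the IS estimator of Eq. (28) can be replayed with a few lines of code. The sketch below is our illustration only: the constants come from Table 1, Table 2 and Eqs. (24)-(26), while the helper names and the use of plain pseudo-random points (standing in for the compared DYN, LHS and MC point sets) are our assumptions:

import numpy as np
from scipy.stats import norm, gumbel_r

rng = np.random.default_rng(0)

# FORM: safety index from the G-space design point (Table 2, reduced space)
g_star = np.array([-0.9592, -0.1580, 1.4910])     # (AEh, AEd, P)
beta = np.linalg.norm(g_star)                     # -> 1.7799
print(beta, norm.cdf(-beta))                      # pf_FORM -> ~0.0375

def lognorm_icdf(u, mean, cov):
    # Inverse CDF of a lognormal variable given its mean and CoV.
    s = np.sqrt(np.log(1.0 + cov**2))
    return np.exp(np.log(mean) - 0.5 * s**2 + s * norm.ppf(u))

def gumbel_icdf(u, mean, cov):
    # Inverse CDF of a Gumbel-Max variable given its mean and CoV.
    scale = mean * cov * np.sqrt(6.0) / np.pi
    return gumbel_r.ppf(u, loc=mean - np.euler_gamma * scale, scale=scale)

def limit_state(u):
    # Eqs. (24) and (26) evaluated from sampling probabilities u.
    ae_h = lognorm_icdf(u[:, 0], 420e6, 0.14177)  # N
    ae_d = lognorm_icdf(u[:, 1], 210e6, 0.14177)  # N
    p = gumbel_icdf(u[:, 2], 50e3, 0.15)          # N
    return 0.11 - p * (552.0 / ae_h + 50.9117 / ae_d)

# IS, Eq. (28): unit-dispersion Gaussian sampler centered at the design point
n_sim = 1000
z = rng.standard_normal((n_sim, 3)) + g_star      # sampling density N(g*, I)
weights = np.exp(-z @ g_star + 0.5 * g_star @ g_star)  # fX / f~X in G space
failed = limit_state(norm.cdf(z)) < 0.0
print(np.mean(weights * failed))                  # -> pf around 0.04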


Fig. 12. Convergence of the truss pf estimation using IS for various sampling methods. Left: the original 5D problem. Right: the reduced 3D problem. Sample sizes needed to estimate pf within the 10% error margin are highlighted.

Employing LHS stratification along individual dimensions of the Gaussian sampling density helps in reducing the number of model evaluations, but not much. The proposed DYN samples need approximately one third of that amount of model evaluations to deliver acceptable results (see the marks at Nsim = 82 and 220, respectively). We attribute this variance reduction to the higher degree of uniformity achieved for DYN samples.

One can notice that the estimations in five-dimensional space have a greater variance than those for the three-dimensional model. This is a property of IS; the difference in variance between the 5D and 3D definitions was noticed whichever sampling method was employed in IS (MC, LHS and DYN).

The authors have also tested the behavior of the methods for much smaller failure probabilities (10⁻⁵). The comparisons look similar and so do the conclusions: employing DYN sampling in IS around the mean values improves accuracy by decreasing the estimator variance.

6. Conclusions

The paper presents a methodology for the construction of uniformly distributed point sets in a hypercubic design domain. The methodology exploits the analogy between a distance-based criterion for the point set and a system of repulsive particles. By simulating the dynamical evolution of a damped particle (N-body) system, a near-optimal or even optimal distribution of the particles within the domain is obtained. These final configurations can be readily used as points representing the DYN design. By solving the interaction between pairs of particles in a periodically repeated design domain, statistically uniform designs are obtained. The distance-based criterion guarantees good sample uniformity, i.e. a uniform spread of points in each single design.

One can also use the family of distance-based criteria for design preparation based on combinatorial optimization techniques [12,82]. However, the proposed dynamical simulation provides a much faster route to the point arrangements: all the points move in the optimal direction in each time step. Moreover, the derived equations of motion can be efficiently solved by using massive parallelization [41]. The derivation of the equations of motion can be used as a template for similar systems with different potentials.

The authors conjecture that the decrease in the value of a distance-based criterion that occurs when using a proper metric automatically leads to a simultaneous increase of both sample and statistical uniformity of the design. Additionally, it leads to excellent space-fillingness and a decrease in discrepancy, and naturally guarantees a high degree of orthogonality. Projective properties have been found to be excellent, i.e. apart from specific combinations of sample sizes and domain dimensions, the designs are not collapsible. If a design of the LHS type is desired, stratification can be easily achieved by post-processing (latinizing) the DYN sample.

The performance of the designs has been compared to that of other existing designs in various contexts: the estimation of extremes, and the integration and statistical estimation of higher-order moments of a function of random variables. Also, an example of increased efficiency when the designs are employed for importance sampling is reported. In all these contexts, the designs perform very well on average: there is no bias and they also exhibit low variance. The designs were found to be superior to Latin Hypercube designs and also more robust than several commonly used Quasi Monte Carlo sequences. The designs can be recommended as they provide good support points for initial screening designs or for the construction of a surrogate model to replace a computationally intensive model.

The sample size can be easily extended, if needed, by preserving the already evaluated points and adding new points in locations with minimal "energy". Evaluation of the local energy provides a hint on positions that can be used to extend the sample size while maintaining uniformity (basically the centers of the largest empty hyperspheres in the periodic design domain). The energy distribution can also be modified to exploit the knowledge about the variation of the function that has been obtained so far, see e.g. [67].

Acknowledgments

The authors acknowledge financial support provided by the Czech Science Foundation under project no. 16-22230S.

Appendix A. The miniMax criterion, its relaxation and analogy to an N-body system

We shall now return to distance-based designs and introduce another class of designs, as well as the possibility of defining an analogy to a system of particles. The Maximin distance criterion (maximization of the minimum distance) and its relaxation into the "repulsive" ϕp criterion have been described in Section 2.

Let us now consider designs that attempt to minimize the radius of the largest empty Nvar-dimensional hypersphere circumscribed to Nvar + 1 design points. By 'empty' it is meant that the hypersphere does not contain any point from D. The maximum radius is the value of the miniMax criterion

ϕmM(D) = max_{x∈U} min_{xi∈D} d(x, xi)    (A.1)

A design that minimizes this measure is said to be a miniMax (mM) distance design [29], denoted as DmM (note that a more appropriate name would

be miniMaximin). The miniMax design can be illustrated as a solution to the problem of the placement of a store: the maximum distance between any customer (a point x ∈ U) and their nearest store (a point xi ∈ D) should be minimized. One can imagine it as if the center of the largest empty Nvar-dimensional hypersphere attracts the design points (stores), see Fig. 1c.

The evaluation of the maximum radius in Eq. (A.1) can be performed via the construction of a Voronoï diagram in order to localize the center of the largest empty hypersphere. Alternatively, one can approximate it by finding the center within a set of candidate points [42,60]. Regardless of the solution method used, such an evaluation requires heavy computational effort when higher dimensions are involved.

The same type of relaxation that has been applied to the Mm criterion can also be applied to the mM criterion [61,63]. First, the distance from a point x to the nearest design point (min_{xi} d(x, xi)) is replaced by a distance from the point x to the whole design consisting of Nsim points: d̄(x, D) = (Σi [d(x, xi)]^−q)^−1/q with q > 0. This distance measures how well the design D covers the point x. Considering all points in U leads to an integral that can be approximated by the average over a finite number Ng of candidate points (e.g. a fine grid) x^(1), …, x^(Ng)

ϕ[p,q](D) = [ ∫U ( Σi=1..Nsim [d(x, xi)]^−q )^−p/q dx ]^1/p ≈ [ (1/Ng) Σj=1..Ng ( Σi=1..Nsim [d(x^(j), xi)]^−q )^−p/q ]^1/p    (A.2)

with p, q > 0. We remark that the evaluation of Eq. (A.2) can become rather cumbersome.

Both of the extreme distance-based criteria introduced in [29], namely the Maximin and miniMax criteria, were motivated by the need to select an optimal design for Kriging. We argue that they can also be used to select statistically uniform designs, e.g. by employing the periodic metric. This allows them to be used for designs suitable for Monte Carlo integration.

The question still remains as to which class of criterion is more suitable for exploiting the physical analogy proposed in this paper. While with the

Maximin criterion one can think of an equivalent to the problem of packing Nsim hard spheres, the miniMax criterion evokes an attempt to cover the design domain with Nsim balls of minimum radius. Both criteria probably yield very similar, if not identical, point patterns. There are known relationships between the Maximin and miniMax criteria, and also bounds on their optimal values, as explained in [60]. We shall now advocate our preference for the relaxed versions of the potentials over the extremal potentials (Mm and mM) originally introduced in [29]. In the case of the raw miniMax and Maximin criteria, repulsive or attractive forces are supposed to act upon the pair of closest points or the (Nvar + 1) most distant points, respectively. In this case the definition of the energy potential is not known in advance and changes abruptly during the simulation depending on the mutual distances. Moreover, due to the absolutely uncompromising goal of the maximization or minimization of selected distances, the forces applied to the bodies tend to infinity. This renders the raw Maximin or miniMax criteria quite unsuitable as templates for the potential energy of a dynamical system. For such "binary" energy potentials, a more convenient type of simulation can be found in a quasi-dynamic marching simulation that discretizes the translation of bodies rather than time. In such a simulation, the momenta and velocities of bodies shall be omitted in order to obey the "on-off" nature of these criteria.

On the other hand, the relaxed variants, i.e. the relaxed generalized miniMax criterion and especially the ϕp criterion, do exhibit desirable properties in this regard. These criteria (i) consider all bodies to be affected by repelling/attracting forces and, more importantly, (ii) propose better-posed formulations of the potential energy than their raw counterparts (although both are unrestricted power laws).

While the ϕp criterion has an analogy to mutually repelling bodies, the relaxed mM criterion is analogous to bodies attracted by the points of the design domain, see Fig. 1. The major advantage of the ϕp energy potential, however, lies in the origin of the repulsive forces acting upon a finite number of bodies. Unlike the attracting forces of the relaxed miniMax potential, which are generated by the continuous volume of the entire design domain, the ϕp-forces that mutually repel pairs of particles are generated by the "charged" particles themselves. Due to this, the actual formulations of the ϕp and relaxed mM energy potentials differ significantly. A ϕp-repulsive force acting upon a particle is the sum of (Nsim − 1) discrete contributions from all other repelling bodies. The mM-attractive forces, quite inconveniently, are integrated over the continuous volume of the design domain itself. Considering the above, the ϕp criterion does seem to be the most convenient option. It produces an energy potential that inherently provides a finite number (Nsim · Nvar) of equations of motion, each of which contains a finite number of Nsim − 1 known contributions. That is why that criterion was selected for the derivation in the present paper.
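The contrast between the two relaxed criteria can be made concrete in code. The sketch below is our illustration only (the point counts, the exponents and the use of random candidate points are assumptions): the ϕp criterion is a finite double sum over the design points, whereas the relaxed mM criterion of Eq. (A.2) must be approximated by sweeping Ng candidate points:

import numpy as np

def periodic_dist(a, b):
    # Pairwise distances under the periodic (torus) metric on [0, 1)^Nvar.
    d = np.abs(a[:, None, :] - b[None, :, :])
    d = np.minimum(d, 1.0 - d)          # nearest periodic image per axis
    return np.sqrt((d**2).sum(axis=-1))

def phi_p(design, p=50):
    # phi_p criterion: double sum over the Nsim(Nsim - 1)/2 point pairs.
    d = periodic_dist(design, design)
    iu = np.triu_indices(len(design), k=1)
    return ((d[iu]**(-p)).sum())**(1.0 / p)

def relaxed_mM(design, candidates, p=20, q=20):
    # Relaxed miniMax criterion, Eq. (A.2), averaged over Ng candidates.
    d = periodic_dist(candidates, design)
    coverage = (d**(-q)).sum(axis=1)**(-p / q)  # how well D covers each x(j)
    return coverage.mean()**(1.0 / p)

rng = np.random.default_rng(1)
design = rng.random((16, 2))       # Nsim = 16 points in Nvar = 2 dimensions
grid = rng.random((4096, 2))       # Ng candidate points for Eq. (A.2)
print(phi_p(design), relaxed_mM(design, grid))

Evaluating ϕp costs O(Nsim²) distance computations, while the approximation of Eq. (A.2) additionally scales with Ng, which must grow rapidly with Nvar; this is the cumbersome evaluation noted above.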

Appendix B. Error ϵ(E) for regular orthogonal and regular simplex grids

We now derive the average error in estimating the peak of the testing function in Eq. (18) in an Nvar-dimensional periodic domain when the design used for the estimation forms one of two simple regular grids: a regular orthogonal grid or a regular simplex grid. Let us recall that the length scale in the testing function has been selected so that it depends on the average density of points (the intensity, in the terminology of point processes) via ℓchar. In order to calculate the mean error and also its standard deviation when sliding the center of the testing function over the regular point pattern, one can reduce the problem to integration over the domain that belongs to one individual sampling point: the region containing all locations that are closer to this point than to any other point in the pattern.

B1. Regular orthogonal grid

In the case of arrangement into a regular orthogonal grid, the region H nearest to a single point is simply a hypercube of edge length a = ℓchar (the point spacing along each dimension is ℓchar), see Fig. B.13a. The mean value of the error ϵ(E) is denoted µH, and the standard deviation σH. Their computation involves the integration of f raised to integer powers and weighted by a uniform density, and the integration domain is H. With no loss of generality, we consider the characteristic length a = ℓchar = 1. Therefore, the volume of the integration region is VH = VF = 1. Two integrals are needed

m1^(H) = ∫−1/2..1/2 ⋯ ∫−1/2..1/2 f(x) dx = [√π · erf(1/2)]^Nvar    (B.1)


Fig. B.13. Two-dimensional illustration of two regular point arrangements and the corresponding integration domains. (a) Regular orthogonal grid. (b) Regular triangular grid.

Table B.3
Errors ϵ(E) for two models of regular grids. Standard deviations must be multiplied by 10⁻².

Nvar    Orthogonal grid        Simplex grid
        µH        σH           µP        σP
1       0.07738   6.71098      0.07738   6.71098
2       0.14888   8.76739      0.14441   8.01158
3       0.21478   9.91941      0.19794   8.21209
4       0.27559   10.5810      0.24078   8.00034
5       0.33169   10.9283      0.27561   7.63471

m2^(H) = ∫H f²(x) dx = [√(π/2) · erf(√2/2)]^Nvar    (B.2)

With these solutions at hand, the mean value and variance of the error ϵ(E) read

µH = 1 − m1/VF,    σH² = m2/VF − (m1/VF)²    (B.3)

The numerical solutions are summarized in Table B.3. The mean values are marked by an empty square and the ranges µH ± σH are marked by the error bars around the squares in Fig. 8. It can be seen that this arrangement of points is not the most efficient one and that the DYN designs are able to identify the extremes of f(·) more accurately; see the green lines.
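Eqs. (B.1)-(B.3) are easily checked numerically. The sketch below is our illustration only; it assumes that, with ℓchar = 1, the testing function of Eq. (18) factorizes into one-dimensional bumps exp(−xv²), which is the form the erf expressions above correspond to:

import math

def orthogonal_grid_error(n_var):
    # Mean and standard deviation of the error, Eqs. (B.1)-(B.3), V_F = 1.
    m1_1d = math.sqrt(math.pi) * math.erf(0.5)                         # (B.1)
    m2_1d = math.sqrt(math.pi / 2.0) * math.erf(math.sqrt(2.0) / 2.0)  # (B.2)
    m1, m2 = m1_1d**n_var, m2_1d**n_var       # the integrals are separable
    return 1.0 - m1, math.sqrt(m2 - m1**2)    # Eq. (B.3)

for n in range(1, 6):
    mu, sigma = orthogonal_grid_error(n)
    print(n, round(mu, 5), round(100.0 * sigma, 4))

The printed values agree with the orthogonal-grid column of Table B.3 to about three decimal places; the small residual differences presumably stem from rounding in the tabulated entries.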

B2. Regular simplex grid

We consider one of the best possible uniform arrangements of points with a given density to be a kind of "hypertriangular grid": the design points form regular simplexes. The regions around each point (the centroids of the simplexes) are regular polyhedra, P, sketched as 2D honeycombs in Fig. B.13b. The generalization of such a grid to a higher dimension is trivial. Unfortunately, this grid is not periodic. Due to the regularity of the domain, the integration can be limited to a typical fragment F, see Fig. B.13b. This fragment is repeated Nvar!(Nvar + 1)! times within P. The volume of the fragment can easily be calculated with the help of a regular simplex, herein denoted as region S. This regular simplex is obtained by enclosing the nearest Nvar + 1 points with hyperplanes. The regular simplex takes the form of a triangle, tetrahedron, 5-cell, etc. This simplex also has the region F as its fragment, now repeated (Nvar + 1)! times inside of it. By denoting the side of the simplex by a, which is the smallest distance between points, the volumes of the regular simplex, the polyhedron around a point and its typical fragment for integration read

VS(a, Nvar) = (a^Nvar / Nvar!) · √((Nvar + 1) / 2^Nvar)    (B.4)

VP = Nvar! · VS    (B.5)

VF = VS / (Nvar + 1)!    (B.6)

Once the volume of the polyhedron belonging to a point is known, one can calculate the characteristic (scaling) length for Eq. (18). It is the Nvar-th root of the volume associated with a point

ℓchar = VP^(1/Nvar) = a · [(Nvar + 1) / 2^Nvar]^(1/(2 Nvar))    (B.7)

All that remains is to perform integrals analogous to Eq. (B.1), weighted uniformly (divided by the fragment volume). The first two moments of the function read

m1^(P) = ∫x1=0..a/2 ∫x2=0..k2 x1 ∫x3=0..k3 x2 ⋯ ∫xNvar=0..kNvar xNvar−1 f(x) Πv=1..Nvar dxv    (B.8)


m2^(P) = ∫F f²(x) dx    (B.9)

in both of which the domain of integration, F, is identical: the first bound, for x1, is a constant, a/2, which is half the distance to the nearest point, see Fig. B.13a. All the remaining upper bounds are linear functions enclosing the fragment: kv xv−1 with kv = √((v − 1)/(v + 1)). The integrals involve error functions and are not obtainable in analytical form. Again, the first two raw moments can be used in Eq. (B.3), together with VF, to obtain the mean value, µP, and standard deviation, σP, of the errors; see Table B.3. We remark that these solutions cannot be obtained by any periodic design, i.e. the error of real designs must always be greater. Lattices obtainable in practice usually resemble the regular simplex grid, but these are always somehow "stretched", which results in greater errors. Indeed, Fig. 8 shows that DYN solutions, which often form perfect lattices, have slightly greater errors than the theoretically derived limits here, which are marked by empty diamonds with error bars corresponding to µP ± σP.
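The geometry of Eqs. (B.4)-(B.7) can be verified with a few lines of code (our illustration only; the names are ours). For the 2D triangular grid, for instance, the polyhedron P is the hexagonal honeycomb cell of Fig. B.13b:

import math

def simplex_grid_geometry(a, n_var):
    # Volumes of Eqs. (B.4)-(B.6) and the characteristic length, Eq. (B.7).
    v_s = (a**n_var / math.factorial(n_var)
           * math.sqrt((n_var + 1) / 2**n_var))   # regular simplex, Eq. (B.4)
    v_p = math.factorial(n_var) * v_s             # cell around a point, (B.5)
    v_f = v_s / math.factorial(n_var + 1)         # typical fragment, (B.6)
    return v_s, v_p, v_f, v_p**(1.0 / n_var)      # l_char, Eq. (B.7)

v_s, v_p, v_f, l_char = simplex_grid_geometry(a=1.0, n_var=2)
print(v_s)     # ~0.43301 = sqrt(3)/4, area of a unit equilateral triangle
print(v_p)     # ~0.86603 = sqrt(3)/2, hexagonal cell area per point
print(l_char)  # ~0.93060, edge of the equal-area square per point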

References

[1] Athanassoula E, Fady E, Lambert J, Bosma A. Optimal softening for force calculations in collisionless N-body simulations. Monthly Notices R Astron Soc 2000;314(3):475–88. https://doi.org/10.1086/172641.
[2] Audze P, Eglājs V. New approach for planning out of experiments. Probl Dyn Strength 1977;35:104–7 (in Russian).
[3] Bates S, Sienz J, Langley D. Formulation of the Audze–Eglais uniform Latin hypercube design of experiments. Adv Eng Softw 2003;34(8):493–506. https://doi.org/10.1016/S0965-9978(03)00042-5.
[4] Blatman G, Sudret B. An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis. Probab Eng Mech 2010;25(2):183–97. https://doi.org/10.1016/j.probengmech.2009.10.003.
[5] Burnaev E, Panin I, Sudret B. Efficient design of experiments for sensitivity analysis based on polynomial chaos expansions. Ann Math Artif Intell 2017;81(1):187–207. https://doi.org/10.1007/s10472-017-9542-1.
[6] Cioppa TM, Lucas TW. Efficient nearly orthogonal and space-filling Latin hypercubes. Technometrics 2007;49(1):45–55. https://doi.org/10.1198/004017006000000453.
[7] Conover WJ. On a better method for selecting input variables; 1975. Unpublished Los Alamos National Laboratories manuscript, reproduced as Appendix A of "Latin Hypercube Sampling and the Propagation of Uncertainty in Analyses of Complex Systems" by J.C. Helton and F.J. Davis, Sandia National Laboratories report SAND2001-0417.
[8] Couckuyt I, Forrester A, Gorissen D, Turck FD, Dhaene T. Blind kriging: implementation and performance analysis. Adv Eng Softw 2012;49:1–13. https://doi.org/10.1016/j.advengsoft.2012.03.002.
[9] Cranley R, Patterson TNL. Randomization of number theoretic methods for multiple integration. SIAM J Numer Anal 1976;13(6):904–14. https://doi.org/10.1137/0713071.
[10] Dean A, Morris M, Stufken J, Bingham D. Handbook of design and analysis of experiments. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. CRC Press; 2015. ISBN 978-1-4665-0434-9.
[11] Deutsch JL, Deutsch CV. Latin hypercube sampling with multidimensional uniformity. J Stat Plann Infer 2012;142(3):763–72. https://doi.org/10.1016/j.jspi.2011.09.016.
[12] Eliáš J, Vořechovský M. Modification of the Audze–Eglājs criterion to achieve a uniform distribution of sampling points. Adv Eng Softw 2016;100:82–96. https://doi.org/10.1016/j.advengsoft.2016.07.004.
[13] Erkut E. The discrete p-dispersion problem. Eur J Oper Res 1990;46(1):48–60. https://doi.org/10.1016/0377-2217(90)90297-O.
[14] Fang K-T, Wang Y. Number-theoretic methods in statistics. 1st ed. London; New York: Chapman and Hall/CRC; 1993. ISBN 9780412465208.
[15] Fang K-T, Wang Y, Bentler PM. Some applications of number-theoretic methods in statistics. Stat Sci 1994;9(3):416–28. https://doi.org/10.1214/ss/1177010392.
[16] Fang K-T, Lin DKJ, Winker P, Zhang Y. Uniform design: theory and application. Technometrics 2000;42(3):237–48. https://doi.org/10.2307/1271079.
[17] Faure H. Discrépances de suites associées à un système de numération (en dimension un) [Discrepancy of sequences associated with a number system (in dimension one)]. Bulletin de la Société Mathématique de France 1981;109.
[18] Fuerle F, Sienz J. Formulation of the Audze–Eglais uniform Latin hypercube design of experiments for constrained design spaces. Adv Eng Softw 2011;42(9):680–9. https://doi.org/10.1016/j.advengsoft.2011.05.004.
[19] Ghiocel DM, Ghanem RG. Stochastic finite-element analysis of seismic soil-structure interaction. J Eng Mech 2002;128(1):66–77. https://doi.org/10.1061/(ASCE)0733-9399(2002)128:1(66).
[20] Giovanis DG, Papadopoulos V. Spectral representation-based neural network assisted stochastic structural mechanics. Eng Struct 2015;84:382–94. https://doi.org/10.1016/j.engstruct.2014.11.044.
[21] Goel T, Haftka RT, Shyy W, Watson LT. Pitfalls of using a single criterion for selecting experimental designs. Int J Numer Methods Eng 2008;75(2):127–55. https://doi.org/10.1002/nme.2242.
[22] Goodman LA. On the exact variance of products. J Am Stat Assoc 1960;55(292):708–13. https://doi.org/10.1080/01621459.1960.10483369.
[23] Halton J. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numer Math 1960;2(1):84–90. https://doi.org/10.1007/BF01386213.
[24] Hasofer AM, Lind NC. An exact and invariant first order reliability format. J Eng Mech 1974;100:111–21.
[25] Hickernell FJ. A generalized discrepancy and quadrature error bound. Math Comput 1998;67(221):299–322. https://doi.org/10.1090/S0025-5718-98-00894-1.
[26] Husslage BG, Rennen G, van Dam ER, den Hertog D. Space-filling Latin hypercube designs for computer experiments. Optim Eng 2011;12(4):611–30. https://doi.org/10.1007/s11081-010-9129-8.
[27] Iman RC, Conover WJ. A distribution free approach to inducing rank correlation among input variables. Commun Stat 1982;B11:311–34. https://doi.org/10.1080/03610918208812265.
[28] Jin R, Chen W, Sudjianto A. An efficient algorithm for constructing optimal design of computer experiments. J Stat Plann Infer 2005;134(1):268–87. https://doi.org/10.1016/j.jspi.2004.02.014.
[29] Johnson M, Moore L, Ylvisaker D. Minimax and maximin distance designs. J Stat Plann Infer 1990;26(2):131–48. https://doi.org/10.1016/0378-3758(90)90122-B.
[30] Joseph VR. Space-filling designs for computer experiments: a review. Qual Eng 2016;28(1):28–35. https://doi.org/10.1080/08982112.2015.1100447.
[31] Joseph VR, Hung Y. Orthogonal-maximin Latin hypercube designs. Stat Sin 2008;18(1):171–86. https://doi.org/10.2307/24308251.
[32] Koksma JF. Een algemeene stelling uit de theorie der gelijkmatige verdeeling modulo 1 [A general theorem from the theory of uniform distribution modulo 1]. Math B 1942/1943;11:7–11.
[33] Kroetz HM, Tessari RK, Beck AT. Performance of global metamodeling techniques in solution of structural reliability problems. Adv Eng Softw 2017;114:394–404. https://doi.org/10.1016/j.advengsoft.2017.08.001.
[34] L'Ecuyer P, Lemieux C. Recent advances in randomized quasi-Monte Carlo methods. In: Modeling uncertainty: an examination of stochastic theory, methods, and applications. International Series in Operations Research & Management Science 46. New York, NY: Springer; 2002. p. 419–74. https://doi.org/10.1007/b106473. ISBN 978-0-7923-7463-3, 978-0-306-48102-4.
[35] Lee SH, Kwak BM. Response surface augmented moment method for efficient reliability analysis. Struct Safety 2006;28(3):261–72. https://doi.org/10.1016/j.strusafe.2005.08.003.
[36] Lemieux C, Owen AB. Quasi-regression and the relative importance of the ANOVA components of a function. In: Fang K-T, Niederreiter H, Hickernell FJ, editors. Monte Carlo and quasi-Monte Carlo methods 2000. Berlin, Heidelberg: Springer; 2002. p. 331–44. https://doi.org/10.1007/978-3-642-56046-0_22. ISBN 978-3-642-56046-0.
[37] Li P, Li H, Huang Y, Wang K, Xia N. Quasi-sparse response surface constructing accurately and robustly for efficient simulation based optimization. Adv Eng Softw 2017;114:325–36. https://doi.org/10.1016/j.advengsoft.2017.07.014.
[38] Liefvendahl M, Stocki R. A study on algorithms for optimization of Latin hypercubes. J Stat Plann Infer 2006;136(9):3231–47. https://doi.org/10.1016/j.jspi.2005.01.007.
[39] Lillie HR, Ritland HN. Fine annealing of optical glass. J Am Ceram Soc 1954;37(10):466–73. https://doi.org/10.1111/j.1151-2916.1954.tb13978.x.
[40] Mašek J, Vořechovský M. On the influence of the interaction laws of a dynamical particle system for sample optimization. Trans VŠB - Tech Univ Ostrava 2017;17(1):137–46. https://doi.org/10.1515/tvsb-2017-0016.
[41] Mašek J, Vořechovský M. Parallel implementation of hyper-dimensional dynamical particle system on CUDA. Adv Eng Softw 2018;125:178–87. https://doi.org/10.1016/j.advengsoft.2018.03.009.
[42] Mašek J, Vořechovský M. Approximation of Voronoï cell attributes using parallel solution. Adv Eng Softw 2019;132:7–17. https://doi.org/10.1016/j.advengsoft.2019.03.012.
[43] McKay MD, Conover WJ, Beckman RJ. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 1979;21:239–45. https://doi.org/10.1080/00401706.1979.10489755.
[44] Morris MD, Mitchell TJ. Exploratory designs for computational experiments. J Stat Plann Infer 1995;43(3):381–402. https://doi.org/10.1016/0378-3758(94)00035-T.
[45] Niederreiter H. A quasi-Monte Carlo method for the approximate computation of the extreme values of a function. In: Studies in pure mathematics: to the memory of Paul Turán. Basel: Birkhäuser Basel; 1983. p. 523–9. https://doi.org/10.1007/978-3-0348-5438-2. ISBN 978-3-0348-5438-2.
[46] Niederreiter H. Point sets and sequences with small discrepancy. Monatshefte für Mathematik 1987;104(4):273–337. https://doi.org/10.1007/BF01294651.
[47] Niederreiter H. Low-discrepancy and low-dispersion sequences. J Number Theory 1988;30(1):51–70. https://doi.org/10.1016/0022-314X(88)90025-X.
[48] Niederreiter H. Random number generation and quasi-Monte Carlo methods. CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, Pennsylvania: Society for Industrial and Applied Mathematics; 1992. https://doi.org/10.1137/1.9781611970081. ISBN 978-0-898712-95-7.
[49] Niederreiter H. Random number generation and quasi-Monte Carlo methods. Society for Industrial and Applied Mathematics (SIAM); 1992. https://doi.org/10.1137/1.9781611970081. ISBN 978-0-898712-95-7.
[50] Novák D, Teplý B, Keršner Z. The role of Latin hypercube sampling method in reliability engineering. In: Shiraishi N, Shinozuka M, Wen YK, editors. Proceedings of the 7th international conference on structural safety and reliability ICoSSaR '97, vol. 2. Kyoto, Japan. Rotterdam: Balkema; 1998. p. 403–9.
[51] Novák D, Lehký D. ANN inverse analysis based on stochastic small-sample training set simulation. Eng Appl Artif Intell 2006;19(7):731–40. https://doi.org/10.1016/j.engappai.2006.05.003. Special issue on engineering applications of neural networks.
[52] Novák D, Vořechovský M, Teplý B. FReET: software for the statistical and reliability analysis of engineering problems and FReET-D: degradation module. Adv Eng Softw 2014;72:179–92. https://doi.org/10.1016/j.advengsoft.2013.06.011.
[53] Olsson A, Sandberg G, Dahlblom O. On Latin hypercube sampling for structural reliability analysis. Struct Safety 2003;25(1):47–68. https://doi.org/10.1016/S0167-4730(02)00039-5.
[54] Owen AB. A central limit theorem for Latin hypercube sampling. J R Stat Soc Ser B (Methodol) 1992;54(2):541–51. https://doi.org/10.2307/2346140.
[55] Owen AB. Controlling correlations in Latin hypercube samples. J Am Stat Assoc (Theory Methods) 1994;89(428):1517–22. https://doi.org/10.2307/2291014.
[56] Owen AB. Randomly permuted (t, m, s)-nets and (t, s)-sequences. In: Monte Carlo and quasi-Monte Carlo methods in scientific computing. Lecture Notes in Statistics 106. New York: Springer-Verlag; 1995. p. 299–317. ISBN 978-0-387-94577-4, 978-1-4612-2552-2. Proceedings of a conference at the University of Nevada, Las Vegas, Nevada, USA, June 23–25, 1994.
[57] Owen AB. Monte Carlo, quasi-Monte Carlo and randomized quasi-Monte Carlo. In: Monte-Carlo and quasi-Monte Carlo methods 1998. Lecture Notes in Statistics. Berlin, Heidelberg: Springer-Verlag; 1999. p. 86–97. ISBN 978-3-540-66176-4. Proceedings of a conference held at the Claremont Graduate University, Claremont, California, USA, June 22–26, 1998.
[58] Park J-S. Optimal Latin-hypercube designs for computer experiments. J Stat Plann Infer 1994;39(1):95–111. https://doi.org/10.1016/0378-3758(94)90115-5.
[59] Pholdee N, Bureerat S. An efficient optimum Latin hypercube sampling technique based on sequencing optimisation using simulated annealing. Int J Syst Sci 2015;46(10):1780–9. https://doi.org/10.1080/00207721.2013.835003.
[60] Pronzato L. Minimax and maximin space-filling designs: some properties and methods for construction. J de la Société Française de Statistique 2017;158(1):7–36.
[61] Pronzato L, Müller WG. Design of computer experiments: space filling and beyond. Stat Comput 2012;22(3):681–701. https://doi.org/10.1007/s11222-011-9242-3.
[62] Ronco CCD, Ponza R, Benini E. Aerodynamic shape optimization of aircraft components using an advanced multi-objective evolutionary approach. Comput Methods Appl Mech Eng 2015;285:255–90. https://doi.org/10.1016/j.cma.2014.10.024.
[63] Royle JA, Nychka D. An algorithm for the construction of spatial coverage designs with implementation in SPLUS. Comput Geosci 1998;24(5):479–88. https://doi.org/10.1016/S0098-3004(98)00020-X.
[64] Sallaberry C, Helton J, Hora S. Extension of Latin hypercube samples with correlated variables. Reliab Eng Syst Saf 2008;93(7):1047–59. https://doi.org/10.1016/j.ress.2007.04.005.
[65] Santner TJ, Williams BJ, Notz WI. The design and analysis of computer experiments. Springer Series in Statistics. New York: Springer-Verlag; 2003. https://doi.org/10.1007/978-1-4757-3799-8. ISBN 978-0-387-95420-2.
[66] Schonlau M, Welch WJ. Screening the input variables to a computer model via analysis of variance and visualization. In: Screening. New York, NY: Springer; 2006. p. 308–27. https://doi.org/10.1007/0-387-28014-6. ISBN 978-0-387-28013-4.
[67] Shields MD. Adaptive Monte Carlo analysis for strongly nonlinear stochastic systems. Reliab Eng Syst Saf 2018;175:207–24. https://doi.org/10.1016/j.ress.2018.03.018.
[68] Shields MD, Zhang J. The generalization of Latin hypercube sampling. Reliab Eng Syst Saf 2016;148:96–108. https://doi.org/10.1016/j.ress.2015.12.002.
[69] Shields MD, Teferra K, Hapij A, Daddazio RP. Refined stratified sampling for efficient Monte Carlo based uncertainty quantification. Reliab Eng Syst Saf 2015;142:310–25. https://doi.org/10.1016/j.ress.2015.05.023.
[70] Sobol' IM. On the distribution of points in a cube and the approximate evaluation of integrals. USSR Comput Math Math Phys 1967;7(4):784–802. https://doi.org/10.1016/0041-5553(67)90144-9.
[71] Sobol' IM. Uniformly distributed sequences with an additional uniform property. USSR Comput Math Math Phys 1976;16(5):236–42. https://doi.org/10.1016/0041-5553(76)90154-3. Short communication.
[72] Stein M. Large sample properties of simulations using Latin hypercube sampling. Technometrics 1987;29(2):143–51. https://doi.org/10.1080/00401706.1987.10488205.
[73] Steiner M, Bourinet J-M, Lahmer T. An adaptive sampling method for global sensitivity analysis based on least-squares support vector regression. Reliab Eng Syst Saf 2019;183:323–40. https://doi.org/10.1016/j.ress.2018.11.015.
[74] Talke IS, Borkowski JJ. Generation of space-filling uniform designs in unit hypercubes. J Stat Plann Infer 2012;142(12):3189–200. https://doi.org/10.1016/j.jspi.2012.06.013.
[75] Tang B. Orthogonal array-based Latin hypercubes. J Am Stat Assoc 1993;88(424):1392–7. https://doi.org/10.1080/01621459.1993.10476423.
[76] Tezuka S. Uniform random numbers: theory and practice. The Springer International Series in Engineering and Computer Science 315. Boston, MA: Springer; 1995. https://doi.org/10.1007/978-1-4615-2317-8. ISBN 978-1-4613-5980-7, 978-1-4615-2317-8.
[77] Tong C. Refinement strategies for stratified sampling methods. Reliab Eng Syst Saf 2006;91(10–11):1257–65. https://doi.org/10.1016/j.ress.2005.11.027.
[78] Viana FAC, Venter G, Balabanov V. An algorithm for fast optimal Latin hypercube design of experiments. Int J Numer Methods Eng 2010;82(2):135–56. https://doi.org/10.1002/nme.2750.
[79] Vořechovský M. Optimal singular correlation matrices estimated when sample size is less than the number of random variables. Probab Eng Mech 2012;30:104–16. https://doi.org/10.1016/j.probengmech.2012.06.003.
[80] Vořechovský M. Correlation control in small sample Monte Carlo type simulations II: analysis of estimation formulas, random correlation and perfect uncorrelatedness. Probab Eng Mech 2012;29:105–20. https://doi.org/10.1016/j.probengmech.2011.09.004.
[81] Vořechovský M. Hierarchical refinement of Latin hypercube samples. Comput Aided Civil Infrast Eng 2015;30(5):1–18. https://doi.org/10.1111/mice.12088.
[82] Vořechovský M, Novák D. Correlation control in small sample Monte Carlo type simulations I: a simulated annealing approach. Probab Eng Mech 2009;24(3):452–62. https://doi.org/10.1016/j.probengmech.2009.01.004.
[83] Vořechovský M, Eliáš J. Modification of the maximin and ϕp (phi) criteria to achieve statistically uniform distribution of sampling points. Technometrics 2019. https://doi.org/10.1080/00401706.2019.1639550. In press.
[84] Weyl H. Über die Gleichverteilung von Zahlen mod. Eins [On the distribution of numbers modulo one]. Math Ann 1916;77(3):313–52. https://doi.org/10.1007/BF01475864.
