
Modelling Step Discontinuous Functions Using Bayesian Emulation

James Mark Anderson

A thesis submitted to Auckland University of Technology in partial fulfilment of the requirements for the degree of Master of Science (MSc)

2017

School of Engineering, Computer and Mathematical Sciences

Abstract

Bayesian emulation is used in modelling complex simulators and is seen as an efficient and powerful statistical tool. The simulations can be very time-consuming to run, resulting in only being able to run the simulator a limited number of times. The literature suggests that the emulator is best suited to continuous functions; however, it is very common to find physical problems containing discontinuities. These discontinuities’ positions may also be unknown and therefore, for this research, information on where the discontinuities lie will be withheld from the emulator. This thesis focuses on emulating the Heaviside function as one simple function containing step discontinuities and then progresses to slightly more complex functions with step discontinuities. Specific goodness-of-fit measures have been designed to assess the emulator when applied to these step-discontinuous functions, such as mean squared error and a simple form of the Jaccard index. The numerical calculations of the goodness-of-fit techniques are carried out in the R statistical programming language, with the BACCO package for the emulator. It is found that the emulator is able to model discontinuous functions to some degree but, unless there is certainty about the discontinuities’ locations, care should be taken when using Bayesian emulation to model discontinuous functions.

Contents

List of Figures

1 Introduction
1.1 Scene Setting and Motivation
1.1.1 Emulators
1.1.2 Discontinuity
1.1.3 Bayesian Emulators and Discontinuity
1.2 Research Questions
1.3 Chapter Summaries

2 Literature and Mathematics Review
2.1 Introduction
2.2 Deterministic Simulators
2.3 Step Discontinuity
2.3.1 Example of Step Discontinuities
2.4 Current Research on Modelling Discontinuities
2.5 Bayesian Emulation
2.5.1 Introduction
2.5.2 The Bayesian Approach
2.5.3 Bayes’ theory
2.5.4 Gaussian Process
2.5.5 Bayesian Emulation and Notations
2.5.6 BACCO in R
2.5.7 Current Research on Emulators
2.6 Current Research on Modelling Discontinuities with Bayesian Emulation

3 An Introduction to the Emulator Modelling Step Discontinuous Functions
3.1 Example 1: One data point at the discontinuity
3.2 Example 2: Two data points stitched at the discontinuity
3.3 Example 3: Using a discontinuous function as the regressor, with one data point at the discontinuity
3.4 Example 4: Using a discontinuous function as the regressor, with two data points stitched at the discontinuity
3.5 Example 5: Using a discontinuous function as the regressor to model the Heaviside Function
3.6 Conclusion

4 Research Method for Quantitative Analysis
4.1 Introduction
4.2 Research Approach
4.2.1 Experiment Setup and Assumptions
4.2.2 Order of Experiment with the Emulator
4.2.3 Goodness of Fit

5 Numerical Findings from the Emulator Experiments
5.1 Introduction
5.2 One Dimension with One Discontinuity
5.2.1 Mean Square Error (MSE)
5.2.2 Approximating the Location of the Discontinuities from the Emulator
5.2.3 Comparing η̂(0) to η(0)
5.2.4 Investigating the steepness of η̂′(0)
5.2.5 Summary
5.3 One Dimension with Two Discontinuities
5.3.1 Mean Square Error (MSE)
5.3.2 Approximating and Comparing the Location of the Discontinuities from the Emulator
5.3.3 Comparing η̂(−1) to η(−1) and η̂(1) to η(1)
5.3.4 Summary
5.4 Two Dimensions
5.4.1 Mean Square Error (MSE)
5.4.2 Examining the Discontinuous Region using a Jaccard Approach
5.4.3 Summary
5.5 Two Dimensions + Time
5.5.1 Mean Square Error (MSE)
5.5.2 Examining the Discontinuous Region using a Jaccard Index Approach
5.5.3 Summary

6 Discussion and Conclusion
6.1 Discussion and Analysis
6.1.1 Introduction
6.1.2 Research Contribution
6.2 Conclusions
6.3 Further Research
6.3.1 Conclusion

Appendix A Emulator Properties and Proofs

Appendix B Investigation into Optimising B
B.1 Introduction
B.2 Findings
B.3 Conclusion

Appendix C Other Minor Observations

Appendix D Software Used
D.1 R
D.2 Matlab

Appendix E R code
E.1 Figure B.1
E.2 Figure 5.1
E.3 Figures 5.3, 5.4, 5.5 and 5.6
E.4 Figure 5.7
E.5 Figures 5.9 and 5.10
E.6 Figures 5.12, 5.13, 5.14 and 5.15
E.7 Figures 5.17 and 5.18
E.8 Figures 5.20 and 5.21

Reference List

List of Figures

1.1 Example of a Heaviside step function

2.1 HGD discontinuity graph example
2.2 Aerial photo of ammonia released in Houston
2.4 Simple Setup for a Linear Regression Discontinuity Design

3.2 Example of the emulator modelling the Heaviside function with one data point close to the discontinuity
3.3 Example of the emulator modelling the Heaviside function with one data point close but on the other side of the discontinuity
3.4 Example when using two data points on either side of the discontinuity
3.5 Example of the emulator modelling the Heaviside function with one data point close to the discontinuity, using a discontinuous regressor function
3.6 Example when using two data points on either side of the discontinuity, using a discontinuous regressor function
3.7 An example of using a discontinuous regressor to model the Heaviside function
3.8 Example of using a discontinuous regressor to model the Heaviside function, assuming that the location of the discontinuity is unknown
3.9 Example of using a discontinuous regressor that conflicts with the data points to model the Heaviside function

4.1 Example of the emulator using a linear prior
4.2 Example of the emulator using a constant prior
4.3 Examples of emulators if data points were fixed and discontinuity positions were sampled
4.4 Example of emulator with constant prior and B = 1
4.5 Example of emulator with constant prior and B = 20 and the data points scaled
4.6 Example of when B from the correlation function is low
4.7 Example of when B from the correlation function is high
4.8 Example of Jaccard regions for one dimension with two discontinuities

5.1 Bayesian emulator modelling the Heaviside function with a constant as a prior
5.2 Another example of the Bayesian emulator modelling the Heaviside function with a constant as a prior
5.3 Histogram of MSE, one dimension with one discontinuity
5.4 Histogram of MSE, 8 data points
5.5 PDF of the MSE, 4 data points
5.6 CDF of the MSE, 4 data points
5.7 Example of the emulator with high MSE
5.8 Example of the emulator with low MSE
5.9 Average MSE for number of data points
5.10 Boxplot of MSE for number of data points
5.11 Boxplot of MSE for number of data points (log)
5.12 Histogram of x from η̂(x) = η(d)
5.13 Normal Q-Q plot of x from η̂(x) = η(d)
5.14 PDF of x from η̂(x) = η(d) over a Laplace PDF
5.15 CDF of x from η̂(x) = η(d) over a Laplace CDF
5.16 d̂ from η̂(x) = η(d) for number of data points
5.17 PDF of η̂(0)
5.18 CDF of η̂(0)
5.19 PDF of η̂(0) with different type of sampling process
5.20 PDF of dη̂(0)/dx
5.21 CDF of dη̂(0)/dx
5.22 One dimension with two discontinuities function
5.23 PDF of the MSE, six data points, one dimension with two discontinuities
5.24 CDF of the MSE, six data points, one dimension with two discontinuities

5.25 PDF of x from η̂(x) = η(d1) over a Laplace PDF

5.26 PDF of x from η̂(x) = η(d2) over a Laplace PDF
5.27 Example of Jaccard index regions for one dimension with two discontinuities
5.28 PDF of accuracy using Jaccard index
5.29 CDF of accuracy using Jaccard index
5.30 Average accuracy vs data points, using Jaccard index

5.31 PDF of η̂(d1) and η̂(d2)

5.32 CDF of η̂(d1) and η̂(d2)
5.33 Simple example of a 2D region
5.34 PDF of the MSE, two dimensions, 50 data points
5.35 CDF of the MSE, two dimensions, 50 data points
5.36 Average of the MSE vs number of data points
5.37 Boxplot of the MSE vs number of data points

5.38 Example of the emulator approximating the boundaries in two dimensions
5.39 Emulator approximating the boundaries in two dimensions separated into regions for Jaccard index
5.40 PDF of accuracy in two dimensions, using Jaccard index
5.41 CDF of accuracy in two dimensions, using Jaccard index
5.42 Boxplot of accuracy vs number of data points, using Jaccard index
5.43 Simple example of a HGD model, how the radius would increase over time
5.44 Example of the simple HGD model, how the radius increases over time
5.45 Example output of the emulator, modelling the simple HGD model with 300 data points
5.46 Example of the output from the emulator, modelling the simple HGD model with 50 data points
5.47 Examples of the output from the emulator with time at 0 and 1
5.48 Examples of the output from the emulator with time at 2 and 3
5.49 Examples of the output from the emulator with time at 4 and 5
5.50 Examples of the output from the emulator with time at 7 and 8
5.51 Example of the output from the emulator with time at 10
5.52 PDF of the MSE, two dimensions + time, 50 data points
5.53 CDF of the MSE, two dimensions + time, 50 data points
5.54 Boxplot of MSE vs number of data points
5.55 PDF of accuracy in two dimensions + time, using Jaccard index, 50 data points
5.56 CDF of accuracy in two dimensions + time, using Jaccard index, 50 data points
5.57 Boxplot of accuracy vs number of data points, using Jaccard index (10 to 100 data points)
5.58 Boxplot of accuracy vs number of data points, using Jaccard index (10 to 500 data points)
5.59 Average accuracy vs number of data points, using Jaccard index (10 to 500 data points)

6.1 Simple example of the emulator modelling the Heaviside function
6.2 Initial expectation of the emulator modelling discontinuities with over/undershooting
6.3 Simple example of the emulator modelling the Heaviside function
6.4 Simple example of the emulator modelling the Heaviside function with many simulator output data points
6.5 Example of a more complex piecewise function

B.1 Bayesian emulator modelling the Heaviside function with a linear prior
B.2 Example when B is high in one dimension
B.3 Two-dimensional simulator with 20 data points, in preparation for optimising B
B.4 Example of output from the emulator with B = 1
B.5 Example of output from the emulator with B = 1
B.6 Example of output from the emulator with B = 10
B.7 Example of output from the emulator with B = 0.5
B.8 Example of output from the emulator with B = 0.1
B.9 Example of output from the emulator with B = 0.45
B.10 Example of output from the emulator with B = 0.1, 200 data points
B.11 Example of post-processing the emulator in two dimensions

C.1 Example of emulator modelling the Heaviside function with 13 evenly spread data points
C.2 Example of emulator modelling the Heaviside function with more spread out data points
C.3 Example of emulator modelling the Heaviside function with constant prior function
C.4 (Left) X = [−3, −1, 0, 2, 3] (right) X = [−3, −1, 0.000001, 2, 3]
C.5 Example of emulator modelling the Heaviside function, changing the discontinuity location
C.6 Example of emulator modelling the Heaviside function, flipping one data point at the discontinuity
C.7 Example of emulator modelling the Heaviside function, changing the discontinuity location
C.8 Example of emulator modelling the Heaviside function, without a prior

Attestation of Authorship

I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person (except where explicitly defined in the acknowledgements), nor material which to a substantial extent has been submitted for the award of any other degree or diploma of a university or other institution of higher learning.

Signed: Date:

Acknowledgement

Firstly, I would like to thank my supervisor, Dr Robin Hankin (primary supervisor), for the guidance throughout my thesis. I would also like to thank the Student Learning Centre at AUT, particularly Dr David Parker and Dr Pedro Silva, for their guidance in the writing of my thesis. I would also like to thank Jessica Parsons for proofreading my thesis; she was contacted through AUT’s current list of proofreaders. The open-source programming language R was used to produce the figures and for Bayesian emulation, using the BACCO package. Matlab R2014a was also used for some 3D plots. This thesis was compiled to PDF format by LaTeX using the integrated writing environment TeXstudio.

Chapter 1

Introduction

1.1 Scene Setting and Motivation

Many physical problems are modelled by computer simulations. These simulations can be very time-consuming to run, resulting in only being able to run the simulator a limited number of times. However, by using observations from the simulator, a statistical modelling tool called Bayesian emulation can be used as a statistical representation of the simulator (O’Hagan, 2006). Bayesian emulation has been found to be statistically rigorous and highly efficient when compared to using the computer simulator alone, and it requires fewer observations (data points) from the simulation than a simple Monte Carlo method (Oakley, 2002, p. 71). This thesis will develop an understanding of Bayesian emulation by researching how the Bayesian emulator handles step discontinuous functions, and how well the emulator can model discontinuous functions when the location of the discontinuity is unknown, by providing “goodness of fit” techniques to measure the emulator’s performance.

1.1.1 Emulators

A suitable definition of deterministic simulators is that they do not contain random components. However, until the simulator has finished, the observer maintains a subjective uncertainty about the results. Under the Bayesian paradigm, the simulator is treated as a random function (Currin, Mitchell, Morris, & Ylvisaker, 1991), modelled as a Gaussian process (Oakley, 1999). Before the results from the simulator are observed, any assumptions about the simulator are modelled as a prior and are used to influence the Bayesian emulator. A Gaussian process is a random function η : R^k → R such that, for any set of points {x1, x2, ..., xn} in the domain D, the random vector {η(x1), ..., η(xn)} has a joint multivariate Gaussian distribution (Hankin, 2014).
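The conditioning of a Gaussian process on observed simulator runs can be sketched numerically. The thesis itself uses R with the BACCO package; the following Python sketch is an illustrative re-implementation under an assumed squared-exponential correlation, computing the posterior mean of a zero-mean Gaussian process:

```python
import numpy as np

def sq_exp_corr(x1, x2, B=1.0):
    # Assumed squared-exponential correlation: c(x, x') = exp(-B (x - x')^2)
    return np.exp(-B * (x1[:, None] - x2[None, :]) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, B=1.0, jitter=1e-10):
    # Posterior mean of a zero-mean Gaussian process conditioned on the
    # observed simulator runs (x_train, y_train); jitter aids numerical stability.
    K = sq_exp_corr(x_train, x_train, B) + jitter * np.eye(len(x_train))
    k_star = sq_exp_corr(x_test, x_train, B)
    return k_star @ np.linalg.solve(K, y_train)

# Condition on four observations of the Heaviside function
x_obs = np.array([-3.0, -1.0, 1.0, 3.0])
y_obs = (x_obs >= 0).astype(float)
print(gp_posterior_mean(x_obs, y_obs, np.array([-2.0, 0.0, 2.0])))
```

The posterior mean interpolates the observations exactly; between data points that straddle a discontinuity it produces a smooth transition, which is the behaviour examined later in this thesis.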

1.1.2 Discontinuity

Figure 1.1 shows a simple piecewise function that contains a discontinuity, known as the Heaviside step function¹ (Weisstein, 2002). A definition of continuity can be found in most first- or second-year undergraduate calculus courses and calculus textbooks such as Stewart (2010), which follows: a function f is continuous at a number a if

lim_{x→a} f(x) = f(a)

This requires two things if f is continuous at a:

1. f(a) is defined (that is, a is in the domain of f)

2. lim_{x→a} f(x) = f(a)

(Stewart, 2010, p. 113)

For the Heaviside step function illustrated in Figure 1.1, the function is not continuous, as lim_{x→0} f(x) does not exist (because the left and right limits are different).

Figure 1.1: Heaviside step function
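The failure of the limit at x = 0 can be checked numerically; a minimal Python sketch of the Heaviside function, approaching 0 from each side:

```python
import numpy as np

def heaviside(x):
    # Heaviside step function: 0 for x < 0, 1 for x >= 0
    return np.where(x >= 0, 1.0, 0.0)

# Approach 0 from the left and from the right
eps = 10.0 ** -np.arange(1, 8)     # 0.1, 0.01, ..., 1e-7
left = heaviside(-eps)             # all 0.0
right = heaviside(eps)             # all 1.0
print(left[-1], right[-1])         # → 0.0 1.0: the one-sided limits disagree
```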

There are many scientific fields where functions containing discontinuities occur, such as economics, climate science and heavy gas dispersion. These examples will be discussed in Chapter 2: Literature and Mathematics Review.

¹ This Heaviside step function will be used for the start of the thesis investigation in Section 5.2.

1.1.3 Bayesian Emulators and Discontinuity

The Auckland University of Technology (AUT) online library has access to multiple science and mathematics databases, including: AccessEngineering, ACM Digital Library, IEEE Xplore, MathSciNet, ScienceDirect, Statistics New Zealand, and Web of Science. Google Scholar was also included as an additional search engine. The aim of the search of these databases was to find relevant research on modelling functions containing discontinuities using Bayesian emulation. The search terms associated with Bayesian emulation were:

• Bayesian emulation

• Bayesian emulator

• Bayesian inference

• Gaussian process

• Gaussian emulator

From this search, only a few papers were found, with Chang, Strong, and Clayton (2015), and Caiado and Goldstein (2015) being the most relevant in terms of modelling discontinuities using Bayesian emulation. The known literature has shown that there is a lack of research into how the emulator models simulators with discontinuities. This thesis will address this research gap by answering the research questions (stated below). Bayesian emulation has been used in a wide range of fields, such as climate science (Conti, Gosling, & O’Hagan, 2009; Caiado & Goldstein, 2015) and economics (Oakley, 2009). From the literature, Bayesian emulation has been most commonly applied to continuous functions; however, it is very common to find physical problems containing discontinuities where the locations of the discontinuities are unknown. Because the location of the discontinuity can be unknown, a Bayesian paradigm may be appropriate for modelling these simulators. Current research on Bayesian emulators is mainly focused on continuous functions, not on discontinuous functions with the discontinuity location unknown. In the present literature, researchers state that their function/simulator “is assumed to be a smooth function of the inputs without discontinuities” (Chang et al., 2015, p. 17). Caiado and Goldstein (2015) found the boundary of the regions of the discontinuities, stating “this will be an expensive operation” (Caiado & Goldstein, 2015, p. 131). Once the discontinuities were found, they ran separate emulators for each region, removing the problem of having discontinuities within the emulator’s calculations. The disadvantage of Caiado and Goldstein’s (2015) approach is that running two separate emulators could lose information about the simulator as a whole; however, if Caiado and Goldstein (2015) knew where the discontinuities were

it might be possible to use a different prior taking the discontinuities into account. This thesis will look at how the emulator is able to handle functions with one or more discontinuities at unknown positions, where current researchers such as Caiado and Goldstein (2015, p. 131) have avoided this question by calculating where the discontinuity (or discontinuities) are, which tends to be expensive. Other researchers (Becker, Worden, & Rowson, 2013; Gramacy & Lee, 2008; Kim, Mallick, & Holmes, 2005) have taken the same approach, partitioning the input space and then fitting a Gaussian process (the emulator) to each part of the space. However, one can only partition the input space if the discontinuity’s location is certain (this is shown in Chapter 3). This thesis will focus on some of the simplest discontinuous functions to understand how the Bayesian emulator models functions with discontinuities.

1.2 Research Questions

The three research questions of this study are:

1. How well does Bayesian emulation model simulators containing discontinuities, with the discontinuities at unknown positions?

2. What can be learnt about Bayesian emulation through modelling discontinuous functions?

3. What meaningful goodness-of-fit techniques can be used to measure how well the emulator performs at modelling a simulator with discontinuities?

1.3 Chapter Summaries

Chapter 2: Literature and Mathematics Review

This chapter discusses the literature and gives an overview of deterministic simulators, discontinuity and current research on modelling discontinuity, and Bayesian emulation, with a layout of the notation used in this thesis.

Chapter 3: An Introduction to the Emulator Modelling Step Discontinuous Functions

This chapter provides a short demonstration of how the emulator models the Heaviside function, using a simple approach to selecting the data points for the emulator. The chapter then introduces a discontinuous function as information given to the emulator; however, this requires the assumption that the location of the discontinuity is known.

Chapter 4: Research Method for Quantitative Analysis

This chapter discusses the research objectives and approach, including what software was used for the experiments, and the structure of the experiment.

Chapter 5: Numerical Findings from the Emulator Experiments

This chapter documents the findings of the goodness-of-fit techniques when applied to step-discontinuous functions. The thesis starts with a Heaviside function containing one discontinuity and progresses to more complex step-discontinuous functions. For each experiment, a number of goodness-of-fit techniques were applied to measure how well (or how badly) the emulator was able to model the function. By running the emulator multiple times with different data points, a probability density function (PDF) was built for each goodness-of-fit technique. The number of data points per emulator run was also varied, showing how the number of data points affects the accuracy.
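The repeated-run MSE procedure described above can be sketched as follows. This is a hedged Python illustration: the grid, the design sizes, and the simple interpolating stand-in for the emulator are assumptions, not the thesis's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def heaviside(x):
    # True function: 0 for x < 0, 1 for x >= 0
    return np.where(x >= 0, 1.0, 0.0)

def mse(eta_hat, eta, grid):
    # Mean squared error of an emulator eta_hat against the true function eta
    return float(np.mean((eta_hat(grid) - eta(grid)) ** 2))

def make_emulator(x_design):
    # Stand-in "emulator": linear interpolation through the design points
    y_design = heaviside(x_design)
    return lambda x: np.interp(x, x_design, y_design)

grid = np.linspace(-3, 3, 601)
# Repeated runs with fresh random designs give an empirical MSE distribution
errors = [mse(make_emulator(np.sort(rng.uniform(-3, 3, 6))), heaviside, grid)
          for _ in range(200)]
print(np.mean(errors))
```

The collection of MSE values from the repeated runs is what the PDFs and boxplots in Chapter 5 are built from.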

Chapter 6: Discussion and Conclusion

This chapter discusses the key findings of this research, a summary and conclusion of the findings, and recommendations when using the emulator to model discontinuous functions. This chapter also proposes some suggestions for further research.

Appendix

The appendix contains:

• Properties and proofs, demonstrating why the experiments were done numer- ically rather than analytically.

• An investigation on optimising B in the correlation function.

• Minor observations found during the experiments that may be less relevant to the focus in the findings.

• The computer software used for the experiments, including:

– Adjustments made to the software or code; and
– Examples of the code used in the experiments.

Chapter 2

Literature and Mathematics Review

2.1 Introduction

This chapter will discuss some of the key concepts of Bayesian emulation and discontinuity, discuss some applications where step discontinuous functions are found, and review the current research using Bayesian emulation to model simulators that contain discontinuities. This chapter will also introduce key notation for Bayesian emulation. Most notation comes from Oakley (1999) and Hankin (2012).

2.2 Deterministic Simulators

Simulators are widely used in all fields of science and technology to describe and understand complex systems.¹ These simulators can reduce the need for real-world experiments, as a result reducing cost. Many of these simulators are complex computer models designed with the aim of high accuracy. Because of the demand for high accuracy, the simulator will tend to be very time-consuming to run, and impractical to run beyond a certain number of runs. These simulators are deterministic in the sense that they contain no random components: running the simulator twice with identical inputs will result in identical outputs (Hankin, 2005).

2.3 Step Discontinuity

A discontinuity exists when a function contains at least one point or interval where the function is not continuous (Stover & Weisstein, 2013). The Heaviside piecewise function is a step-discontinuous function where, at a certain point (or points), there is a sudden increase or decrease in the function, and the function is

¹ Complex: in this terminology, the system contains many parts and/or a very large amount of code.

continuous everywhere before and after the discontinuity. Figure 1.1 is an example of the Heaviside step function. Using the Heaviside function for the first part of the investigation will give a clear demonstration of how the Bayesian emulator models a function containing a discontinuity.

2.3.1 Example of Step Discontinuities

Step discontinuities are found in a wide range of applications, for example:

Heavy Gas Dispersion

A gas with a higher density than air is called a heavy gas; given the same volume, the heavy gas will weigh more than air. When heavy gases such as chlorine are mass-produced and stored in large quantities, occasionally these gases are accidentally released (for example, due to failure of the tank) and the gas will spread (due to its own density and wind) across the terrain as low, flat clouds (Blackmore, Herman, & Woodward, 1982). For industrial risk assessment, it is important to be able to understand and model the flow of heavy gas dispersion, as when the gas is released, it has the potential to cause death as it spreads across the terrain. Figure 2.1 shows how, at the boundary of the gas, there exists a discontinuity where there is a sudden drop in the amount of gas. During the dispersion of the gas, there is a region where the gas is present (this changes as the gas spreads). Within this region, the concentration level of the gas is almost constant; however, the concentration rapidly decreases around the edge of the cloud. This rapid decrease can be mathematically represented as a discontinuity, and knowing the location of this discontinuity provides an estimate of how far the gas could spread. This can be mathematically represented as a function with one or more discontinuities at the boundary of the gas (Hankin & Britter, 1999). Figure 2.2 is an aerial photo taken after a release of ammonia from a tanker in Houston in 1976. The orange region is where the vegetation has been destroyed by the ammonia. As can be seen from the figure, there is a sharp edge where, on one side, vegetation is unaffected and, on the other side, vegetation has been destroyed; this indicates a discontinuity.

Figure 2.1: Example of a heavy gas dispersion showing a discontinuity around 23m

(Hankin & Britter, 1999, p. 233)

Figure 2.2: Aerial photo of ammonia released in Houston; the brown-orange region is where the vegetation was destroyed from the ammonia. Note the sharp edge along the region.

Shock Waves

An example where piecewise step discontinuity functions exist in climate science is in shock waves. The discontinuities exist at the boundary of the shock wave region. Villarreal (2006) studied the existence and non-existence of shock wave solutions of the Burgers equation (a partial differential equation) by using a Heaviside generalized function. Villarreal (2012) extended his own research, looking at the Heaviside generalized function in the context of Colombeau’s theory, with the study of shock waves as the motivation. Figure 2.3 is an illustration of another shock wave example from Hankin (2001).

Figure 2.3

2.4 Current Research on Modelling Discontinuities

One current method for modelling discontinuous functions is Regression Discontinuity Design (RDD); however, RDD is typically used to measure the difference at the discontinuity. RDD has been used in economics (Imbens & Lemieux, 2008; Porter & Yu, 2015; Lee & Lemieux, 2010), pharmacology (Moscoe, Bor, & Bärnighausen, 2015) and education (Thistlethwaite & Campbell, 1960). The equation of the RDD is:

y = Y0(x)(1 − D) + Y1(x)D + ε   (2.1)

(Porter & Yu, 2015)

where Y0 and Y1 are usually linear equations with respect to x, D is a binary variable equal to one if x ≥ d, where d is the position of the discontinuity, and ε is some residual error. At first, RDD was designed with the assumption that the location of the discontinuity is known; however, Porter and Yu (2015) noted this major restriction and extended RDD to unknown discontinuity points. Figure 2.4 is an example of a setup for RDD.

Figure 2.4: Simple Linear RD Setup

(Lee & Lemieux, 2010, p. 287)

RDD was first introduced by Thistlethwaite and Campbell (1960) as a replacement for the ex post facto design and was applied in education research. A study was done looking at how receiving a merit award affected students’ future academic outcomes (career aspirations, enrolment in postgraduate programs, etc.) (Lee &

Lemieux, 2010). Thistlethwaite and Campbell (1960) used RDD to compare the students’ test scores, discovering how much the students benefited from receiving the scholarship (similar to Figure 2.4).
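Equation (2.1) can be illustrated with simulated data. In this hedged Python sketch the cutoff d is assumed known and each side is fitted with its own straight line; the data-generating coefficients and the jump of 2 are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

d = 0.0                                    # cutoff (discontinuity position), assumed known here
x = rng.uniform(-1, 1, 400)
D = (x >= d).astype(float)                 # binary treatment indicator
y = 1.0 + 0.5 * x + 2.0 * D + rng.normal(0, 0.1, x.size)   # simulated data: true jump of 2 at d

# Fit Y0 on x < d and Y1 on x >= d, each as a straight line
b0 = np.polyfit(x[D == 0], y[D == 0], 1)
b1 = np.polyfit(x[D == 1], y[D == 1], 1)

# Estimated treatment effect: the jump Y1(d) - Y0(d) at the cutoff
jump = np.polyval(b1, d) - np.polyval(b0, d)
print(round(jump, 2))
```

The estimated jump recovers the simulated effect of 2; the unknown-cutoff extension of Porter and Yu (2015) would additionally have to search over candidate values of d.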

2.5 Bayesian Emulation

2.5.1 Introduction

Haylock, O’Hagan, Kennedy and Oakley are some of the active researchers in Bayesian emulation. Haylock and O’Hagan (1996) first introduced Bayesian emulation based on a Gaussian process prior model, deriving the posterior mean and variance, which were extended further by Oakley and O’Hagan (1998). Observations from the simulator are used as a way of training the emulator (Conti et al., 2009). The emulator was found to emulate the simulator’s output to a high degree of precision using fewer simulator runs than a Monte Carlo method (Conti et al., 2009). As an alternative to Bayesian emulation (before the research was conducted on Bayesian emulation), a Monte Carlo method would be used, requiring thousands of simulator runs (Oakley, 1999). Using Monte Carlo as an analysis process is simple; however, it requires a large number of observations from the simulator and can become impractical if the simulator is costly to run (M. C. Kennedy & O’Hagan, 2001).

2.5.2 The Bayesian Approach

For any outcome, until it has been observed by the individual, the result is subjectively uncertain and can be modelled as a random variable (Hankin, 2012). For example, if one were to predict the USA presidential election outcome, the individual would use the information provided (e.g., previous polls and who has been selected) and beliefs to make an estimate. This could be mathematically modelled as a random variable. Even though the results are deterministic in the sense that a recount of the votes would make no difference (excluding human factors), until the individual has seen the results, there exists that subjective uncertainty. Deterministic simulators contain no random components; however, until the simulator has been computed, the results are subjectively unknown (or subjectively uncertain). Under the Bayesian paradigm, this subjective uncertainty about the simulator’s outcome is treated as a random variable. The true values of the simulator can be drawn from a distribution, conditional on the prior knowledge of the simulator, and statistical inferences about the simulator can be made, allowing predictions of simulator runs that have not yet been observed. As stated previously, these simulators may be time-consuming to run, and Bayesian emulation provides computational

11 cheapness and a statistically rigorous approach (Hankin, 2005). The Bayesian emulator has a Gaussian process prior for functions in regression(Oakley, 1999). However, in general the simulator is very complex, and a simple regression function is used (Conti et al., 2009) such as a linear model; this allows the data from simulator runs to be the main influence on the emulator.

2.5.3 Bayes’ theory

Bayes’s theorem follows as: let A and B be two events

P (A|B) = P (B|A) P (A) / P (B)

Given that event B has occurred, the probability that A will occur is equal to the probability that both A and B will occur divided by the probability that B will occur. In Bayesian inference, B is the observed data from the simulator, represented by the notation D, and A is the set of unknown parameters, X:

P (X|D) = P (D|X) P (X) / P (D)

Each part of the equation is labelled as follows. P (X|D) is the posterior, which is the target distribution we are trying to obtain. P (D|X) is the likelihood, which asks: what is the likelihood of obtaining the data given the unknown parameters? P (X) is the prior, which carries information about the unknown parameters. The prior distribution can be seen as an educated guess of X based on general knowledge and research. We should aim to find the distribution that best fits X. P (D) is known as the marginal distribution, where P (D) = ∫ P (D, X) dX = ∫ P (D|X) P (X) dX (Gelman, 2009). P (D) does not depend on X, as the data does not change; therefore it can be seen as a constant, which changes the equation to:

P (X|D) ∝ P (D|X) P (X) (2.2)
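Equation 2.2 can be illustrated with a minimal discrete example (a Python sketch with made-up numbers, not part of the thesis’s R experiments): multiplying the likelihood by the prior and renormalising recovers the posterior without ever computing P (D) explicitly.

```python
import numpy as np

# Hypothetical discrete parameter X with three candidate values.
prior = np.array([0.5, 0.3, 0.2])        # P(X)
likelihood = np.array([0.1, 0.6, 0.3])   # P(D | X) for some observed data D

# Posterior is proportional to likelihood * prior (equation 2.2);
# normalising by the sum plays the role of the constant P(D).
unnormalised = likelihood * prior
posterior = unnormalised / unnormalised.sum()
print(posterior.round(3))   # [0.172 0.621 0.207]
```

The middle candidate value, which had the highest likelihood, carries most of the posterior mass even though its prior weight was modest.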

2.5.4 Gaussian Process

For a Gaussian process, for any set of points {x1, x2, ..., xn} in the domain D, the random vector {η (x1) , . . . , η (xn)} has a joint distribution which is multivariate Gaussian (Hankin, 2014),

where the expected value of the vector of random function outputs is:

µ = [E (η (x1)) , E (η (x2)) ,..., E (η (xn))]

and a covariance matrix Σ:

Σ = [ Var (η (x₁))           COV (η (x₁), η (x₂))   ···   COV (η (x₁), η (xₙ)) ]
    [ COV (η (x₂), η (x₁))   Var (η (x₂))           ···   COV (η (x₂), η (xₙ)) ]
    [ ⋮                      ⋮                      ⋱     ⋮                    ]
    [ COV (η (xₙ), η (x₁))   COV (η (xₙ), η (x₂))   ···   Var (η (xₙ))         ]

There are many fields of applications where the Gaussian process is used to make a statistical inference about a certain problem or function.
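The definition above is all that is needed to draw sample paths from a Gaussian process prior. The following Python sketch (illustrative only, not thesis code) builds μ and Σ for a finite set of points using the squared-exponential correlation of equation 2.4 with B = 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# A finite set of points in the domain D.
x = np.linspace(-3, 3, 50)

# Correlation c(x, x') = exp(-B (x - x')^2) with B = 1 (equation 2.4),
# plus a tiny diagonal jitter for numerical stability.
B = 1.0
Sigma = np.exp(-B * (x[:, None] - x[None, :]) ** 2) + 1e-10 * np.eye(len(x))

# Any such finite collection has a multivariate Gaussian distribution;
# here the mean vector mu is taken to be zero.
mu = np.zeros_like(x)
draws = rng.multivariate_normal(mu, Sigma, size=3)   # three sample paths
print(draws.shape)   # (3, 50)
```

Each row of `draws` is one realisation of the random function at the 50 chosen points; smoother correlation functions produce smoother sample paths.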

2.5.5 Bayesian Emulation and Notations

Let η : Rᵏ → R now be a deterministic simulator that is assumed to be complex, very time-consuming to run and impractical to run many times. An example of a simulator that models the physical world is TWODEE (Hankin & Britter, 1999),2 which models the flow of a heavy gas dispersion spreading across a terrain. From Oakley (1999), the simulator is a Gaussian process: an infinite collection of random variables with the property that any finite subset of these variables has a multivariate Gaussian distribution (Oakley, 1999). The observations/results taken directly from the deterministic simulator can also be seen as being drawn from a multivariate Gaussian, but with a variance of 0; this is because the results have been observed and therefore there is no uncertainty. By using the observations directly from the simulator, Bayesian inferences can then be made about the simulator. One Bayesian inference technique is called Bayesian emulation (Oakley, 1999), providing statistical inferences about the simulator as a Gaussian process. Using Bayesian emulation has been found to be a more economical approach compared to using the simulator alone (Hankin, 2005). Bayesian emulation is best suited to continuous functions rather than functions containing discontinuities, and although there are many physical problems and simulations that contain discontinuities, very little research has been done applying Bayesian emulation to model these functions. Let X = {x1, x2, ..., xn} be a set of training runs of the simulator (xᵢ ∈ Rᵏ) with simulator observations y1 = η (x1), y2 = η (x2), ..., yn = η (xn), which will be denoted as y = [y1, y2, ..., yn]ᵀ, and which is to be seen as the data used to emulate the simulator (O’Hagan, 2006).

2This thesis will not be using TWODEE within the experiments.

Oakley (1999) begins to model η using a stochastic process approach as:

η (x) = Σᵢ₌₁ʳ βᵢhᵢ (x) + Z (x)    Eq: 2.1 (Oakley, 1999, p. 9) (2.3)

where for each value of i, hᵢ (x) is a known regressor function of x and βᵢ is an unknown coefficient. Z (·) is a stochastic process with mean zero, and the covariance between Z (x) and Z (x′) is given by a covariance function (COV (x, x′): see below). Matrices are used for the calculations; β is a vector and the known regressor functions form a vector h (x) = [h₁ (x), h₂ (x), ..., hᵣ (x)], rewriting the equation as:

η (x) = h (x)ᵀ β + Z (x)

Given β, the expectation of η is E [h (x)ᵀ β + Z (x)] = E [h (x)ᵀ β] + E [Z (x)] = h (x)ᵀ β + 0

E [η (x) | β] = h (x)ᵀ β    Eq: 2.6 (Oakley, 1999, p. 10)

h (x) is a 1 × q vector from the regression function. For example, if for a one-dimensional input h (x) = [x², x, 1], then for a k-dimensional input h (x) = [x₁², x₂², ..., xₖ², x₁, x₂, ..., xₖ, 1], resulting in q = 2k + 1. Within the emulator, a matrix of regressors is used, containing one regressor vector for each of the n unique input parameters in X.

H = [h (x₁)ᵀ, h (x₂)ᵀ, ..., h (xₙ)ᵀ]ᵀ    Eq: 2.15 (Oakley, 1999, p. 13)
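The construction of h (x) and H above can be sketched in code (a Python sketch; the quadratic basis and design points are illustrative, not taken from the thesis):

```python
import numpy as np

def h(x):
    """Regressor vector [x_1^2, ..., x_k^2, x_1, ..., x_k, 1] for a
    k-dimensional input, giving q = 2k + 1 entries."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.concatenate([x ** 2, x, [1.0]])

# H stacks one regressor vector per training input, giving an n x q matrix.
X = np.array([[-1.0, 0.5], [0.0, 2.0], [1.5, -2.0]])   # n = 3, k = 2
H = np.array([h(xi) for xi in X])
print(H.shape)   # (3, 5), since q = 2k + 1 = 5
```

Each row of H is the regressor evaluated at one training input, matching the n × q dimensions stated below.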

H has the dimensions n × q and is denoted H[n×q]. The correlation between two observations is defined by a function c (x, x′), where 0 ≤ c (x, x′) ≤ 1, and as ‖x′ − x‖ → 0 then c (x, x′) → 1, and as ‖x′ − x‖ → ∞ then c (x, x′) → 0. In Oakley (1999), the following correlation function was used, which has been designed for smooth functions:

c (x, x′) = exp(−(x − x′)ᵀ B (x − x′)) (2.4)

where B[k×k] is a matrix of smoothness parameters, is positive definite and is unknown. This correlation function has been used in several research papers (Caiado & Goldstein, 2015; Chen, Zabaras, & Bilionis, 2015; Conti et al., 2009; Hankin, 2012; M. C. Kennedy & O’Hagan, 2001; Montagna & Tokdar, 2016; Oakley, 2002; Zhang, Konomi, Sang, Karagiannis, & Lin, 2015). From Bochner’s theorem (Hankin, 2012), for c (x, x′) to be a correlation function, c (x, x′) must be the characteristic function of a random variable and symmetric about the origin.

B is in most cases unknown, and an estimate is calculated, using the information that B is proportional to an inverse gamma density function (Oakley, 1999) and some trial and error. For the regressor h (x)ᵀ β, in practice β is normally unknown, and an estimate, β̂, is used with the equation:

β̂ = (Hᵀ A⁻¹ H)⁻¹ Hᵀ A⁻¹ y    Eq: 2.24 (Oakley, 1999, p. 14) (2.5)

(obtained by maximum likelihood estimation of y = h (x)ᵀ β + ε). The posterior mean of η (x) is given by

E [η (x) | β̂, y] = h (x)ᵀ β̂ + t (x)ᵀ A⁻¹ (y − Hβ̂) = η̂ (x) (2.6)

Eq: 2.30 (Oakley, 1999, p. 14)

where A[n×n] is a matrix of correlations:

A = [ 1             c (x₁, x₂)   ···   c (x₁, xₙ) ]
    [ c (x₂, x₁)    1            ···   c (x₂, xₙ) ]
    [ ⋮             ⋮            ⋱     ⋮          ]    (2.7)
    [ c (xₙ, x₁)    c (xₙ, x₂)   ···   1          ]

The covariance between η (x) and η (x′) is given by

COV (η (x), η (x′)) = σ²c (x, x′) (2.8)

Eq: 2.7 (Oakley, 1999, p. 11)

where σ² is the overall variance, and

c∗∗ (x, x′) = c (x, x′) − t (x)ᵀ A⁻¹ t (x′) + (h (x)ᵀ − t (x)ᵀ A⁻¹ H)(Hᵀ A⁻¹ H)⁻¹ (h (x)ᵀ − t (x)ᵀ A⁻¹ H)ᵀ

Eq: 2.31 (Oakley, 1999, p. 14)

where t (x) = [c (x, x₁), c (x, x₂), ..., c (x, xₙ)]ᵀ. Oakley then derives the Bayesian emulator as:

η (x) | y, σ² ∼ N (η̂ (x), σ²c∗∗ (x, x)) (2.9)

Statistical inferences about the simulator can now be made using the above equations. In practice σ² is unknown, and an estimate of σ², denoted σ̂², is used, resulting in the posterior being a Student-t process with n − q − 2 degrees of freedom.

(η (x) − η̂ (x)) / (σ̂ √c∗∗ (x, x)) ∼ t_{n−q−2} (2.10)

where

σ̂² = yᵀ (A⁻¹ − A⁻¹H (Hᵀ A⁻¹ H)⁻¹ Hᵀ A⁻¹) y / (n − q − 2)

and

c∗∗ (x, x′) = c (x, x′) − t (x)ᵀ A⁻¹ t (x′) + (h (x)ᵀ − t (x)ᵀ A⁻¹ H)(Hᵀ A⁻¹ H)⁻¹ (h (x)ᵀ − t (x)ᵀ A⁻¹ H)ᵀ
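Equations 2.5, 2.6 and the c∗∗ formula can be assembled into a minimal working emulator. The following Python sketch (illustrative only; the thesis itself uses the BACCO package in R) implements the posterior mean η̂ (x) and posterior correlation c∗∗ (x, x) for one-dimensional inputs with the constant regressor h (x) = [1] and B = 1:

```python
import numpy as np

def emulator(X, y, B=1.0):
    """Posterior mean and variance of a GP emulator with constant regressor
    h(x) = [1]; a sketch of equations 2.5, 2.6 and the c** formula."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(X)
    H = np.ones((n, 1))                                 # regressor matrix, q = 1
    A = np.exp(-B * (X[:, None] - X[None, :]) ** 2)     # correlation matrix A
    Ainv = np.linalg.inv(A)
    W = np.linalg.inv(H.T @ Ainv @ H)                   # (H^T A^-1 H)^-1
    beta = W @ H.T @ Ainv @ y                           # beta-hat (equation 2.5)
    # sigma-hat squared with n - q - 2 degrees of freedom (q = 1 here)
    sigma2 = y @ (Ainv - Ainv @ H @ W @ H.T @ Ainv) @ y / (n - 1 - 2)

    def t(x):                                           # t(x) = [c(x, x_i)]
        return np.exp(-B * (x - X) ** 2)

    def mean(x):                                        # eta-hat(x), equation 2.6
        return (beta + t(x) @ Ainv @ (y - H @ beta))[0]

    def cstar(x):                                       # c**(x, x)
        r = np.ones(1) - H.T @ Ainv @ t(x)              # h(x)^T - t(x)^T A^-1 H
        return 1.0 - t(x) @ Ainv @ t(x) + r @ W @ r

    return mean, cstar, sigma2

# Heaviside observations as training data (illustrative design points).
X = np.array([-2.0, -1.0, 0.5, 1.0, 2.0])
y = np.where(X >= 0, 1.0, 0.0)
mean, cstar, sigma2 = emulator(X, y)
print(mean(0.0), cstar(0.0))
```

At the design points the posterior mean interpolates the observations exactly and c∗∗ collapses to zero; away from the data the variance grows back towards the prior, exactly the behaviour exploited in the figures of Chapter 3.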

2.5.6 BACCO in R

The BACCO suite of R packages (Hankin, 2005) is used here for Bayesian emulation. It was originally demonstrated on a simulator that predicts sea surface temperatures. The simulator took from 12 to 24 hours to calculate the sea surface temperature, whereas the emulator took 0.1 seconds to calculate the posterior mean and variance of the sea surface temperature (Hankin, 2005), demonstrating that use of the emulator results in a significant reduction in computation time. Other fields are mentioned by Hankin (2005), such as Earth systems science and climate science. These real-world applications demonstrate the potential of Bayesian emulation. This thesis will develop Bayesian emulation by investigating simulators with discontinuities.

2.5.7 Current Research on Emulators

Most studies in the literature include a summary of Bayesian emulation and then apply it, for example, to modelling a rainfall-runoff simulator (Conti et al., 2009), health economics (Oakley, 2009), the Sheffield Dynamic Global Vegetation Model (Conti & O’Hagan, 2010), and modelling the climate of the Atlantic (Caiado & Goldstein, 2015). Rasmussen and Williams (2006) have also applied the emulator as a method of machine learning.

2.6 Current Research on Modelling Discontinuities with Bayesian Emulation

See section 1.1.3: Bayesian Emulators and Discontinuity

Chapter 3

An Introduction to the Emulator Modelling Step Discontinuous Functions

This thesis investigates how the emulator models step discontinuous functions. The Heaviside function will be used to start with, as it is a simple function whose main property is a step discontinuity. Equation 3.1 is the Heaviside function used in this thesis.

η (x) = { 0, x < 0; 1, x ≥ 0 }    (3.1)

The correlation function used is from equation 2.4, c (x, x′) = exp(−(x − x′)ᵀ B (x − x′)), which is commonly used in other research; B will be equal to 1. Because the Heaviside function has no slope, the regression function will be a constant, h (x) = 1. Both the correlation and regression functions will be discussed in more detail in Section 4.2.1 (page 32). This chapter provides some visual analysis of the emulator modelling step discontinuous functions using simple approaches. Data points will be chosen with at least one near the discontinuity and “stitched”1 at the discontinuity. Some of the goodness-of-fit measures will be applied to visually measure the accuracy of the emulator.

1The word stitched in this context means to have two points evaluated close to and either side of the discontinuity.

The table below summarises the figures that will be discussed in this chapter. The purpose of these figures and their discussion is to give insight into some techniques and their limitations when the emulator models functions like the Heaviside.

Figure 3.1 — Using a continuous constant regressor to model the Heaviside function with no stitching
Figure 3.2 — The discontinuity’s position is not known; however, one data point is close to the discontinuity (left side)
Figure 3.3 — The discontinuity’s position is not known; however, one data point is close to the discontinuity (right side)
Figure 3.4 — The discontinuity’s position is effectively known; a continuous regressor is still used and is now “stitched”: two data points are evaluated close to and either side of the discontinuity
Figure 3.5 — Using a discontinuous regressor; the discontinuity is partly known, with one point close to the discontinuity (similar to Figure 3.2)
Figure 3.6 — Using the same discontinuous regressor as in Figure 3.5, with two data points “stitched” at the discontinuity
Figure 3.7 — Using a discontinuous regressor to model the Heaviside function with no stitching; however, the location of the discontinuity is provided by the regressor
Figure 3.8 (a & b) — Using a discontinuous regressor to model the Heaviside function with no stitching; the location of the discontinuity is inferred from the data points
Figure 3.9 — Using a discontinuous regressor to model the Heaviside function with no stitching, where the assumed location of the discontinuity is incorrect

Table 3.1: Summary of the Figures

3.1 Example 1: One data point at the discontinuity

Starting with Figure 3.2 (see page 24): where −3 ≤ x ≤ −0.05 and 1 ≤ x ≤ 3, the emulator is a reasonable representation of the Heaviside function, as if the true function were smooth. For this Heaviside function the slope is always positive between the two data points on either side of the discontinuity (even though we do not know where it is), and there is no over/under shooting of the function between these points. Estimating the location of the discontinuity by η̂ (x) = 0.5 gives a location of 0.5. Figure 3.3 (see page 25) is obtained with the only change from Figure 3.2 (page 24) being the position of the one data point at the discontinuity. Estimating the location of the discontinuity by η̂ (x) = 0.5 gives a location of −0.5. Because the data points are symmetric, the approximated location of the discontinuity is also symmetric to that of Figure 3.2. Between 0 and 3 it can be seen that the emulator is a reasonable representation of the Heaviside function. As expected, the emulator will always converge to the prior outside of the data points due to the lack of information (no data points).
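The η̂ (x) = 0.5 rule can be reproduced numerically. The following Python sketch (illustrative; the design point at −0.05 mimics the configuration described for Figure 3.2, with B = 1 and a constant regressor) locates the 0.5-crossing of the posterior mean between the two points straddling the step:

```python
import numpy as np

# Heaviside observations with one design point close to the discontinuity
# on the left side (illustrative layout, not the thesis's exact design).
X = np.array([-3.0, -2.0, -1.0, -0.05, 1.0, 2.0, 3.0])
y = np.where(X >= 0, 1.0, 0.0)

A = np.exp(-(X[:, None] - X[None, :]) ** 2)    # correlation matrix, B = 1
Ainv = np.linalg.inv(A)
H = np.ones((len(X), 1))
beta = (np.linalg.inv(H.T @ Ainv @ H) @ H.T @ Ainv @ y)[0]   # beta-hat

def eta_hat(x):
    """Posterior mean (equation 2.6) with constant regressor h(x) = 1."""
    t = np.exp(-(x - X) ** 2)
    return beta + t @ Ainv @ (y - beta)

# Estimate the discontinuity location as the point where the posterior
# mean crosses 0.5 between the two data points straddling the step.
grid = np.linspace(-0.05, 1.0, 2001)
vals = np.array([eta_hat(g) for g in grid])
crossing = grid[np.argmin(np.abs(vals - 0.5))]
print(crossing)
```

Because the posterior mean interpolates 0 at x = −0.05 and 1 at x = 1, the 0.5-crossing always falls between the two straddling data points, which is the behaviour described for Figures 3.2 and 3.3.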

3.2 Example 2: Two data point stitched at the discontinuity

For Figure 3.4 (see page 26), there are two data points very close on either side of the discontinuity, a simple strategy of using the data points to manipulate the emulator as if the location of the discontinuity were almost certain. This has made the emulator worse compared to Figures 3.2 (page 24) and 3.3 (page 25), causing the emulator to be very inaccurate in predicting the function, with a lot of over/under shooting. The slope of the emulator is high at the point of the discontinuity, which has caused the extreme over and under shooting, and the closer the two points are, the larger the slope becomes. Note that the extreme over/under shooting could be reduced by changing B in the correlation function; however, because the Heaviside function has no length scale, B is kept at 1.

3.3 Example 3: Using a discontinuous function as the regressor, with one data point at the discontinuity

If we were to know the location of the discontinuity, then everything changes; our current setup with the regressor and correlation function might have to change (particularly the regressor function, as seen in Section 3.5). To demonstrate this, let

η (x) = { x − 3, x ≤ 0; 2x + 5, x > 0 }

and the regressor function be

h (x) = { 0, x < 0; 1, x ≥ 0 }

using the same data points as Figure 3.2. From Figure 3.5, in the regions outside of the data points around the discontinuity (−3 ≤ x ≤ 0 and 1 ≤ x ≤ 3) the emulator is a reasonable representation of η; however, the emulator overshoots between 0 and 1. The variance is also discontinuous due to equation 2.8 (page 15), where c∗∗ (x, x′) contains the discontinuous regression function.

3.4 Example 4: Using a discontinuous function as the regressor, with two data point stitched at the discontinuity

Knowing that the discontinuity is at 0, we obtain Figure 3.6. Because the data points either side of the discontinuity are close together, there is very little over/under shoot. This shows that, if the data points are “stitched” at the discontinuity, it could be suitable to use a regressor that contains the discontinuity at the stitched location (compare Figure 3.4). As a side note on the emulator: because it is “stitched”, the prior is restricted to be the same distance above each of the two stitched data points (in this case 3).

3.5 Example 5: Using a discontinuous function as the regressor to model the Heaviside Function

Returning to the Heaviside function, let the regressor now be:

h (x) = { x − 3, x ≤ 0; 2x + 5, x > 0 }

From this, Figure 3.7 is obtained. It is observed that with this prior, the emulator shows less over/under shooting at the discontinuity compared to when the constant regressor was used (Figure 3.1). This is because information about the discontinuity has been provided to the emulator. However, if the location of the discontinuity is unknown, then the only information available is the data points. Figure 3.8 demonstrates two extreme ends, assuming the discontinuity lies between the two data points at which the data “jump” from 0 to 1. Comparing Figure 3.1 to Figure 3.8, it could be suggested that using a discontinuous regressor when the location of the discontinuity is unknown does not significantly improve the emulator in modelling the Heaviside function. When the data points are ignored when estimating where the discontinuity might be, the emulator could produce something similar to Figure 3.9, which shows that any information conflicting with the prior results in a high level of error for the emulator, causing over/under shooting.

3.6 Conclusion

The software of the emulator works as expected; however, Bayesian emulation and the modelling of discontinuous functions can each be a complex process alone, and more so together. The emulator is able to provide a reasonable representation of the Heaviside function when using a constant regression function, but does show a decrease in accuracy near the discontinuity. This inaccuracy at the discontinuity is due to multiple factors, particularly the regression function and the correlation function, but more so the assumption that the location of the discontinuity is unknown. From the literature, it has been suggested to run separate emulators for each region, removing the problem of having discontinuities (Caiado & Goldstein, 2015). This chapter has demonstrated that it is possible to use one emulator by using a discontinuous regressor. However, in both cases the discontinuity’s location is known. The emulator can be improved at modelling a discontinuous function by specifying the regressor function to include the discontinuity when the location of the discontinuity is known, as Figure 3.6 demonstrated. When the location of the discontinuity is unknown, the emulator only maintains acceptable accuracy outside the domain between the data points either side of the discontinuity. This has shown how strongly dependent the emulator is on the regressor function, the correlation function, and the data points. It has been demonstrated that when using a discontinuous regressor, the emulator relies heavily on the assumption that the location of the discontinuity is known; however, there are times when it may be impractical (or effectively impossible) to run the function/simulator multiple times to narrow down where the discontinuity is. For this reason, this thesis will focus on the emulator with an unknown position of the discontinuity, by not providing to the emulator the location of the discontinuity, or even the fact that the function has a discontinuity.
Until now, this thesis has looked at informal measures of the emulator’s performance. Formal goodness-of-fit measures will be discussed in Chapter 4; goodness-of-fit techniques will be designed to measure the emulator’s accuracy on the Heaviside function and the other similar functions studied in this research. It is difficult to quantify objectively what makes a “good” emulation in terms of being able to fit a discontinuous function like the Heaviside. The goodness-of-fit techniques should measure certain properties of the discontinuous function, such as the gradient at the point of the discontinuity.

[Figures 3.1–3.9 appear here as plots of y against x over −3 ≤ x ≤ 3; the legend in each is: Emulator: η̂(x); True Function; Data Points; η̂(x) ± σ̂c∗∗(x, x); Prior. Captions follow.]

Figure 3.1: A simple example of the emulator modelling the Heaviside function with the data points symmetric.

Figure 3.2: An example of the emulator modelling the Heaviside function with one data point close to the discontinuity. Where −3 ≤ x ≤ −0.05 and 1 ≤ x ≤ 3, the emulator is a reasonable representation of the Heaviside function, as if the true function were smooth. For this Heaviside function the slope is always positive between the two data points on either side of the discontinuity (even though we do not know where it is) and there is no over/under shooting of the function between these points.

Figure 3.3: An example of the emulator modelling the Heaviside function with one data point close to but on the other side of the discontinuity. The only change from Figure 3.2 is the position of the one data point at the discontinuity. Between 0 and 3 the emulator is a reasonable representation of the Heaviside function. As expected, the emulator will always converge to the prior outside of the data points due to the lack of information (no data points).

Figure 3.4: An example when using two data points close on either side of the discontinuity (note the y-axis and the extreme over/under shooting; legend is the same as Figure 3.1). The two data points are very close on either side of the discontinuity, a simple strategy of using the data points to manipulate the emulator as if the location of the discontinuity were almost certain. This has made the emulator worse compared to Figures 3.2 and 3.3, causing the emulator to be very inaccurate in predicting the function, with a lot of over/under shooting. The slope of the emulator is high at the point of the discontinuity, which has caused the extreme over and under shooting, and the closer the two points are, the larger the slope becomes.

Figure 3.5: An example of the emulator with a discontinuous prior. Note that the position of the discontinuity has to be known in order to define the regressor function. If we were to know the location of the discontinuity, then everything changes; the current setup from Figures 3.2 to 3.4 with the regressor and correlation function might have to change. To demonstrate this, let η (x) = { x − 3, x ≤ 0; 2x + 5, x > 0 } and the regressor function be h (x) = { 0, x < 0; 1, x ≥ 0 } (note that at the discontinuity, η jumps from −3 to 5). As can be seen from the graph, in the regions outside of the data points around the discontinuity (−3 ≤ x ≤ 0 and 1 ≤ x ≤ 3) the emulator is a reasonable representation of η; however, the emulator overshoots between 0 and 1.

Figure 3.6: Knowing that the discontinuity is at 0, we obtain the above figure. Because the data points either side of the discontinuity are close together, there is very little over/under shoot. This shows that, if the data points are “stitched” at the discontinuity, it could be suitable to use a regressor that contains the discontinuity at the stitched location (compare Figure 3.4).

Figure 3.7: An example of using a discontinuous regressor to model the Heaviside function. It is observed that with this discontinuous regressor, the emulator is a better representation of the Heaviside function compared to the constant regressor. This is because information about the discontinuity has been provided to the emulator via the prior.

Figure 3.8: An example of using a discontinuous regressor to model the Heaviside function, assuming that the location of the discontinuity is unknown (two panels, a and b; legend is the same as Figure 3.1). If the location of the discontinuity is unknown, then the only information available is the data points. This figure demonstrates two extreme ends, assuming the discontinuity lies between the two data points at which the data “jump” from 0 to 1. Comparing this figure to Figure 3.1, it could be suggested that using a discontinuous regressor when the location of the discontinuity is unknown does not significantly improve the emulator in modelling the Heaviside function. The relationship between Figures 3.8a and 3.8b is more complicated than that between Figures 3.2 and 3.3.

Figure 3.9: An example of using a discontinuous regressor that conflicts with the data points when modelling the Heaviside function (legend is the same as Figure 3.1). When the data points are ignored in estimating where the discontinuity might be, the emulator could produce something similar to the above figure, which shows that any information conflicting with the prior will result in a high level of error for the emulator, causing over/under shooting.

Chapter 4

Research Method for Quantitative Analysis

4.1 Introduction

This chapter will discuss the approach to answering the thesis questions (Section 1.2) by introducing the assumptions made in the setup of the experiments and an outline of the experiments reported in Chapter 5. Throughout the experiments, goodness-of-fit techniques were used to measure how well the emulator handled simulators with step discontinuities. The experiments used R (a programming language commonly used in statistical computing) in conjunction with the BACCO package, which contains the Bayesian emulation formulas discussed in Section 2.5.5.

4.2 Research Approach

4.2.1 Experiment Setup and Assumptions

The following sub-sections describe the components and assumptions required to set up the experiments for this thesis.

Regressor Function

Figure 4.1 is an example of why a linear prior (H [X] = [1, X]) will not be used for the findings: beyond the data points, the prior has the main influence on the emulator due to the lack of information. Figure 4.2 is an example of the emulator with a constant as its prior. When there is a lack of information from the simulator’s data points, the emulator’s result will come from the prior. In this thesis a constant prior is used because of the lack of trend in the discontinuous functions. Because the step discontinuity functions have no trend, it could be suggested that for the Heaviside function, a regressor function of H [X] = [1] would be appropriate1. The 1 is an arbitrary non-zero constant, as β̂ (see equation 2.5) will be adjusted to best fit the data (H [X] = [1, X] would also be equivalent to H [X] = [100, 2X]).

[Figures 4.1 and 4.2: plots of y against x over −6 ≤ x ≤ 6; legend: Emulator: m∗∗(x), True Function, Data Points, m∗∗(x) ± σ̂c∗∗(x, x), Prior.]

Figure 4.1: Example of the emulator using a linear prior

Figure 4.2: Example of the emulator using a constant prior

The emulator has no prior knowledge of the location of the discontinuity (see Section 3.5 on page 21 for a detailed demonstration); this means that using a function with a discontinuity in the regressor would be unsuitable, because the prior (h (x)ᵀ β̂) is linear. The simulator itself is able to better approximate this discontinuity by running the simulation multiple times. However, this would take a considerable amount of time, whereas the emulator

1The conclusion to use a no-trend prior was made during the investigation of optimising B in Appendix B.

is quicker to run and should be able to give an estimation of the discontinuity’s location.

Sample Space and Distribution for the Data Points

During the experiments, data points for the simulator were randomly drawn from a uniform distribution. This is because it is assumed that there is no (or very little) prior information on the location of the discontinuities, and therefore the position of the discontinuities could be anywhere. However, some coarse prior knowledge of the region containing the discontinuities is assumed; this is to minimise the number of emulator runs in regions where no discontinuity exists. Therefore a suitable range (typically between −3 and 3) is chosen, where the probability of encountering the discontinuity is high.

Sampling Data Points vs Sampling Discontinuity Positions

For this thesis, because the locations of the discontinuities are treated as unknown, the discontinuity was fixed for the experiments; this information was not given to the emulator but was used in testing how well the emulator performed. If instead the data points were fixed and the position of the discontinuity were varied, there would be times when the emulator repeats results: when the simulator changes but the data points output from the simulator do not, there is no change in the emulator, as no new information has been given to it. Figure 4.3 demonstrates this, showing that there would be a limit of only six different emulators (the number of data points plus one). Sampling data points therefore provides more information about how well the emulator models discontinuities than sampling discontinuity positions.
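The repetition argument can be checked directly: two different discontinuity positions that fall between the same pair of fixed design points produce identical training data, and hence identical emulators (a Python sketch with hypothetical design points):

```python
import numpy as np

# Fixed design points (hypothetical); the simulator is a Heaviside step
# whose discontinuity position d is varied.
X = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])

def y_at(d):
    """Simulator output at the fixed design points for step position d."""
    return np.where(X >= d, 1.0, 0.0)

# Two step positions between the same pair of design points give the same
# training data, so the emulator cannot distinguish between them.
print(np.array_equal(y_at(0.2), y_at(0.8)))   # True
print(np.array_equal(y_at(0.2), y_at(1.5)))   # False
```

With n design points there are only n + 1 distinct output vectors the step can produce, matching the limit of six different emulators noted above for five intervals plus the exterior.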

Correlation Function

The correlation function used throughout the research is:

c (x, x′) = exp(−(x′ − x)ᵀ B (x′ − x)) (4.1)

This correlation function may be better suited to continuous functions, and a more suitable correlation function for discontinuous functions may be possible. However, due to the lack of research in modelling discontinuous functions with Bayesian emulation, this thesis will focus on the emulator and its well-researched methods, rather than adjusting the correlation function. This correlation function has been used in several research papers (Caiado & Goldstein, 2015; Chen et al., 2015; Conti et al., 2009; Hankin, 2012; M. C. Kennedy & O’Hagan, 2001; Montagna & Tokdar, 2016; Oakley, 2002; Zhang et al., 2015) and will be used in this thesis to demonstrate the advantages and limitations of the Bayesian emulator.

[Figure 4.3: four panels of y against x over −3 ≤ x ≤ 3.]

Figure 4.3: Examples of emulators if data points were fixed and discontinuity positions were sampled

B from the correlation function is a diagonal matrix and describes how “rough” the function is. According to Oakley (1999) there is no analytical way of calculating B. In this thesis, because the Heaviside has no length scale, B will not be optimised, as the research focus is on modelling discontinuous functions with Bayesian emulation. Therefore B in most cases will be equal to 1, or an identity matrix in multiple dimensions. A short experiment is reported in Appendix B concluding that, to avoid added complexity, for this research B is best kept at 1. If B is too high, the correlation function will pull the emulator to the prior (similar to Figure 4.7), and if B is too low, the prior has little effect on the emulator, causing a high error in η̂ outside the data point range and between the data points (as shown in Figure 4.6). It should also be noted that because the Heaviside function has no length scale, scaling the data points using X√B provides equivalent emulators. For example, Figure 4.4 is the output when the data points are (−3, −2, 0, 2, 3) and B = 1, and Figure 4.5 is the equivalent emulator with only the x-axis scaled: B has been changed to 20 and the data points are scaled to (−3/√20, −2/√20, 0, 2/√20, 3/√20). Because the data points for the experiments in this thesis can be scaled in this way, it was found best to let B equal 1.
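The scaling equivalence can be verified directly for the correlation function: in one dimension, exp(−B(x − x′)²) is unchanged if every point is divided by √B while B is applied, which is why Figures 4.4 and 4.5 show equivalent emulators (a quick numerical check, not thesis code):

```python
import numpy as np

def corr(x, xp, B=1.0):
    """One-dimensional correlation c(x, x') = exp(-B (x - x')^2) (eq. 2.4)."""
    return np.exp(-B * (x - xp) ** 2)

B = 20.0
x = np.array([-3.0, -2.0, 0.0, 2.0, 3.0])
x_scaled = x / np.sqrt(B)   # the scaled design points used with B = 20

# The correlation matrix with B = 20 on the scaled points equals the
# matrix with B = 1 on the original points, so the emulators are equivalent.
A_scaled = corr(x_scaled[:, None], x_scaled[None, :], B=B)
A_orig = corr(x[:, None], x[None, :], B=1.0)
print(np.allclose(A_scaled, A_orig))   # True
```

Since every quantity in the emulator depends on the design points only through these pairwise correlations, identical correlation matrices imply identical posterior means and variances.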

4.2.2 Order of Experiment with the Emulator

This thesis will mainly be focusing on numerical experiments with the Bayesian emulator. The first part of the experiment will run the emulator modelling a simple


Figure 4.4: Example of emulator with constant prior and B = 1


Figure 4.5: Example of emulator with constant prior and B = 20 and the data points scaled


Figure 4.6: Example of when B from the correlation function is low


Figure 4.7: Example of when B from the correlation function is high

discontinuity function, the Heaviside function, similar to Figure 1.1. The Heaviside function will be the first function which the emulator will model. Testing the emulator with the Heaviside function allows a basic understanding of the emulator’s behaviour when modelling discontinuities and of how the emulator is able to approximate where the discontinuity is. If the emulator is unable to handle this simple function, then it can be expected that it will not be able to handle more complex functions with discontinuities.

Next, the dimension of the function will be increased to two, with the function containing a region where it is discontinuous. The findings from the one-dimensional case will be used to guide the research when applied to this simple two-dimensional case. A circle was chosen because it could be seen as a simple representation of the flow of a heavy gas at a fixed point in time. Once an understanding of how the emulator deals with the two-dimensional case is gained, the experiments will extend this to three dimensions. The third dimension will be time, using the same function as before but having the region of the circle increase. It will be assumed that the gas is released from a source (e.g., a tank) continuously (not instantaneously), and because the tank’s internal pressure will decrease as it releases the gas, the rate at which the gas escapes is proportional to the amount of gas remaining in the tank. Using this assumption will create a simple function that could represent a heavy gas spreading outwards from a source.

4.2.3 Goodness of Fit

This thesis has presented examples of goodness-of-fit techniques, which will be used to measure the emulator’s estimation of the discontinuous function. These results will be used to compare and evaluate how well the emulator has been able to model the functions.

Let X = {x1, x2, ..., xn} be the set of training runs of the simulator, containing n observation points (or data points) from the simulator, and let x be a set of m unique test points² that will be used to test the emulator and that are ideally evenly spaced³:

x = {x1, x2, ..., xm} where x_{i+1} = x_i + Δ, Δ = (B − A)/(m − 1), x1 = A and xm = B

Below are examples of goodness-of-fit techniques that will be used in this thesis. A probability density function (PDF) and a cumulative distribution function (CDF) will be produced for each technique. In Appendix A it has been shown that an analytical solution would be difficult to obtain, and therefore these solutions were calculated numerically.

• Mean Square Error (MSE): To find the MSE, the following formula is used:

MSE = (1/m) Σ_{i=1}^{m} [η̂(xi) − η(xi)]²    (4.2)

where xi is the i-th element of the test points that has not been observed by the simulator. As m gets large, the summation approaches an integral:

lim_{m→∞} Σ_{i=1}^{m} [η̂(xi) − η(xi)]² = ∫_D [η̂(x) − η(x)]² dx    (4.3)

Because the emulator η̂ will become more inaccurate outside the range of the data points, the MSE will only be measured between the uppermost and lowermost data point values. For example, if the data points are −5, 2.3, 0.1, 2.8, 7 then the MSE will be calculated between −5 and 7; in the equation, D is this range. By measuring the MSE along this region, the research will suggest how accurate the emulator is when modelling a discontinuous function. Ideally, a suitably accurate emulator will have a low error value.

²These test points typically have not yet been observed by the simulator but will be used within the emulator.
³If the test points are evenly spaced it might provide a more accurate result.

• Comparing η̂(d) to η(d): For a Heaviside function, given the discontinuity is at d, it would be expected that η̂(d) = η(d) = 0.5. This technique will also provide information on any under/overshooting.

• Approximating and comparing the locations of the discontinuities from the emulator by solving η̂(x) = η(d) for x, where d is the discontinuity position.

It is expected that an accurate emulator will have η̂(d) = 0.5 for the Heaviside function (and the other functions that are to follow in this thesis). This is because, at the discontinuity, the result from the true function may not exist (or at least differs extremely on each side of the discontinuity); therefore it is assumed that the position of the discontinuity is at the midway point near the discontinuity, such that (lim_{x→d⁺} η(x) + lim_{x→d⁻} η(x))/2 = (1 + 0)/2 = 0.5. Therefore, for this research, η̂(x) = 0.5 will be solved for x, and x will be compared to the location of the discontinuity d. Ideally, a “good” emulator will have x at or near the discontinuity, with a low variance.

• Jaccard index: To measure the accuracy of the emulator, equation 4.4 will be used to post-process the emulator. This will provide a percentage of overlap between the simulator and the post-processed emulator. Region D is not used in the equation, as by increasing the sample space the accuracy of the emulator can be easily influenced (Real & Vargas, 1996).

g(x) = 1 if η̂(x) ≥ 0.5, and g(x) = 0 if η̂(x) < 0.5    (4.4)

This can also be represented as in table 4.1.

              η̂(x) ≥ 0.5    η̂(x) < 0.5
η(x) = 1          B              A
η(x) = 0          C              D

Table 4.1

g(x) (from equation 4.4) will be compared to the simulator⁴, and it is assumed that the discontinuities exist at the points where g(x) has a discontinuity (i.e. when η̂(x) = 0.5). When solving for x in η̂(x) = η(d), a Jaccard index approach will also be used, dividing the results from the emulator into four different regions⁵:

⁴Figures 5.22 (page 57) and 5.33 (page 64) are two examples used in this thesis.
⁵A visual representation of this can be found in Figures 4.8 and 5.39 (pages 42 and 68).

– Region A: where the simulator would output a 1 and the emulator has incorrectly estimated it to be 0
– Region B: where the simulator would output a 1 and the emulator has correctly estimated it to be 1
– Region C: where the simulator would output a 0 and the emulator has incorrectly estimated it to be 1
– Region D: where the simulator would output a 0 and the emulator has correctly estimated it to be 0

The equation B/(A + B + C) will be used to measure how accurate the emulator is.

• Investigating the steepness of η̂′(0)

With the step-discontinuous functions, there is an extreme change in the function at the position of the discontinuity. Given the discontinuity is at 0, it would be expected that the steepness η̂′(0) (i.e. dη̂(x)/dx evaluated at x = 0) is an extreme number.

In conclusion, this thesis will answer the above questions by using the developed goodness-of-fit methods on the described functions, such as the Heaviside function, with some assumptions which focus on step-discontinuities.
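The measures above can be sketched in a few lines of code. The following is a minimal illustration, not the thesis’s R/BACCO code: it computes the MSE of equation 4.2 over an evenly spaced test grid and the Jaccard ratio B/(A + B + C) from the post-processed emulator of equation 4.4, for the Heaviside function. The miniature GP emulator (`emulator_mean`, with a fixed constant prior mean of 0.5) and the region-count variable names are assumptions made for this sketch.

```python
import numpy as np

def emulator_mean(x_new, x_obs, y_obs, B=1.0, prior=0.5):
    # Minimal GP posterior mean with c(x, x') = exp(-B (x - x')^2).
    A = np.exp(-B * (x_obs[:, None] - x_obs[None, :]) ** 2)
    t = np.exp(-B * (x_new[:, None] - x_obs[None, :]) ** 2)
    return prior + t @ np.linalg.solve(A, y_obs - prior)

def eta(x):
    return np.where(x >= 0, 1.0, 0.0)   # Heaviside "simulator"

x_obs = np.array([-3.0, -2.0, -0.5, 0.5, 2.0, 3.0])
y_obs = eta(x_obs)

# m evenly spaced test points over D = [min(X), max(X)]
m = 601
x_test = np.linspace(x_obs.min(), x_obs.max(), m)
eta_hat = emulator_mean(x_test, x_obs, y_obs)

mse = np.mean((eta_hat - eta(x_test)) ** 2)        # equation 4.2

g = (eta_hat >= 0.5).astype(int)                   # equation 4.4
truth = eta(x_test).astype(int)
region_B = np.sum((truth == 1) & (g == 1))         # correctly estimated 1
region_A = np.sum((truth == 1) & (g == 0))         # missed 1
region_C = np.sum((truth == 0) & (g == 1))         # false 1
jaccard = region_B / (region_A + region_B + region_C)

print(round(mse, 4), round(jaccard, 3))
```

Here regions A, B and C are counted on the test grid, which approximates the lengths used in the text as the grid spacing Δ shrinks.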


Figure 4.8: Example of Jaccard regions for one dimension with two discontinuities. Note that the red line represents the approximation of where the discontinuity is, based on the emulator (the black curved line), at η̂ = 0.5

Chapter 5

Numerical Findings from the Emulator Experiments

5.1 Introduction

This chapter contains the findings from the experiments in using the Bayesian emulator to model simulators with one or more discontinuities. Each experiment will use a slightly different function (yet not too complex), which will be treated as a deterministic simulator for the emulator to model. A number of goodness-of-fit techniques (see section 4.2.3) will be applied to observe the performance of the emulator. The number of data points will also be increased to investigate how more data points from the simulator affect the emulator.

The Heaviside function, a simple function containing one discontinuity, will be used for the start of the investigation. If the emulator has a poor goodness-of-fit score when modelling a discontinuity with this simple function, then at least the same number of problems would also be expected when modelling a more complex simulator. This investigation with the Heaviside function will help progress the research by suggesting adjustments to the emulator as the research progresses to more complex simulators containing discontinuities.

5.2 One Dimension with One Discontinuity

Let η(x) = 0 for x < 0 and η(x) = 1 for x ≥ 0 be the Heaviside function. For the Bayesian emulator, the regression function will be H(x) = [1] and the correlation function c(x, x′) = exp(−(x − x′)ᵀ B (x − x′)), with B = 1 (see Experiment Setup and Assumptions in Section 4.2.1 on page 32 for reasoning).

Figure 5.1 shows an example when there is a large difference between the emulator’s estimation and the true function. The following data points were selected (not randomly chosen) to show how the

emulator is able to model the discontinuous function; a full discussion of this was covered in Chapter 3.

[(−3, 0), (−2, 0), (0, 0), (2, 1), (3, 1)]
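The behaviour visible in Figure 5.1 can be reproduced in outline. This is a hedged sketch, not the thesis’s BACCO/R code: a hand-rolled GP posterior mean (`emulator_mean`, an assumed helper with the constant prior 0.5 fixed rather than estimated) is fed the data points listed above, and the two defining behaviours are checked: the emulator interpolates the training points exactly, and it relaxes back to the prior far from the data.

```python
import numpy as np

def emulator_mean(x_new, x_obs, y_obs, B=1.0, prior=0.5):
    # GP posterior mean with c(x, x') = exp(-B (x - x')^2) and constant prior.
    A = np.exp(-B * (x_obs[:, None] - x_obs[None, :]) ** 2)
    t = np.exp(-B * (x_new[:, None] - x_obs[None, :]) ** 2)
    return prior + t @ np.linalg.solve(A, y_obs - prior)

# The selected (non-random) data points from the text
x_obs = np.array([-3.0, -2.0, 0.0, 2.0, 3.0])
y_obs = np.array([0.0, 0.0, 0.0, 1.0, 1.0])

# With no nugget the emulator interpolates the training data exactly
print(np.allclose(emulator_mean(x_obs, x_obs, y_obs), y_obs))   # True

# Far from the data the posterior mean returns to the prior H(x) = 0.5
print(np.round(emulator_mean(np.array([-20.0, 20.0]), x_obs, y_obs), 6))
```

The second print returns values at ±20, where the correlation with every data point is effectively zero, so the output is the prior 0.5.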


Figure 5.1: Bayesian emulator modelling the Heaviside function with a constant as a prior: H(x) = 0.5


Figure 5.2: Another example of the Bayesian emulator modelling the Heaviside function with a constant as a prior: H(x) = 0.5

An empirical PDF and a CDF will be used in the goodness-of-fit technique applied to the data obtained from 30,000¹ runs. To start with, each run will contain four data points from the true function, two on each side of the discontinuity. The data points are drawn from a uniform distribution within a domain (two between -3 and 0 and two between 0 and 3) so that every run has information that there may exist a discontinuity, yet its location and the knowledge that it is a discontinuity are withheld from the emulator. The goodness-of-fit technique will then be compared to the number of data points from the true function, and it should be expected that, on average, more information will result in a more accurate emulator.

5.2.1 Mean Square Error (MSE)

The MSE was calculated within the region of the data sample space. This is because outside the sample space the emulator will converge to the prior, so the MSE would be much higher, and dominated by the prior, if the experiments were to include too much of the region outside the data points. Figure 5.3 starts to indicate what the MSE’s PDF might look like. When the number of data points is increased from four to eight, Figure 5.4 is obtained. One problem found when calculating the MSE was that sometimes the MSE would be very high and could be seen as an outlier. In one case, where the data points were [−2.19, −0.02, 0.02, 0.445], the MSE was 204. Figure 5.7 is an example demonstrating the emulator with these data points. This shows that when sampling it is important to try to keep the data points as evenly spread as possible. Figure 5.5 shows a PDF (4 data points per run), and Figure 5.6 shows the CDF of the MSE.


Figure 5.3: Histogram of MSE: right figure is zoomed out to show the skewness

¹30,000 was chosen, as it is a representation of a large sample size that is a reasonable compromise based on the computer resources available. There are cases where this number was reduced to 20,000.


Figure 5.4: Histogram of MSE with eight data points


Figure 5.5: PDF of the MSE with four data points


Figure 5.6: CDF of the MSE with four data points


Figure 5.7: Example of the emulator when MSE is very high. Note the vertical axis

Figure 5.8: Example of the emulator when MSE is low. The panels show a progression of a better (lower) MSE, from 0.2036 down to 0.0170

Figure 5.7 is an example of what happens when there are at least two data points that are widely spread apart, causing the MSE to be high. Note from Figure 5.7 that the vertical axis shows the range between -50 and 20 when the true function is between 0 and 1. Figure 5.8 shows examples of the emulator progressing to a low MSE; it can be seen that the data points are much more evenly spread compared to Figure 5.7. The PDF of the MSE is skewed to the right with a few extreme numbers due to the large gap in the data (as shown in Figure 5.7). Because of the outliers, the median of the MSE will be taken. It was found that the MSE does decrease, as expected, when more data points are used in the emulator (see Figures 5.10 and 5.11).


Figure 5.9: Average MSE vs number of data points


Figure 5.10: Boxplot of MSE vs number of data points


Figure 5.11: Boxplot of MSE vs number of data points (log scale)

5.2.2 Approximating the Location of the Discontinuities from the Emulator

For this experiment using the Heaviside function, the position of the discontinuity d is at 0, and it is expected that η̂(d) will equal 0.5, the midway point of the emulator near the discontinuity (for more information regarding this goodness-of-fit technique see Section 4.2.3 on page 40). As in the experiments described in Section 5.2.1, eight data points were drawn from a uniform distribution (four between -3 and 0 and four between 0 and 3). When solving η̂(x) = 0.5 numerically, a histogram (see Figure 5.12) was obtained with a mean of 0 and a variance of 0.11 (for this example). A Q-Q plot (see Figure 5.13) indicated that it was not from a normal distribution; however, it could be suggested that it is from a Laplace distribution, f(x|µ, b) = (1/(2b)) exp(−|x − µ|/b), with µ = 0 and b = √(Var(x)/2), which is better fitted to x, as Figures 5.14 and 5.15 illustrate, showing an almost perfect match. A Kolmogorov-Smirnov test results in a p-value of 0.69 when comparing with a randomly sampled set from a Laplace distribution with the same mean and variance. The variance of x will decrease if the number of data points is increased, because the sample range of the data points is decreased. This would indicate that d ∼ Laplace(0, b). One way of approximating b is to use a leave-one-out bootstrap method: from n data points, n − 1 are sampled (without replacement) and x is solved from η̂(x) = 0.5. After repeating this multiple times, µ is calculated as the median of x and b = √(Var(x)/2).

When a simple method was used, such as d̂ ≈ (x1 + x2)/2, where x1 is the known data point closest to the left side of the discontinuity (η(x1) = 0) and x2 is the known data point closest to the right side of the discontinuity (η(x2) = 1), then, as expected, a mean of 0 was obtained with a variance of 0.12, a difference of only 0.01. The emulator was closer to the discontinuity 54.1% of the time.
When increasing the range of the data points, this simple method is better at approximating x, so the results are inconclusive at present. As expected, the variance decreases the more data points there are, as the gap between each data point, on average, also decreases; Figure 5.16 shows the variance decreasing towards 0 as the number of points increases.
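The location estimate above can be sketched numerically: for each run, bisection solves η̂(x) = 0.5 between the innermost data points bracketing the discontinuity, the simple midpoint estimate (x1 + x2)/2 is kept for comparison, and b is recovered from Var(x) = 2b². The GP helper `emulator_mean` is an assumed stand-in for the thesis’s BACCO/R emulator, and the run count is kept small for speed.

```python
import numpy as np

def emulator_mean(x_new, x_obs, y_obs, B=1.0, prior=0.5):
    A = np.exp(-B * (x_obs[:, None] - x_obs[None, :]) ** 2)
    t = np.exp(-B * (x_new[:, None] - x_obs[None, :]) ** 2)
    return prior + t @ np.linalg.solve(A, y_obs - prior)

rng = np.random.default_rng(1)
roots, midpoints = [], []
for _ in range(300):
    # four points each side of the discontinuity at 0, as in the text
    x_obs = np.sort(np.concatenate([rng.uniform(-3, 0, 4), rng.uniform(0, 3, 4)]))
    y_obs = np.where(x_obs >= 0, 1.0, 0.0)
    x1, x2 = x_obs[3], x_obs[4]          # innermost bracketing data points

    lo, hi = x1, x2                       # bisection on eta_hat(x) = 0.5
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if emulator_mean(np.array([mid]), x_obs, y_obs)[0] < 0.5:
            lo = mid
        else:
            hi = mid
    roots.append(0.5 * (lo + hi))
    midpoints.append(0.5 * (x1 + x2))     # the simple midpoint method

b = np.sqrt(np.var(roots) / 2)            # Laplace scale from Var(x) = 2 b^2
closer = np.mean(np.abs(np.array(roots)) < np.abs(np.array(midpoints)))
print(round(float(np.mean(roots)), 3), round(float(b), 3), round(float(closer), 3))
```

Bisection is valid here because the emulator interpolates, so η̂(x1) = 0 < 0.5 and η̂(x2) = 1 ≥ 0.5 bracket a crossing.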


Figure 5.12: Histogram of x from η̂(x) = η(d)


Figure 5.13: Normal Q-Q plot of x from η̂(x) = η(d)


Figure 5.14: PDF of x from η̂(x) = η(d) over a Laplace PDF


Figure 5.15: CDF of x from η̂(x) = η(d) over a Laplace CDF


Figure 5.16: Variance of x from η̂(x) = η(d) vs no. of data points


Figure 5.17: PDF of η̂(0)

5.2.3 Comparing η̂(0) to η(0)

As stated previously, given the discontinuity is at 0, it would be reasonable to expect that η̂(0) = η(0) = 0.5. Eight data points were drawn from a uniform distribution (four between -3 and 0 and four between 0 and 3). Figure 5.17 shows the PDF and Figure 5.18 the CDF of η̂(0). The range of η̂(0) is between 0 and 1, and it does not go above these values (even after 50,000 runs), indicating that when there is only one discontinuity, there was no over/undershooting.

Given the data points X = {x1, x2, ..., xn}, where X is sorted such that x1 ≤ x2 ≤ ... ≤ x_{n−1} ≤ xn, and Y = {η(x1), η(x2), ..., η(xn)} = {0, 0, ..., 0, 1, 1, ..., 1}, a mathematical representation defining no overshooting is 0 ≤ η̂(x) ≤ 1, or η(xj) ≤ h(x)ᵀβ + t(x)ᵀΣ(y − Hβ) ≤ η(x_{j+1}) where xj ≤ x ≤ x_{j+1}, meaning that −h(x)ᵀβ ≤ t(x)ᵀΣ(y − Hβ) ≤ 1 − h(x)ᵀβ. This result has not been proven in this thesis.

The PDF is symmetric, based on a Kolmogorov-Smirnov test that was performed, which returned a p-value of 0.72 (failing to reject the null hypothesis that the PDF is symmetric). Other symmetry tests performed were the Cabilio-Masaro test, the Mira test and the MGG test, using symmetry.test (Gastwirth et al., 2015) in R, each obtaining a p-value above 0.7. However, to ensure that the reason for the symmetry was not that the data points were drawn evenly on both sides of the discontinuity’s location, a different type of sampling process was used. When eight data points are drawn between -3 and 3 from a uniform distribution, accepting the sample only when there is at least one data point on each side of the discontinuity, the PDF shown in Figure 5.19 is obtained and is also symmetric. In conclusion, it can be suggested that η̂(0) is symmetric for the Heaviside function.


Figure 5.18: CDF of η̂(0)


Figure 5.19: PDF of η̂(0) with a different type of sampling process


Figure 5.20: PDF of η̂′(0)


Figure 5.21: CDF of η̂′(0)

5.2.4 Investigating the steepness of η̂′(0)

From the Heaviside function, it would be expected that the steepness at the discontinuity (i.e. dη̂(x)/dx evaluated at x = 0) should be high, and because it was found previously (see Section 5.2.3) that η̂(0) is always positive, one might expect that the gradient of the emulator will always be positive at this point also. From 30,000 runs this assumption that the steepness η̂′(0) is positive has been suggested to be correct (see Figures 5.20 and 5.21); however, the gradient averages about 1.78. A high gradient is only obtained when two sample points on either side of the discontinuity are close together. When the sample size increases or the sample space decreases, there will be more events where the data points are close together near the discontinuity, and therefore the average of the gradient will increase.
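The gradient at the discontinuity can be evaluated in closed form for the simple squared-exponential emulator: differentiating the posterior mean gives η̂′(x) = Σⱼ wⱼ(−2B(x − xⱼ))exp(−B(x − xⱼ)²), with w = A⁻¹(y − 0.5). The sketch below (again using an assumed fixed-prior GP stand-in for BACCO, with helper names invented for illustration) averages η̂′(0) over random designs to check the positivity described above.

```python
import numpy as np

def emulator_grad_at(x, x_obs, y_obs, B=1.0, prior=0.5):
    # d/dx of the GP posterior mean with c(x, x') = exp(-B (x - x')^2):
    # grad = sum_j w_j * (-2 B (x - x_j)) * exp(-B (x - x_j)^2)
    A = np.exp(-B * (x_obs[:, None] - x_obs[None, :]) ** 2)
    w = np.linalg.solve(A, y_obs - prior)
    d = x - x_obs
    return float(np.sum(w * (-2.0 * B * d) * np.exp(-B * d ** 2)))

rng = np.random.default_rng(7)
grads = []
for _ in range(1000):
    x_obs = np.sort(np.concatenate([rng.uniform(-3, 0, 4), rng.uniform(0, 3, 4)]))
    y_obs = np.where(x_obs >= 0, 1.0, 0.0)
    grads.append(emulator_grad_at(0.0, x_obs, y_obs))

grads = np.array(grads)
print(round(float(grads.mean()), 2), float((grads > 0).mean()))
```

With this simplified emulator the average gradient and the fraction of positive gradients will not match the thesis’s figures exactly, but the qualitative behaviour (positive, and larger when points crowd the discontinuity) is the same.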

5.2.5 Summary

The findings from the MSE have shown how important it is to have the data points evenly spread rather than focused in one area or unevenly spread as in Figure 5.7 (see p. 47), where there is a high error in the emulator between -2 and 0 due to the lack of information between these points. As the gap between the data points increases, the emulator gains more uncertainty, which leads to the emulator converging to the prior, as the prior then has the majority influence over the emulator rather than the closest data points. Figure 5.11 (see p. 49) shows that as the number of data points increases, the average of the MSE decreases, as does the variance of the MSE.

When solving x for η̂(x) = η(d), the PDF looked similar to a Laplace distribution. Figure 5.14 (see p. 51) is an example where the discontinuity was at 0 (d = 0) and η(0) = 0.5. It was inconclusive whether using the emulator was better than picking the halfway point between the data points on each side of the discontinuity; however, one could not say it was worse either. Figure 5.16 (see p. 52) shows that, as expected, the variance of the estimation decreases the more data points there are.

When comparing η̂(0) to η(0), it was discovered after many numerical runs that there is strong evidence that under/overshooting is rare between the two data points on either side of the discontinuity. From the previous findings with η̂(0), it was expected that the steepness η̂′(0) will always be positive, and it can be said that the higher the steepness η̂′(0) is, the better the emulator has fitted the data at the discontinuity.

5.3 One Dimension with Two Discontinuities

Let η(x) = 1 for −1 < x < 1 and η(x) = 0 otherwise be a simple function with two discontinuities. For the Bayesian emulator, the regression function will be H(x) = [1] and the correlation function c(x, x′) = exp(−(x − x′)ᵀ B (x − x′)), with B = 1 (see Experiment Setup and Assumptions in Section 4.2.1 on page 32 for reasoning). Figure 5.22 is a representation of this function.

An empirical PDF and a CDF will be used in the goodness-of-fit technique applied to the data obtained from 30,000 runs. To start with, each run will contain six² data points from the true function: two data points between the two discontinuities and two on the outer side of each discontinuity. The data points are drawn from a uniform distribution within a domain (two between -3 and -1, two between -1 and 1, and two between 1 and 3) so that every run has information that there may exist two discontinuities, yet the locations and the knowledge that they are discontinuities are withheld from the emulator. The goodness-of-fit technique will then be compared to the number of data points from the true function, and it should be expected that, on average, more information will result in a more accurate emulator.


Figure 5.22: One-dimensional function with two discontinuities

5.3.1 Mean Square Error (MSE)

Figure 5.23 shows a PDF (six data points per run), and Figure 5.24 shows the CDF of the MSE. The results are very similar to what was found with one discontinuity (in section 5.2.1) where the MSE would occasionally be very high, resulting in the PDF being skewed to the right.

²Six instead of four, as in the previous section, due to the added complexity of the function.


Figure 5.23: PDF of the MSE with six data points


Figure 5.24: CDF of the MSE with six data points

5.3.2 Approximating and Comparing the Location of the Discontinuities from the Emulator

As previously, for each run six data points were drawn from a uniform distribution (two between -3 and -1, two between -1 and 1, and two between 1 and 3). Knowing that there are two discontinuities, η̂(x) = 0.5 will have two solutions. From Figures 5.25 and 5.26, both PDFs were found to be similar to the Laplace distribution. Figure 5.27 shows the regions A, B, C and D; each region is defined using Table 5.1.


Figure 5.25: PDF of x from η̂(x) = η(d1) over a Laplace PDF


Figure 5.26: PDF of x from η̂(x) = η(d2) over a Laplace PDF

              η̂(x) ≥ 0.5    η̂(x) < 0.5
η(x) = 1          B              A
η(x) = 0          C              D

Table 5.1


Figure 5.27: Example of Jaccard index regions for one dimension with two discontinuities. Note that the red line represents the approximation of where the discontinuity is, based on the emulator (the black curved line), at η̂ = 0.5

To measure the accuracy of the emulator in correctly predicting where η = 1, the Jaccard similarity index B/(A + B + C) was applied. It was found that, from 10,000 runs, 95% of the time the emulator would be between 68% and 97% correct compared to the true function. When increasing the number of data points to nine (three between -3 and -1, three between -1 and 1, and three between 1 and 3), the accuracy of the emulator increased: 95% of the time it would be between 71% and 96% correct. When the number of data points is increased, the accuracy, as expected, also increases, as Figure 5.30 shows.
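For this one-dimensional case the Jaccard ratio reduces to interval arithmetic: B is the overlap of the estimated interval (d̂1, d̂2) with the true interval (−1, 1), and A + B + C is the length of their union. A small sketch follows; the helper name `interval_jaccard` and the example estimates are hypothetical.

```python
def interval_jaccard(d1_hat, d2_hat, t1=-1.0, t2=1.0):
    """B / (A + B + C) for an estimated interval versus the true interval.
    B = length of the intersection; A + B + C = length of the union."""
    inter = max(0.0, min(d2_hat, t2) - max(d1_hat, t1))       # region B
    union = (t2 - t1) + (d2_hat - d1_hat) - inter             # A + B + C
    return inter / union

print(interval_jaccard(-1.0, 1.0))            # 1.0 for a perfect estimate
print(round(interval_jaccard(-0.8, 1.1), 3))  # 0.857
```

Region D (where both simulator and emulator output 0) drops out of the formula, matching the definition in Section 4.2.3.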


Figure 5.28: PDF of accuracy using Jaccard index


Figure 5.29: CDF of Jaccard index similarity


Figure 5.30: Average accuracy vs data points, using Jaccard index

5.3.3 Comparing η̂(−1) to η(−1) and η̂(1) to η(1)

Similar to section 5.2.3, it is expected that η̂(−1) = η̂(1) = 0.5; however, unlike in section 5.2.3, there were cases where the emulator would overshoot (no undershooting was found). Figures 5.31 and 5.32 show the PDF and CDF of this: the PDF shows that there are extreme cases where it would overshoot above 1 (from 10,000 runs there was a case where η̂(−1) = 189). From this research with two discontinuities, it is suggested that this goodness-of-fit technique may be impractical.


Figure 5.31: PDF of η̂(d1) and η̂(d2); refer to Figure 5.27


Figure 5.32: CDF of η̂(d1) and η̂(d2)

5.3.4 Summary

The results from the mean square error (MSE) were very similar to those for the one-discontinuity Heaviside function in Section 5.2, with the MSE being occasionally high, causing the PDF to be skewed to the right and again indicating the importance of the data point selection process.

Because there is more than one discontinuity, it was decided that a Jaccard similarity index would be used to measure how accurate the emulator is. A post-process of the emulator was done (equation 4.4 on page 40) to determine the location of the emulator’s estimation of the simulator’s region. Figure 5.27 (see p. 60) is an example illustrating this, separating the emulator results into four different regions (see Table 5.1, p. 59). Region B is where the emulator correctly estimated the simulator to be at 1 and region D is where the emulator correctly estimated the simulator to be at 0. Regions A and C are the regions in which the emulator has incorrectly estimated the simulator. The equation B/(A + B + C) is used to measure this accuracy, where A, B, and C are the areas or lengths of the regions described above. Region D is not included in this formula as (for this example) region D could become very large. Figure 5.30 (see p. 61) shows the average of this accuracy with respect to the number of data points. As expected, with more data points it is more likely the data points will be closer together, resulting in a better prediction of where the discontinuity is.

Unlike in Section 5.2, where there was no over/undershooting with one discontinuity, with two discontinuities there was overshooting (which was very extreme in some cases); however, there was no indication of undershooting.³ 90% of the time η̂ would be between 0 and 1 (see Figure 5.32, page 62).

3Numerically concluded with over 10,000 runs with random data points for every run.

5.4 Two Dimensions

For the next part of the experiments, the emulator is to model a function in two- dimensional space, containing discontinuities at the boundary of a region, using Bayesian emulation. The function is motivated by the physical problem of heavy gas dispersion (HGD) travelling across a terrain. Let D be the domain or region in two-dimensional space where the gas is present, representing a function:

f(x, y) = 1 if (x, y) ∈ D, and f(x, y) = 0 if (x, y) ∉ D    (5.1)

where the discontinuity exists at the boundary of the domain D, and if f(x, y) = 1 then the gas is present. One of the simplest regions that could represent a gas cloud is a circle of radius r. The gas is present if f(x, y) = 1, meaning that the location (x, y) is inside the circle domain. For this investigation, the radius is r = 3. This can be represented as f(x, y) = 1 if √(x² + y²) ≤ 3 and f(x, y) = 0 if √(x² + y²) > 3, where the discontinuity exists at the edge of the circle. Figure 5.33 is a plot of this function.
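The circular region and a grid-based Jaccard comparison can be sketched as follows. The estimated radius of 2.8 is purely hypothetical, standing in for a post-processed emulator region, and the grid step of 0.1 matches the Δ used in the MSE experiments of Section 5.4.1.

```python
import numpy as np

def f(x, y, r=3.0):
    # Equation 5.1 with D the disc of radius r: 1 inside, 0 outside
    return (np.sqrt(x ** 2 + y ** 2) <= r).astype(int)

delta = 0.1                                # grid step
xs = np.arange(-5.0, 5.0 + delta, delta)
X, Y = np.meshgrid(xs, xs)

truth = f(X, Y)                            # simulator region (r = 3)
est = f(X, Y, r=2.8)                       # hypothetical post-processed emulator

region_B = np.sum((truth == 1) & (est == 1))
region_A = np.sum((truth == 1) & (est == 0))
region_C = np.sum((truth == 0) & (est == 1))
print(round(region_B / (region_A + region_B + region_C), 3))
```

Counting grid cells approximates the areas of regions A, B and C; for these radii the ratio is close to the area ratio 2.8²/3² ≈ 0.87.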

Figure 5.33: Simple example of a 2D region

An empirical PDF and a CDF will be used in the goodness-of-fit technique applied to the data obtained from 1,000 runs⁴. To start with, each run will contain 50 observation points from the true function. The data points are drawn from a uniform distribution within a domain (between -5 and 5 on both axes) so that every run should contain information about a change in the function, meaning that there may exist discontinuities, yet the location and the knowledge that there is a discontinuity are withheld from the emulator. The goodness-of-fit technique will then

⁴1,000 was chosen, as it is a representation of a large sample size that is a reasonable compromise based on the computer resources available.

be compared to the number of data points from the true function, and it should be expected that, on average, more information will result in a more accurate emulator.

5.4.1 Mean Square Error (MSE)

For each run, 50 data points were drawn uniformly between −5 and 5 in both directions. For the MSE, ∆⁵ was 0.1 due to limited computing power (each run took approximately 7 seconds). After 1,000 runs, a PDF and a CDF of the MSE were generated, as shown in Figures 5.34 and 5.35. Similar to the one-dimensional case, there were some runs in which the MSE was extremely high compared to most of the others; this led to the PDF being skewed to the right, as Figure 5.34 shows. As the number of data points increases, the MSE decreases, as Figures 5.36 and 5.37 show; however, some simulators are costly to run, and therefore there is a real-world limitation on the number of runs of the simulator.
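The grid-based MSE described above can be sketched as follows (a hedged illustration: the function names and the constant stand-in "emulator" are hypothetical, and the thesis's actual computation was done in R):

```python
import numpy as np

def mse_on_grid(emulator, simulator, lo=-5.0, hi=5.0, delta=0.1):
    """Mean square error between emulator and simulator, evaluated on a
    regular grid of step delta over [lo, hi] x [lo, hi]."""
    xs = np.arange(lo, hi + delta / 2, delta)
    gx, gy = np.meshgrid(xs, xs)
    err = emulator(gx, gy) - simulator(gx, gy)
    return float(np.mean(err ** 2))

# Circle simulator from Section 5.4 and a constant-0.5 stand-in "emulator":
sim = lambda x, y: (x ** 2 + y ** 2 <= 9.0).astype(float)
emu = lambda x, y: np.full_like(x, 0.5)
print(mse_on_grid(emu, sim))  # 0.25: the constant is off by 0.5 everywhere
```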


Figure 5.34: PDF of the MSE. Note the skewness is still present

⁵∆ is the nominal step interval (grid spacing) used when evaluating the emulator.


Figure 5.35: CDF of the MSE.


Figure 5.36: Average of MSE vs number of data points

5.4.2 Examining the Discontinuous Region using a Jaccard Approach

For the two-dimensional case the discontinuity is at the boundary of the region of the function (see Figure 5.33), resulting in infinitely many points at which a discontinuity exists. Therefore, rather than finding the points where η̂(x) = η(d), a post-process of the emulator was done (Equation 4.4, page 40). Figure 5.38 shows an example of one run of the emulator. After the post-process, the two regions (one from the emulator and the other from the simulator) were compared. Figure 5.39 represents the boundary from the simulator; if the emulator has a high accuracy then the areas of regions A and C should be small, resulting in B/(A + B + C) being close to 1. After 2,000 numerical runs with 50 data points, it was found that on average the overlapping region B was


Figure 5.37: Boxplot of MSE (no outliers) vs number of data points

68%. Figure 5.40 shows the PDF of B/(A + B + C) and Figure 5.41 the CDF. From 2,000 runs, 95% of the time the emulator accuracy (measured by B/(A + B + C)) was between 47% and 82.5%.
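The region comparison can be sketched as follows (an illustrative Python helper; the grid fields below are hypothetical). The binary inputs are the post-processed emulator (1 where η̂ ≥ 0.5) and the simulator:

```python
import numpy as np

def jaccard_accuracy(emu_binary, sim_binary):
    """Jaccard-style accuracy B/(A + B + C): B counts cells where both
    fields predict gas, A emulator-only cells, C simulator-only cells.
    Region D (both predict no gas) is deliberately excluded."""
    e = np.asarray(emu_binary, dtype=bool)
    s = np.asarray(sim_binary, dtype=bool)
    B = np.sum(e & s)
    A = np.sum(e & ~s)
    C = np.sum(~e & s)
    return B / (A + B + C)

# Hypothetical post-processed fields on a small grid:
sim = np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]])
emu = np.array([[0, 1, 0], [0, 1, 1], [0, 1, 0]])
print(jaccard_accuracy(emu, sim))  # B=3, A=1, C=1 -> 0.6
```

A perfect emulator gives exactly 1; excluding region D means the score cannot be inflated simply by enlarging the sample space.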

Figure 5.38: Example of the emulator approximating the boundaries in two dimensions

Figure 5.39: Separation into regions for the Jaccard index; regions B and D are correct and regions A and C are incorrect (see Section 4.2.3, page 40)


Figure 5.40: PDF of accuracy in two dimensions, using Jaccard index

[Plot: CDF of the Jaccard index B/(A + B + C), with the 95% confidence interval marked]

Figure 5.41: CDF of accuracy in two dimensions, using Jaccard index


Figure 5.42: Boxplot of accuracy vs number of data points, using Jaccard index

5.4.3 Summary

The function used in Section 5.4, Equation 5.1, was motivated by the physical problem of heavy gas dispersion (HGD) travelling across a terrain, where 1 represents the presence of the gas and 0 the absence of the gas (see Figure 5.33). With a small number of data points, there were some cases where the results from the simulator were all 0s or all 1s, providing no information that the function was changing, and therefore leading the emulator to output only that number. In HGD terms, if the emulator outputs all 1s, it means that the gas is everywhere. This again highlights the importance of data selection, since having too large a gap between data points leads to higher uncertainty.

The MSE was, in most cases, not too extreme, but in a two-dimensional plane it was shown that unevenly spread data points result in a lower accuracy of the emulator. Figures 5.36 and 5.37 (see p. 66) show how the MSE changes as the number of data points is increased; as expected from the previous findings, the more data points there are, the lower the MSE is.

When applying the Jaccard index technique, more data points result in a more accurate emulator, which is then better able to estimate the boundary of the circle, where the function changes from 0 to 1. This result is to be expected: the more data points there are, the closer together they are, resulting in a better estimate of where the boundary is. Figure 5.42 (see p. 69) shows how the accuracy increases with more data points.⁶ The level of accuracy does begin to level out after a certain number of runs, indicating a "sweet spot" beyond which more data points will not increase the accuracy of the emulator significantly.

⁶It should be noted that, due to limited computational power, the grid size of the emulator for each run is 0.1.

5.5 Two Dimensions + Time

Expanding this investigation, let time and the rate at which the gas is released be introduced into the function. To start, it is assumed that a tank contains a limited amount⁷ of gas V_0 and that, before the leak, no gas is present on the surface outside the tank. The gas is released continuously (not instantaneously) and, due to pressure, the release rate is proportional to the amount of gas remaining in the tank, where:

• V_T(t) is the amount of gas inside the tank at time t

• V_S(t) is the amount of gas on the surface outside the tank

• dV_T/dt is the rate at which the tank empties through the leak

• The gas release rate depends on the pressure in the tank/pipe and on the size of the hole(s) from which the gas leaks

Note that V_S(0) = 0 and V_T(t) + V_S(t) = V_0, so V_S(t) = V_0 − V_T(t) and

dV_S/dt = −dV_T/dt.

To start with, let dV_T/dt = −kV_T, where k is some constant. Then

V_T(t) = V_0 e^{−kt}
∴ V_S(t) = V_0 − V_0 e^{−kt} = V_0 (1 − e^{−kt}).

Note that k = −ln(1 − V_S(t)/V_0) / t.

If the region of the gas is assumed to be circular and the gas height h is constant, the area of the gas can be calculated as A = πr² and the volume of the gas as V_S = πr²h. Then V_0 (1 − e^{−kt}) = πr²h, and rearranging gives

r(t) = √( V_0 (1 − e^{−kt}) / (πh) ).

The function to check whether the gas is present at a location in the region is:

f(x, y, t) = { 1 if √(x² + y²) ≤ r(t); 0 if √(x² + y²) > r(t) }

f(x, y, t) = { 1 if x² + y² ≤ V_0 (1 − e^{−kt}) / (πh); 0 if x² + y² > V_0 (1 − e^{−kt}) / (πh) }    (5.2)

⁷This should be seen as mass rather than volume.

where if f(x, y, t) = 1 then the gas is present, and if f(x, y, t) = 0 then the gas is not present. For a one-dimensional region, let y = 0. The function then becomes:

 √ q −kt V0(1−e )  1 x2 ≤ f (x, 0, t) = πh (5.3) √ q V 1−e−kt  2 0( ) 0 x > πh

where if f(x, 0, t) = 1 then the gas is present, and if f(x, 0, t) = 0 then the gas is not present.

Example: let V_0 = 100 m³ and suppose that in 5 minutes the tank has released 75 m³ of gas, with rate constant k = 0.1386294. From this simulator, Figure 5.43 is produced, showing the change of the gas's radius over time.
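The release model above can be sketched numerically as follows (illustrative Python; the gas height h and the release figures used below are hypothetical stand-in values, not the thesis's exact configuration):

```python
import math

def rate_constant(V0, released, t):
    """k from V_S(t) = V0(1 - e^{-kt}):  k = -ln(1 - V_S(t)/V0) / t."""
    return -math.log(1.0 - released / V0) / t

def radius(t, V0, k, h):
    """Radius of the circular gas region: r(t) = sqrt(V0(1 - e^{-kt})/(pi*h))."""
    return math.sqrt(V0 * (1.0 - math.exp(-k * t)) / (math.pi * h))

def f(x, y, t, V0, k, h):
    """Equation (5.2): 1 if (x, y) lies inside the circle of radius r(t)."""
    return 1.0 if math.hypot(x, y) <= radius(t, V0, k, h) else 0.0

V0, h = 100.0, 1.0                           # hypothetical tank volume and gas height
k = rate_constant(V0, released=50.0, t=5.0)  # half the gas released in 5 minutes
print(round(k, 7))                           # 0.1386294
print(f(0.0, 0.0, 1.0, V0, k, h))            # gas present at the origin -> 1.0
```

As t grows, r(t) approaches the limiting radius √(V_0/(πh)), matching the bounded curve in Figure 5.43.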


Figure 5.43: Simple example of an HGD model, showing how the radius would increase over time.

Figure 5.44 is what is expected from the simulator, showing that as time increases, the radius (the value x in the figure) also increases. However, because the amount of gas is limited, it is assumed that the radius is also limited.⁸ A short demonstration of the emulator was done before progressing to measuring its accuracy: 300 data points were generated within the region of −10 to 10 in both directions, with time from 0 to 10. It was found that, as time increases, the radius of the gas cloud also increases; however, it does not stop increasing, and at t = 10 the gas has gone beyond 20 metres where the simulator shows a maximum of 5.6. This

⁸Factors like wind and gravity are not taken into account. This model is designed to add a small amount of complexity to the previous function in Section 5.4.

is because there is no information (i.e. data points) from the simulator given to the emulator beyond this time.

Figure 5.44: Example of the simple HGD model, showing how the radius increases over time; red: gas is present, blue: gas is not present.

Figure 5.45: Example output of the emulator modelling the simple HGD model. Left: with prior; right: prior set to 0 (300 data points)

Figure 5.46: Example output of the emulator modelling the simple HGD model. Left: with prior; right: prior set to 0 (50 data points)

Figure 5.47: t = 0 (left), t = 1 (right)

Figure 5.48: t = 2 (left) t = 3 (right)

Figure 5.49: t = 4 (left) t = 5 (right)

Figure 5.50: t = 7 (left), t = 8 (right)

Figure 5.51: t = 10

5.5.1 Mean Square Error (MSE)

For each run, 50 data points were drawn uniformly between −5 and 5 in both directions, with the time element between 0 and 20. For the MSE, ∆ was 0.5 due to limited computing power. After 1,000 runs, a PDF and a CDF of the MSE were generated, as shown in Figures 5.52 and 5.53. The MSE does not have as many extreme values as before, which could be due to the sample size. When changing the number of data points, the MSE decreases as the number of points is increased, as expected; see Figure 5.54.


Figure 5.52: PDF of the MSE. Note the skewness is still present.


Figure 5.53: CDF of the MSE.


Figure 5.54: Boxplot of MSE (no outliers) vs number of data points

5.5.2 Examining the Discontinuous Region using a Jaccard Index Approach

Similar to the case in Section 5.4, a post-process of the emulator was applied (Equation 5.4):

η̂_post(x) = { 1 if η̂(x) ≥ 0.5; 0 if η̂(x) < 0.5 }    (5.4)

The emulator output is then separated into the same four regions as Figure 5.39 shows, and the equation below is used:

B / (A + B + C)

Figures 5.55 and 5.56 show the results from 1,000 runs with 50 data points for each run and time between 0 and 10. It was found that on average the overlapping region B was 62%. From 1,000 runs, 95% of the time the emulator accuracy (measured by B/(A + B + C)) was between 47% and 82.5%. From Figure 5.57 it is seen that the more data points there are, the more accurate the emulator is.


Figure 5.55: PDF of accuracy using Jaccard index


Figure 5.56: CDF of accuracy using Jaccard index


Figure 5.57: Boxplot of accuracy vs number of data points, using Jaccard index


Figure 5.58: Boxplot of accuracy (no outliers) vs number of data points, using Jaccard index


Figure 5.59: Average accuracy vs number of data points, using Jaccard index

5.5.3 Summary

In this section, it was demonstrated how the emulator is able to model a more detailed function that could represent a simple HGD simulator. The emulator was used to model the function with the boundary of the gas increasing over time. Figures 5.47 to 5.50 (see p. 74 to 75) show an example of how the emulator was able to model this; however, as Figure 5.51 (see p. 75) shows, if the emulator were to attempt to model the simulator outside of the data points (time greater than 10), then its accuracy would decrease significantly.

The MSE, on average, was higher than in Section 5.4; however, this is due to the added complexity of the model and the added variable of time. Figure 5.54 (see p. 77) again shows that the MSE decreases as the number of data points is increased.

When applying the Jaccard index technique, more data points result in a more accurate emulator. However, there is a limit beyond which more data points do not significantly increase the accuracy of the emulator, as Figure 5.59 (see p. 79) shows; this increase is not as great as it was in Section 5.4 (see Figure 5.42 on page 69), and more data points are required before the emulator's accuracy levels out (over 300 data points).

Chapter 6

Discussion and Conclusion

6.1 Discussion and Analysis

6.1.1 Introduction

The objective of this research was to enhance the understanding of Bayesian emu- lation in modelling simulators containing one or more discontinuities (at unknown positions) that represent physical problems. This was done by using one of the simplest functions that contains a discontinuity (the Heaviside function) and then progressing the investigation to slightly more complex functions.

6.1.2 Research Contribution

As expected from Oakley (1999), Bayesian emulation is best suited to continuous smooth functions; therefore, it comes as no surprise that the emulator has some difficulties in modelling discontinuous functions. When the positions of the discontinuities are unknown, using Bayesian emulation to model discontinuous functions may not be the most practical approach, because the emulator treats the discontinuous function as continuous (see Figure 6.1 as an example). A search of the literature indicates there has been little research until now on how Bayesian emulation is able to model discontinuous functions with unknown discontinuity positions. Because of this lack of past research, most of the findings in this thesis were obtained through experiments. Oakley (1999) demonstrated the use of the emulator on smooth and continuous simulators, and therefore most researchers would make the assumption that the simulator was "a smooth function of the inputs without discontinuities" (Chang et al., 2015, p. 17). However, Caiado and Goldstein (2015), admitting it would be an expensive operation, attempted to find the discontinuity directly by running the simulator multiple times and then running the emulator for each side of the discontinuity (i.e. two separate emulators). From the current research of this thesis, however, the Bayesian emulator was able to model the discontinuous functions to some degree, using goodness-of-fit techniques to measure


the accuracy of the emulator, and from this, some research techniques were obtained to model the location of the discontinuities.

Figure 6.1: Simple example of the emulator modelling the Heaviside function

During the research, the question arose as to whether it is possible to give information to the emulator that the function contains a discontinuity using the prior/regression function. In Section 3.5 it was shown how a discontinuous function was used as a prior and why this is impractical, as assumptions had to be made about the location of the discontinuities. If the location of the discontinuity is incorrect and the data given conflicts with the prior, the emulator will be very inaccurate, as shown in the example in Figure 3.9. If the discontinuity's location is known with complete certainty, then the emulator can still be used to model the simulator. However, as seen in Section 3.5, it may be best to use only one emulator rather than splitting the simulator and using multiple emulators just to avoid the discontinuities (like the approach used in Caiado and Goldstein (2015)), because by splitting the simulator, information is lost between the emulations.

During the one-dimensional experiments with one discontinuity, it was found that there were no over/undershoots between the two data points before and after the discontinuity. Initially it was expected that the emulator would behave similarly to Figure 6.2; however, when experimenting with two or more discontinuities, there was some overshooting.

One concern about this research is how much the correlation function is influencing the results compared to the emulator. It is possible that a different correlation function would change the results (for better or worse). The correlation function used in this research represents part of a Gaussian function (the exponential part), and it was taken directly from Oakley's thesis (Oakley, 1999).
Other research papers have used similar exponential functions (Caiado & Goldstein, 2015; Chen et al., 2015; Conti et al., 2009; Hankin, 2012; M. C. Kennedy & O'Hagan, 2001; Montagna


Figure 6.2: An initial expectation of the emulator, containing overshooting and undershooting between the two closest data points on either side of the discontinuity (this is an intentionally incorrect representation of the emulator)


Figure 6.3: Example of the emulator, containing no overshooting or undershooting between the two closest data points on either side of the discontinuity


Figure 6.4: Even in the extreme case with many data points, the emulator has no over/undershooting between the two data points on either side of the discontinuity.

& Tokdar, 2016; Oakley, 2002; Zhang et al., 2015).

6.2 Conclusions

Goodness of Fit

Several techniques were used to measure how well the emulator performs with step-discontinuous functions. These goodness-of-fit techniques were applied for each of the following cases:

• A one-dimensional function with only one discontinuity (Heaviside function)

• A one-dimensional function with two discontinuities

• A two-dimensional function

• A two-dimensional function with time (the two-dimensional shape would change over time)

The goodness-of-fit techniques were:

Mean Square Error (MSE)

The Mean Square Error is measured only across the sampled region because, outside the region covered by the data points, the emulator becomes inaccurate, resulting in a higher MSE the further η̂(x) is from the data points.

In all the experiments it was found that the PDF of the MSE would occasionally be extremely high, depending on the simulator runs (see Figure 5.3, page

45). This large error of the emulator would increase if there were a large distance between two data points (Figure 5.7 on page 47 is one example where the MSE would be high). This is because the further the emulator's prediction point is from the data points, the more uncertain the emulator is, causing it to lose accuracy. To avoid a large error, it was found that the data points should be as evenly spaced as possible, and runs of the emulator outside the range of the data points should be limited. Throughout all experiments, the MSE decreased as more data points from the simulator were obtained; however, there is a limit to the number of runs of the simulator in a real-world example due to the cost.

Solving η̂(x) = η(d) for x and Jaccard index

For the function with one discontinuity, x was compared to the true location of the discontinuity (d = 0); however, with multiple discontinuities, the emulator was post-processed to be:

η̂_post(x) = { 1 if η̂(x) ≥ 0.5; 0 if η̂(x) < 0.5 }    (6.1)

A Jaccard approach was then used, looking at how much the emulator's prediction of the region correctly matched the simulator (see Figure 5.39 on page 68 for an example). To measure the accuracy of the emulator, the quantity B/(A + B + C) was used. Region D is not used in the equation because, by increasing the sample space, the accuracy of the emulator could otherwise be easily inflated.

In the one-dimensional case it was found that the distribution of the discontinuity's location was similar to a Laplace distribution: f(x | µ, b) = (1/(2b)) e^{−|x−µ|/b}. Given a limited number of data points, a leave-one-out bootstrap method could be used to approximate the parameters µ and b of the Laplace distribution, and then a confidence interval for the location of the discontinuity may be calculated.

When increasing the complexity of the function to have multiple discontinuities, a Jaccard similarity index was applied. Figures 5.30, 5.42 and 5.58 (see pages 61, 69 and 79) summarise the results, showing an increase in accuracy when more data points are given to the emulator. The more complex the simulator is, the more data points are needed to achieve a high accuracy.
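The Laplace fit mentioned above can be sketched as follows (a minimal illustration: the sample values are hypothetical, and the simple maximum-likelihood estimates — the median for µ and the mean absolute deviation for b — stand in for the leave-one-out bootstrap described in the text):

```python
import statistics

def fit_laplace(samples):
    """Maximum-likelihood estimates for the Laplace density
    f(x | mu, b) = (1/(2b)) * exp(-|x - mu| / b):
    mu_hat is the sample median, b_hat the mean absolute deviation."""
    mu = statistics.median(samples)
    b = sum(abs(x - mu) for x in samples) / len(samples)
    return mu, b

# Hypothetical estimated discontinuity locations from repeated runs (true d = 0):
locations = [-0.2, -0.1, 0.0, 0.05, 0.1, 0.15]
mu_hat, b_hat = fit_laplace(locations)
print(round(mu_hat, 3), round(b_hat, 3))  # 0.025 0.1
```

With µ̂ and b̂ in hand, Laplace quantiles give an interval estimate for the discontinuity's location.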

Comparing η̂(0) to η(0)

In the first two experiments the location of the discontinuity is at 0; therefore it is expected that η̂(0) will be 0.5. For the first function of the experiments, with only one discontinuity (the Heaviside function), it was found that the emulator did not over- or undershoot between the two data points (from the Heaviside function / "simulator")

on either side of the discontinuity. This result has yet to be explained. After many runs of the emulator, there was no example of over/undershooting between these two data points. This was not the case with two or more discontinuities, where there were examples of over/undershooting; however, typically η̂(x) near the discontinuities was between 0 and 1.

Investigating the steepness of η̂(0)

This goodness-of-fit technique was applied to the Heaviside function. It was expected that dη̂(x)/dx at x = 0 would be extremely high at the discontinuity. However, this was only the case if the distance between the two data points on either side of the discontinuity was extremely small. This technique did not provide new information about the performance of the emulator and was not applied to the more complex functions.

6.3 Further Research

From this thesis, the following topics could be addressed in future research. There is a wide range of fields and applications that this thesis has not covered; its purpose was to provide a foundation for how emulators model discontinuities and how the accuracy of the emulator can be measured.

Correlation Function

For this thesis, c(x, x′) = e^{−(x−x′)ᵀ B (x−x′)} was used as the correlation function, as it is the most common correlation function in a wide range of research (for example, Caiado & Goldstein, 2015; Chen et al., 2015; Conti et al., 2009; Hankin, 2012; M. C. Kennedy & O'Hagan, 2001; Montagna & Tokdar, 2016; Oakley, 2002; Zhang et al., 2015). However, there are examples where this correlation function performed poorly (Figure 3.4 is one example), although this may also be correctable with the prior/regression function. There may exist a correlation function more suitable and accurate for discontinuous functions, such as a Laplace-type equation; however, such better-suited correlation functions have yet to be identified and tested. A full study of the optimisation of B, or of other smoothness parameters in the correlation function, has yet to be conducted for discontinuous functions (some preliminary work on optimising B with the popular correlation function is presented in Appendix B).
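The correlation function can be sketched as follows (illustrative Python; the smoothness matrix B below is a hypothetical choice):

```python
import numpy as np

def corr(x, x0, B):
    """Gaussian-type correlation c(x, x') = exp(-(x - x')^T B (x - x'));
    B controls how quickly correlation decays with distance."""
    d = np.asarray(x, dtype=float) - np.asarray(x0, dtype=float)
    return float(np.exp(-d @ B @ d))

B = 0.5 * np.eye(2)                           # hypothetical smoothness matrix
print(corr([0.0, 0.0], [0.0, 0.0], B))        # 1.0 at zero distance
print(corr([1.0, 0.0], [0.0, 0.0], B) < 1.0)  # decays with distance -> True
```

Larger entries of B make the correlation decay faster, giving a rougher process; this is the quantity a future optimisation study would tune.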

Regression Function

It was found that a regression function for the prior of H = [1] is suitable when modelling step-discontinuous functions using Bayesian emulation; using a discontinuous prior may only be practical if the positions of the discontinuities are known beforehand. It could be suggested to fix the prior to a constant (e.g. 0 or 0.5); however, this would require altering the BACCO package (more specifically, the interpolant function).

More Complex Simulators

One of the motivations of this research was the flow of heavy gas dispersion (HGD) travelling across a terrain. This is a real-world application and can be seen as an example that contains a discontinuous region at the boundary of the gas: inside the region there is a high concentration of the gas (which could be toxic), and just outside the region there is no gas present. During the experiments with Bayesian emulation, the complexity of the step-discontinuous function was increased from a Heaviside function to a simple model that could be represented as an HGD

87 model. This demonstrated how the emulator was able to model one of the simplest discontinuous functions and move up to a more complex function. Another field where this research can be expanded is the use of more complex simulators. A suggested simulator could be TWODEE (Hankin & Britter, 1999), which is also used to model the flow of heavy gases across a terrain.

Other Types of Discontinuous Functions

This thesis mainly focused on step (or jump) discontinuous functions, such as the Heaviside step function. This could be advanced to a function such as Equation 6.2 (Figure 6.5). Such functions may be piecewise and could contain multiple discontinuities (the number of discontinuities might also be unknown). Other non-smooth functions have not been tested in this thesis. Functions with complex numbers are possible with the emulator (Hankin, 2014); however, these have also not yet been tested with discontinuous functions.

η(x) = { sin(e^{−x}) if x < 1; x² if x ≥ 1 }    (6.2)

Discontinuous Functions with Known Discontinuity Positions

It has been shown in Section 3.5 (page 21 of Chapter 3) that the emulator is able to model more complex discontinuous functions more accurately when the discontinuity position is known, though this would normally require altering the prior; further investigation was outside the scope of this study. If the location of the discontinuity is known, then it may be possible to use separate regions of the data-point space, or to use a nugget.

Comparing the Accuracy of the Emulator to RDD

Regression Discontinuity Design (RDD) is one technique that is currently being used to model discontinuous functions in some fields, for example economics (Imbens & Lemieux, 2008), and pharmacology (Moscoe et al., 2015). Regression Discontinuity Design can also be used when the location of the discontinuity is unknown (Porter & Yu, 2015). RDD could be compared to the Bayesian emulator using the same goodness-of-fit techniques (or other appropriate ones).

Goodness-of-fit

This study includes some examples of goodness-of-fit measures that can be used to assess how accurate the emulator is compared to the true function; these included a very simple version of the Jaccard index, which is normally used in ecology and biology to measure population differences. Some of these goodness-of-fit techniques are


designed from key properties of step-discontinuous functions; however, these techniques are not perfect, as they alone may not answer what makes a "good" emulator when fitting discontinuous functions.

Figure 6.5: Example of a more complex piecewise function, suggested for further research

6.3.1 Conclusion

Bayesian emulation can be seen as a powerful statistical tool for representing simulators. Before this study, very little research had been done on the implications of using Bayesian emulation for discontinuous functions. It was found that, with the positions of the discontinuities unknown, the Bayesian emulator will be less accurate; however, this accuracy may be improved by experimenting with the prior/regression and correlation functions and by measuring performance with suitable goodness-of-fit techniques.

Appendix A

Emulator Properties and Proofs

All of these properties of the emulator are important for finding a discontinuity, or for calculating how accurate the emulator is when modelling a function with a discontinuity. Some of the following properties have no analytical solution; in those cases, an attempt at an analytical solution is shown for demonstration, and further research may be required. For the research in this thesis, the properties that were not solvable analytically were solved numerically.

Let X = {x_1, x_2, ..., x_n}, with each x_i = [x_{i1}, x_{i2}, ..., x_{ik}]ᵀ ∈ R^k, be a suitable set of fixed, unique input parameters, known as the data input, and let y = [η(x_1), η(x_2), ..., η(x_n)]ᵀ be the set of results of the simulator. Let x be a free variable from the domain of η.

E(η(·) | y) = m*(x) = h(x)ᵀ β̂ + t(x)ᵀ Σ⁻¹ (y − Hβ̂)    Eq. 2.30 (Oakley, 1999, p. 14)

where t(x) = [c(x, x_1), c(x, x_2), ..., c(x, x_n)]ᵀ and

    Σ = [ Var(η(x_1))          Cov(η(x_1), η(x_2))  ...  Cov(η(x_1), η(x_n))
          Cov(η(x_1), η(x_2))  Var(η(x_2))          ...  Cov(η(x_2), η(x_n))
          ...                  ...                  ...  ...
          Cov(η(x_1), η(x_n))  Cov(η(x_2), η(x_n))  ...  Var(η(x_n)) ]

so m*(x) = h(x)ᵀ β̂ + t(x)ᵀ Σ⁻¹ (y − Hβ̂), or equivalently

    m*(x) = h(x)ᵀ β̂ + Σ_{i=1}^{n} r_i(x) (y_i − h(x_i)ᵀ β̂)    Eq. 2.40 (Oakley, 1999, p. 21)

where r_i(x) is the i-th element of t(x)ᵀ Σ⁻¹.
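Equation 2.30 can be implemented directly for a small example (a hedged sketch: a constant prior h(x) = [1] and Gaussian correlation with B = 1 are assumed here; this is a minimal numerical illustration, not the BACCO implementation):

```python
import numpy as np

def posterior_mean(x, X, y, corr, h):
    """m*(x) = h(x)^T beta_hat + t(x)^T Sigma^{-1} (y - H beta_hat), with
    beta_hat the generalised least squares estimate
    (H^T Sigma^{-1} H)^{-1} H^T Sigma^{-1} y."""
    n = len(X)
    Sigma = np.array([[corr(X[i], X[j]) for j in range(n)] for i in range(n)])
    H = np.array([h(xi) for xi in X])  # n x q regression matrix
    Si = np.linalg.inv(Sigma)
    beta = np.linalg.solve(H.T @ Si @ H, H.T @ Si @ y)
    t = np.array([corr(x, xi) for xi in X])
    return float(h(x) @ beta + t @ Si @ (y - H @ beta))

# Heaviside data, constant prior h(x) = [1], correlation exp(-(x - x')^2):
corr = lambda a, b: np.exp(-(a - b) ** 2)
h = lambda x: np.array([1.0])
X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(round(posterior_mean(-1.0, X, y, corr, h), 6))  # interpolates the data: 0.0
print(round(posterior_mean(1.0, X, y, corr, h), 6))   # interpolates the data: 1.0
```

Between the two central data points the mean passes smoothly through 0.5 at x = 0 (by symmetry of this data set), which is exactly the level used by the η̂(x) ≥ 0.5 post-processing.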

Property 1. Given c(x, x′) = e^{−(x−x′)ᵀ B (x−x′)} and m*(x) = γ, where γ is known: as the number of data points n increases, it becomes very difficult to find a non-trivial solution for x analytically.

Proof of Property 1. Let X = {x_1, x_2} with x_i = [x] ∈ R and y = [η(x_1), η(x_2)]ᵀ, and let c(x, x_i) = e^{−(x−x_i)ᵀ B (x−x_i)} with B = 1. Set m*(x) = γ, where

    m*(x) = h(x)ᵀ β̂ + t(x)ᵀ Σ⁻¹ (y − Hβ̂).

Expanding m* gives

    m*(x) = h(x)ᵀ β̂ + [e^{−(x−x_1)²}, e^{−(x−x_2)²}] Σ⁻¹ [y_1 − h(x_1)ᵀ β̂; y_2 − h(x_2)ᵀ β̂].

Inverting the 2 × 2 matrix Σ and expanding the products gives

    m*(x) = h(x)ᵀ β̂ + (1 / (Σ_11Σ_22 − Σ_12Σ_21)) { (y_1 − h(x_1)ᵀ β̂)(Σ_22 e^{−(x−x_1)²} − Σ_12 e^{−(x−x_2)²})
                                                   + (y_2 − h(x_2)ᵀ β̂)(−Σ_21 e^{−(x−x_1)²} + Σ_11 e^{−(x−x_2)²}) }.

Setting m*(x) = γ and reducing the equation to its simplest form with constants k_1* and k_2* (which collect the Σ_ij entries and the residuals y_i − h(x_i)ᵀ β̂) gives

    h(x)ᵀ β̂ + k_1* e^{−(x−x_1)²} + k_2* e^{−(x−x_2)²} = γ.

Since h(x)ᵀ is commonly taken to be linear in x, the term h(x)ᵀ β̂ − γ is a linear polynomial, and the equation reduces to

    k_1* e^{−(x−x_1)²} + k_2* e^{−(x−x_2)²} = ax + b.

Letting x̃ = x − x_1 and α = x_2 − x_1, and dividing through by k_1* (absorbing the scaling into the constants ϕ, a and b), this becomes

    e^{−x̃²} + ϕ e^{−(x̃−α)²} = ax̃ + b.    (A.1)

This is the simplest form for n = 2,¹ and because of the exponentials it would be difficult to invert the above equation analytically² to solve for x. For this reason, finding the analytical solution of m*(x) = γ is beyond the scope of this thesis.

¹For n data points, Σ_{i=1}^{n} k_i* e^{−(x−x_i)²} = γ − h(x)ᵀ β̂ would be used.
²Attempts to solve this using Wolfram Alpha and Matlab were unsuccessful.
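Although Equation (A.1) has no analytical solution, a root can still be found numerically; the sketch below uses simple bisection with hypothetical coefficients (illustrative only):

```python
import math

def solve_A1(phi, alpha, a, b, lo, hi, tol=1e-10):
    """Numerically solve e^{-x^2} + phi*e^{-(x-alpha)^2} = a*x + b
    (Equation A.1) by bisection, assuming a sign change on [lo, hi]."""
    g = lambda x: math.exp(-x * x) + phi * math.exp(-(x - alpha) ** 2) - (a * x + b)
    glo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if glo * g(mid) <= 0:
            hi = mid          # root lies in the lower half
        else:
            lo, glo = mid, g(mid)
    return 0.5 * (lo + hi)

# Hypothetical coefficients: where does the Gaussian mixture cross the level 0.5?
x_star = solve_A1(phi=1.0, alpha=2.0, a=0.0, b=0.5, lo=-3.0, hi=0.5)
print(round(x_star, 2))  # about -0.83
```

This is the approach taken in the thesis: properties without closed-form solutions are evaluated numerically.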

Property 2. Given c(x, x′) = e^{−(x−x′)ᵀ B (x−x′)}: when n > 3, the equation d/dx m*(x) = 0 has no analytical solution for any non-trivial x.

Proof of Property 2. Setting

    d/dx m*(x) = d/dx h(x)ᵀ β̂ + (d/dx t(x))ᵀ Σ⁻¹ (y − Hβ̂) = 0,

where d/dx t(x) = [d/dx c(x, x_1), d/dx c(x, x_2), ..., d/dx c(x, x_n)] and Σ⁻¹ is written with elements Γ_ij, each component derivative of the correlation function has the form

    ∇_x c(x, x_i) = −(B + Bᵀ)(x − x_i) e^{−(x−x_i)ᵀ B (x−x_i)},

so every entry of (d/dx t(x))ᵀ Σ⁻¹ (y − Hβ̂) is a sum over i of terms of the form (linear polynomial in x) × e^{−(x−x_i)ᵀ B (x−x_i)}. Expanding the matrix products therefore leads to the same kind of problem as in Property 1: a sum of exponentials multiplied by polynomials in x, for which an analytical solution cannot be found.

Appendix B

Investigation into Optimising B

B.1 Introduction

This piece of research was done before the main part of the findings (in Chapter 5). The correlation function (which includes B), c(x, x') = exp(−(x − x')^T B (x − x')), is equation 2.4 and comes from Oakley (1999). For this part of the investigation, the mean squared error (MSE) technique (similar to that used in Section 5.2.1) was used to optimise B in the correlation function of the emulator.

B.2 Findings

One-Dimensional Case

For Figure B.1, with the data points [1, 1, 1, 0, 0], the MSE over the domain −3.5 to 3.5 was about 0.098; however, over the domain −1 to 1 (near the discontinuity) the MSE was 0.31. This will be denoted as 0.098 (0.31). Next, B was altered in c(x, x'). Picking a value higher than 1, B was set to 10, and as a result the MSE decreased to 0.081 (0.23) (see Figure B.2). When B was increased beyond 100 the MSE decreased further to a minimum of 0.074 (0.17); however, the emulator then collapses towards a linear regression model, raising the question of whether the MSE is the right approach for comparing models.
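The calculation behind these MSE comparisons can be illustrated with a small stand-alone sketch. The Python snippet below re-implements the emulator's one-dimensional posterior mean (linear prior h(x) = [1, x], correlation exp(−B(x − x')²), no nugget term); it is not the BACCO code itself, so the MSE values it prints will differ slightly from the figures quoted above, but the data points and domains match those used for Figure B.1.

```python
import numpy as np

def emulator_mean(x_star, X, y, B):
    """Posterior mean of a GP emulator with linear prior h(x) = (1, x)
    and correlation c(x, x') = exp(-B (x - x')^2); no nugget term."""
    A = np.exp(-B * (X[:, None] - X[None, :]) ** 2)          # correlation matrix
    H = np.column_stack([np.ones_like(X), X])                # prior basis
    Ainv = np.linalg.inv(A)
    beta = np.linalg.solve(H.T @ Ainv @ H, H.T @ Ainv @ y)   # GLS estimate of beta
    t = np.exp(-B * (x_star[:, None] - X[None, :]) ** 2)
    h = np.column_stack([np.ones_like(x_star), x_star])
    return h @ beta + t @ Ainv @ (y - H @ beta)

f = lambda x: (x > 0).astype(float)                          # Heaviside "simulator"
X = np.array([-3.0, -2.0, 0.0, 2.0, 3.0])
y = f(X)
grid = np.linspace(-3.5, 3.5, 701)
near = np.linspace(-1.0, 1.0, 201)                           # domain near the discontinuity
for B in (1.0, 10.0, 100.0):
    mse = np.mean((emulator_mean(grid, X, y, B) - f(grid)) ** 2)
    mse_near = np.mean((emulator_mean(near, X, y, B) - f(near)) ** 2)
    print(B, round(mse, 3), round(mse_near, 3))
```

As B grows, the correlation between nearby points decays faster and the posterior mean reverts to the fitted linear prior between data points, which is why the emulator starts to resemble a linear regression model.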

Figure B.1: Bayesian emulator modelling the Heaviside function with a linear prior: H(x) = [1, x].

Figure B.2: Bayesian emulation: H(x) = [1, x], c(x, x') = exp(−(x − x')^T × 10 × (x − x')), MSE: 0.081 (0.23).

Two-Dimensional Case

For the two-dimensional investigation, 20 samples were generated as observed data using Latin hypercube sampling. Figure B.3 is one such example of the observed data. The data points are then evaluated through the function (similar to the one-dimensional case) and stored as observed outputs y for the emulator. Using the observed inputs and outputs, the emulator then estimates the function.

Figure B.3: 20 data points from the simulator; the red line is the boundary of the region.

Figure B.4 and Figure B.5 are examples of these outputs with the prior H = [1, X] and the distance function c(x, x') = exp(−(x − x')^T B (x − x')), where B = [1 0; 0 1]. It is observed from Figures B.6, B.7, and B.8 that the smoothness of the function changes as B changes. The leave-one-out bootstrap method could be used later in this investigation to optimise B. The next part of this investigation is to adjust the prior: by removing the prior (i.e. H = [0]) the emulator generates something similar to Figure B.9. Comparing Figure B.9 with the other figures, where the prior was H = [1, X], there is an improvement when modelling outside the boundary, as the function is pulled towards 0, making it look more stable. Figure B.10 was calculated with 250 samples, showing, as expected, that more data points give a better approximation. It is noted that once there is an approximation of where the boundary is, most of the data points inside and outside the region become redundant. This leads to the following questions: if only limited runs of the function/simulator are available, how can the inputs be chosen effectively, and how can the prior/emulator be changed to encode the knowledge that the function contains discontinuities? To begin to answer these questions, the emulator's output η̂ was adjusted as a post-processing step, so that everything inside the boundary has the same value: η̂ = 1 if η̂ ≥ α and η̂ = 0 if η̂ < α. This provides an approximation of the boundary of the function, which Figure B.11 demonstrates.
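The thresholding step just described is straightforward to express in code. A minimal Python sketch follows; the function name `threshold` and the sample surface are hypothetical.

```python
import numpy as np

def threshold(eta_hat, alpha=0.2):
    """Post-process the emulator's mean: 1 where eta_hat >= alpha, else 0."""
    return np.where(np.asarray(eta_hat) >= alpha, 1.0, 0.0)

# A toy 3x3 slice of an emulated surface; thresholding recovers the region.
surface = np.array([[0.05, 0.15, 0.60],
                    [0.10, 0.35, 0.90],
                    [0.00, 0.25, 0.55]])
print(threshold(surface))
```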

Figure B.4: Output from the emulator with B = [1 0; 0 1]. 20 data points.

Figure B.5: Output from the emulator with B = [1 0; 0 1]. 20 data points.

Figure B.6: Output from the emulator with B = [10 0; 0 10]. 20 data points.

Figure B.7: Output from the emulator with B = [1/2 0; 0 1/2]. 20 data points.

Figure B.8: Output from the emulator with B = [1/10 0; 0 1/10]. 20 data points.

Figure B.9: Output from the emulator with B = [0.45 0; 0 0.45] and prior = 0. 20 data points.

Figure B.10: Output from the emulator with B = [1 0; 0 1]. 250 data points (large data set case).

Figure B.11: (Left) emulator before post-processing; (right) after, with η̂ = 1 if η̂ ≥ 0.2 and η̂ = 0 if η̂ < 0.2.

Another technique used to optimise B was to take the variance of the emulator into account and calculate the percentage of the time the true function lies inside the emulator's estimate ± the variance: η̂ ± σ̂²c**. It was found that with the data points [1, 1, 1, 0, 0], B was optimised to 11.5 with the other parameters fixed. However, changing the observed data to (−1.2, −0.7, 0.1, 0.5, 1) results in an optimised value for B of 240. This indicates that B depends on the observed data, and an optimisation technique for B is required for each set of data points.
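The coverage criterion just described — the fraction of grid points at which the true function falls inside the emulator's band — can be sketched as follows. The function name and toy numbers are illustrative, with `halfwidth` standing in for σ̂²c**(x, x).

```python
import numpy as np

def coverage(true_vals, mean, halfwidth):
    """Fraction of grid points where |true - mean| <= halfwidth."""
    true_vals, mean, halfwidth = map(np.asarray, (true_vals, mean, halfwidth))
    return float(np.mean(np.abs(true_vals - mean) <= halfwidth))

# Toy example: the band covers the true value at 2 of the 3 grid points.
print(coverage([0.0, 1.0, 2.0], [0.0, 0.0, 0.0], [0.5, 1.5, 1.0]))
```

B can then be optimised by evaluating this fraction over a grid of candidate B values and keeping the maximiser, once per data set.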

B.3 Conclusion

Because of the added complexity of B and the need to optimise it for every new run, it was decided to keep B = 1, avoiding the optimisation problem and allowing the research to continue focusing on how the emulator models discontinuous functions.

Appendix C

Other Minor Observations

Figure C.1 was obtained by adding more observed data, for example (−3, −2, −1.5, −1, −0.5, 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5), indicating that more data points may not increase the accuracy of the emulator at the discontinuity. With the observed data spread further from the discontinuity, for example (−10, −5, −3, 2, 5, 10), the error (from both the MSE and the percentage of the function lying between η̂ ± σ̂²c**) shows the emulator performing better than before (see Figure C.2); however, this could be because of the particular data points selected. With the observed data close to the discontinuity, for example (−0.2, −0.1, 0.1, 0.2, 0.3), the points are so close together that σ̂² is large (in the example σ̂² = 76237.98). Figure C.3 was produced by changing the regression function to H(x) = [1]; this did not improve the accuracy. Increasing B pulls the estimate closer to the prior, increases the variance, and does not model the function any better at the discontinuity.

Figure C.1: 13 data points: H(x) = [1, x], B = 1.

Figure C.2: Data points more spread out: H(x) = [1, x], B = 1.

Figure C.3: H(x) = [1], B = 1.

From Figure C.4 a large change is seen in the emulator's estimate and variance between the three data points lying between −2 and 2. Also, with the data points being symmetric about the x axis, the two graphs are identical if one of them is rotated 180 degrees. Figure C.5 shows how the emulator behaves when the location of the discontinuity changes: unless one of the observed data points captures the change, the emulator's estimate does not change; however, the residual error does. Figures C.6 and C.7 are similar to Figures C.4 and C.5 but with the prior H = [1]; with this change of prior the variance has increased. Lastly, Figure C.8 shows the emulator with no prior information, i.e. H = [0].

Figure C.4: (Left) X = [−3, −1, 0, 2, 3]; (right) X = [−3, −1, 0.000001, 2, 3].

Figure C.5: (Left) η(x) = 1 for x ≤ 1.5, 0 for x > 1.5; (right) η(x) = 1 for x ≤ −1, 0 for x > −1.

Figure C.6: (Left) X = [−3, −1, 0, 2, 3]; (right) X = [−3, −1, 0.000001, 2, 3]. Prior = [1].

Figure C.7: (Left) η(x) = 1 for x ≤ 1.5, 0 for x > 1.5; (right) η(x) = 1 for x ≤ −1, 0 for x > −1. Prior = [1].

Figure C.8: Without a prior.

Appendix D

Software Used

D.1 R

R is an open-source program designed for statistical computing and graphics. The version used for this thesis is 3.3.1, which was the latest at the time. Additional packages, which provide extra functions, are available through a repository. The package BACCO was used throughout the experiments. BACCO (Bayesian Analysis of Computer Code Output), published by Hankin (2005), contains computerised implementations of the ideas of M. Kennedy (2000), M. C. Kennedy and O'Hagan (2001), and Oakley (2002) (see Section 2.5 for the equations). Some adjustments were made to the interpolation function in the emulator package to allow h(x) = 0.

D.2 Matlab

Matlab (Student Version R2014a) was used for 3D plotting. All data were exported to a ".txt" file and then imported into Matlab.

Appendix E

R code

Below are examples of the R code that was used to create the figures and obtain results from the emulator.

E.1 Figure B.1

library("BACCO")
f = function(x) ifelse(x > 0, 1, 0)

X = as.matrix(c(-3, -2, 0, 2, 3))
Y = f(X)

x = seq(-3.3, 3.3, 0.01)
y = interpolant.quick(as.matrix(x), Y, X, scales = 1, g = TRUE)

# plotting part
plot(x, f(x), type = "l", col = "red", xlab = "x", ylab = "y",
     ylim = c(min(y$mstar.star - y$Z), max(y$mstar.star + y$Z)),
     xlim = c(min(X) - 0.1, max(X) + 0.1))
lines(x, y$mstar.star + y$Z, type = "l", col = "blue")
lines(x, y$mstar.star - y$Z, type = "l", col = "blue")
lines(x, y$mstar.star, type = "l")
lines(sort(X), f(sort(X)), type = "p", pch = 8)
lines(x, y$prior, col = "darkgreen")

legend("bottomright",
       c(expression(paste("Emulator: ", m^"**", "(", bold(x), ")")),
         "True Function",
         "Data Points",
         expression(paste(m^"**", "(", bold(x), ")" %+-% hat(sigma),
                          c^"**", "(", bold(x), ",", bold(x), ")")),
         "Prior"),
       lty = c(1, 1, NA, 1, 1), pch = c(NA, NA, 8, NA, NA),
       col = c("black", "red", "black", "blue", "darkgreen"))

E.2 Figure 5.1

library("BACCO")
f = function(x) ifelse(x > 0, 1, 0)

X = as.matrix(c(-3, -2, 0, 2, 3))
Y = f(X)

x = seq(-3.3, 3.3, 0.01)
y = interpolant.quick(as.matrix(x), Y, X, scales = 1, g = TRUE,
                      func = function(x) 0.5)

# plotting part
plot(x, f(x), type = "l", col = "red", xlab = "x", ylab = "y",
     ylim = c(min(y$mstar.star - y$Z), max(y$mstar.star + y$Z)),
     xlim = c(min(X) - 0.1, max(X) + 0.1))
lines(x, y$mstar.star + y$Z, type = "l", col = "blue")
lines(x, y$mstar.star - y$Z, type = "l", col = "blue")
lines(x, y$mstar.star, type = "l")
lines(sort(X), f(sort(X)), type = "p", pch = 8)
lines(x, y$prior, col = "darkgreen")

legend("bottomright",
       c(expression(paste("Emulator: ", m^"**", "(", bold(x), ")")),
         "True Function",
         "Data Points",
         expression(paste(m^"**", "(", bold(x), ")" %+-% hat(sigma),
                          c^"**", "(", bold(x), ",", bold(x), ")")),
         "Prior"),
       lty = c(1, 1, NA, 1, 1), pch = c(NA, NA, 8, NA, NA),
       col = c("black", "red", "black", "blue", "darkgreen"))

E.3 Figure 5.3, 5.4, 5.5 and 5.6

f = function(x) ifelse(x >= 0, 1, 0)
n1 = 2
n2 = 2
N = n1 + n2          # total number of data points
MSE = numeric(5000)

i = 1
while (i < 5000) {
  X = as.matrix(runif(N, -3, 3))
  Y = f(X)

  if (sum(Y) > 0 && sum(Y) < N) {
    x = seq(round(min(X), 1) - 0.1, round(max(X), 1) + 0.1, 0.01)
    y = interpolant.quick(as.matrix(x), Y, X, scales = 1, g = TRUE,
                          func = function(x) 0.5)

    MSE[i] = 1 / length(x) * sum((y$mstar.star - f(x))^2)
    i = i + 1
  }
}

# remove_outliers() is a helper function defined elsewhere in the thesis code
hist(remove_outliers(MSE, boxplot(MSE, outline = FALSE)$stats[5]),
     main = "Histogram of Mean Square Error (excluding outliers)",
     xlab = "MSE", 20)

boxplot(MSE, outline = FALSE, horizontal = TRUE)
title("Boxplot of MSE (no outliers)")

#########################################################
#########################################################

N = 8
MSE = numeric(5000)

i = 1
while (i < 5000) {
  X = as.matrix(runif(N, -3, 3))
  Y = f(X)

  if (sum(Y) > 0 && sum(Y) < N) {
    x = seq(round(min(X), 1) - 0.1, round(max(X), 1) + 0.1, 0.01)
    y = interpolant.quick(as.matrix(x), Y, X, scales = 1, g = TRUE,
                          func = function(x) 0.5)

    MSE[i] = 1 / length(x) * sum((y$mstar.star - f(x))^2)
    i = i + 1
  }
}

hist(remove_outliers(MSE, boxplot(MSE, outline = FALSE)$stats[5]),
     main = "Histogram of Mean Square Error (excluding outliers)",
     xlab = "MSE", 20)

boxplot(MSE, outline = FALSE, horizontal = TRUE)
title("Boxplot of MSE (no outliers)")

# epdfPlot() is from the EnvStats package
epdfPlot(remove_outliers(MSE, boxplot(MSE, outline = FALSE)$stats[5]),
         main = "PDF of MSE", xlab = "MSE")

plot(ecdf(remove_outliers(MSE, boxplot(MSE, outline = FALSE)$stats[5])),
     lwd = 1, main = "CDF of MSE", xlab = "MSE")
abline(h = 1 - (1 - 0.95) / 2, col = 2, lty = 4)
abline(h = (1 - 0.95) / 2, col = 2, lty = 4)

legend("bottomright",
       c("CDF of MSE", "95% confidence interval"),
       lty = c(1, 4), col = c("black", "red"), lwd = c(1, 1))

E.4 Figure 5.7

library("BACCO")
f = function(x) ifelse(x > 0, 1, 0)

X = as.matrix(c(-2.19, -0.02, 0.02, 0.445))
Y = f(X)

x = seq(-3.3, 3.3, 0.01)
y = interpolant.quick(as.matrix(x), Y, X, scales = 1, g = TRUE,
                      func = function(x) 0.5)

# plotting part
# plot(...) same plotting code as Figure 5.1

E.5 Figure 5.9 and 5.10

library("MASS")    # provides ginv()

f = function(x) ifelse(x >= 0, 1, 0)
scales = 1
pos.def.matrix <- diag(scales, nrow = length(scales))
nn = 40
N = 500
MSE = matrix(rep(0, nn * N), ncol = N)
for (n in 1:nn) {
  for (i in 1:N) {
    X = as.matrix(c(runif(n, -3, 0), runif(n, 0, 3)))
    Y = f(X)

    A <- corr.matrix(xold = X, pos.def.matrix = pos.def.matrix,
                     distance.function = corr)
    Ainv = ginv(A)

    x = seq(min(X) - 0.1, max(X) + 0.1, 0.1)
    y = interpolant.quick(as.matrix(x), Y, X, scales = 1, g = TRUE,
                          func = function(x) 0.5, Ainv = Ainv)

    MSE[n, i] = mean((y$mstar.star - f(x))^2)
  }
}

boxplot(t(MSE), outline = FALSE, names = c(2 * 1:40),
        xlab = "no. of data points", ylab = "MSE")
plot(2 * 1:40,
     boxplot(t(MSE), outline = FALSE, names = c(2 * 1:40))$stats[3, ],
     type = "l", xlab = "no. of data points", ylab = "Median of MSE")

E.6 Figure 5.12, 5.13, 5.14 and 5.15

f = function(x) ifelse(x >= 0, 1, 0)

N = 8
x = numeric(0)
x2 = numeric(0)
d = numeric(0)
i = 1
while (i < 5000) {
  X = as.matrix(sort(runif(N, -3, 3)))
  Y = f(X)
  if (sum(Y) > 0 && sum(Y) < N) {
    g = function(x) (interpolant(x, Y, X, scales = 1, g = FALSE,
                                 func = function(x) 0.5) - 0.5)^2
    x[i] = optim(0, g)$par            # x where the emulator crosses 0.5
    n1 = sum(X < 0)                   # index of the last negative data point
    x2[i] = (X[n1 + 1] + X[n1]) / 2   # midpoint of the gap containing 0
    d[i] = X[n1 + 1] - X[n1]          # width of that gap
    i = i + 1
  }
}
hist(x, xlab = "x", nclass = 30)
qqnorm(x); abline(mean(x), sd(x))

epdfPlot(x, ylim = c(0, 1.4))         # epdfPlot() is from the EnvStats package
mu = mean(x)
b = sqrt(var(x) / 2)
dLaplace = function(x, mu, b) (1 / (2 * b)) * exp(-abs(x - mu) / b)
pLaplace = function(x, mu, b) ifelse(x < mu,     # standard Laplace CDF
                                     0.5 * exp((x - mu) / b),
                                     1 - 0.5 * exp(-(x - mu) / b))
curve(dLaplace(x, mu, b), add = TRUE, col = "red", lwd = 2)   # overlay fitted Laplace PDF

legend("topright",
       c(expression(paste("PDF of ", bold(x), ": ", m^"*", "(", bold(x), ") = 0.5")),
         "PDF of Laplace"),
       lty = c(1, 1), col = c("black", "red"), lwd = c(3, 2))

E.7 Figure 5.17 and 5.18

f = function(x) ifelse(x >= 0, 1, 0)

n1 = 8
n2 = 8
x = numeric(0)
i = 1
while (i < 50000) {
  X = as.matrix(c(runif(n1, -3, 0), runif(n2, 0, 3)))
  Y = f(X)
  x[i] = interpolant(0, Y, X, scales = 1, g = FALSE,
                     func = function(x) 0.5)
  if (x[i] > 1) {
    break
  }
  i = i + 1
}

epdfPlot(x)

plot(ecdf(x), lwd = 1)

abline(h = 1 - (1 - 0.95) / 2, col = 2, lty = 4)
abline(h = (1 - 0.95) / 2, col = 2, lty = 4)

legend("bottomright",
       c(expression(paste("CDF of ", bold(x), ": ", m^"*", "(", 0, ")")),
         "95% confidence interval"),
       lty = c(1, 4), col = c("black", "red"), lwd = c(1, 1))

E.8 Figure 5.20 and 5.21

f = function(x) ifelse(x >= 0, 1, 0)

n1 = 4
n2 = 4
x = numeric(0)
i = 1
while (i < 30000) {
  X = as.matrix(sort(c(runif(n1, -3, 0), runif(n2, 0, 3))))
  Y = f(X)
  g = function(x) interpolant(x, Y, X, scales = 1, g = FALSE,
                              func = function(x) 0.5)
  x[i] = (g(0.01) - g(-0.01)) / 0.01   # finite-difference estimate of m*'(0)
  if (x[i] < -1.5) {
    break
  }
  i = i + 1
}

epdfPlot(remove_outliers(x, 15), xlim = c(0, 10),
         main = "PDF of m'(0)", xlab = "m'(0)")

plot(ecdf(x), xlim = c(-1, 18))
abline(h = 1 - (1 - 0.95) / 2, col = 2, lty = 4)
abline(h = (1 - 0.95) / 2, col = 2, lty = 4)

legend("bottomright",
       c("CDF of m'(0)", "95% confidence interval"),
       lty = c(1, 4), col = c("black", "red"), lwd = c(1, 1))

Reference List

Becker, W., Worden, K., & Rowson, J. (2013). Bayesian sensitivity analysis of bifurcating nonlinear models. Mechanical Systems and Signal Processing, 34(1), 57–75.

Blackmore, D., Herman, M., & Woodward, J. (1982). Heavy gas dispersion models. Journal of Hazardous Materials, 6(1–2), 107–128. doi: 10.1016/0304-3894(82)80036-8

Caiado, C., & Goldstein, M. (2015). Bayesian uncertainty analysis for complex physical systems modelled by computer simulators with applications to tipping points. Communications in Nonlinear Science and Numerical Simulation, 26, 123–136. doi: 10.1016/j.cnsns.2015.02.006

Chang, E. T. Y., Strong, M., & Clayton, R. H. (2015). Bayesian sensitivity analysis of a cardiac cell model using a Gaussian process emulator. PLoS ONE, 10, e0130252.

Chen, P., Zabaras, N., & Bilionis, I. (2015). Uncertainty propagation using infinite mixture of Gaussian processes and variational Bayesian inference. Journal of Computational Physics, 284, 291–333. doi: 10.1016/j.jcp.2014.12.028

Conti, S., Gosling, J. P., Oakley, J. E., & O'Hagan, A. (2009). Gaussian process emulation of dynamic computer codes. Biometrika, 96(3), 663–676. doi: 10.1093/biomet/asp028

Conti, S., & O'Hagan, A. (2010). Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference, 140, 640–651. doi: 10.1016/j.jspi.2009.08.006

Currin, C., Mitchell, T., Morris, M., & Ylvisaker, D. (1991). Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association, 86(416), 953–963. doi: 10.2307/2290511

Gastwirth, J. L., Gel, Y. R., Hui, W., Lyubchich, V., Miao, W., & Wang, X. (2015). Test of symmetry. Retrieved on February 18, 2016, from http://finzi.psych.upenn.edu/library/lawstat/html/symmetry.test.html

Gelman, A. (2009). Bayesian data analysis (2nd ed.). Chapman & Hall/CRC.

Gramacy, R. B., & Lee, H. K. H. (2008). Bayesian treed Gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103(483), 1119–1130.

Hankin, R. K. S. (2001). The Euler equations for multiphase compressible flow in conservation form. Journal of Computational Physics, 172(2), 808–826. doi: 10.1006/jcph.2001.6859

Hankin, R. K. S. (2005). Introducing BACCO, an R bundle for Bayesian analysis of computer code output. Journal of Statistical Software, 14(16). doi: 10.18637/jss.v014.i16

Hankin, R. K. S. (2012). Introducing multivator: A multivariate emulator. Journal of Statistical Software, 46(8). doi: 10.18637/jss.v046.i08

Hankin, R. K. S. (2014). The complex multivariate Gaussian distribution. The R Journal. Retrieved from https://journal.r-project.org/archive/2015-1/hankin.pdf

Hankin, R. K. S., & Britter, R. E. (1999). TWODEE: The Health and Safety Laboratory's shallow layer model for heavy gas dispersion. Part 1: Mathematical basis and physical assumptions. Journal of Hazardous Materials, 66(3), 211–226. doi: 10.1016/S0304-3894(98)00269-6

Imbens, G. W., & Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2), 615–635. doi: 10.1016/j.jeconom.2007.05.001

Kennedy, M. (2000). Predicting the output from a complex computer code when fast approximations are available. Biometrika, 87(1), 1–13. doi: 10.1093/biomet/87.1.1

Kennedy, M. C., & O'Hagan, A. (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3), 425–464. doi: 10.1111/1467-9868.00294

Kim, H.-M., Mallick, B. K., & Holmes, C. (2005). Analyzing nonstationary spatial data using piecewise Gaussian processes. Journal of the American Statistical Association, 100(470), 653–668.

Lee, D. S., & Lemieux, T. (2010). Regression discontinuity designs in economics. Journal of Economic Literature, 48(2), 281–355. doi: 10.3386/w14723

Montagna, S., & Tokdar, S. T. (2016). Computer emulation with nonstationary Gaussian processes. SIAM/ASA Journal on Uncertainty Quantification, 4(1), 26–47. doi: 10.1137/141001512

Moscoe, E., Bor, J., & Bärnighausen, T. (2015). Regression discontinuity designs are underutilized in medicine, epidemiology, and public health: A review of current and best practice. Journal of Clinical Epidemiology, 68(2), 132–143. doi: 10.1016/j.jclinepi.2014.06.021

Oakley, J. E. (1999). Bayesian uncertainty analysis for complex computer codes (PhD thesis). University of Sheffield, Sheffield, United Kingdom.

Oakley, J. E. (2002). Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4), 769–784. doi: 10.1093/biomet/89.4.769

Oakley, J. E. (2009). Decision-theoretic sensitivity analysis for complex computer models. Technometrics, 51(2), 121–129. doi: 10.1198/tech.2009.0014

O'Hagan, A. (2006). Bayesian analysis of computer code outputs: A tutorial. Reliability Engineering and System Safety, 91(10–11), 1290–1300. doi: 10.1016/j.ress.2005.11.025

Porter, J., & Yu, P. (2015). Regression discontinuity designs with unknown discontinuity points: Testing and estimation. Journal of Econometrics, 189(1), 132–147. doi: 10.1016/j.jeconom.2015.06.002

Rasmussen, C., & Williams, C. (2006). Gaussian processes for machine learning. The MIT Press.

Real, R., & Vargas, J. M. (1996). The probabilistic basis of Jaccard's index of similarity. Systematic Biology, 45(3), 380–385.

Stewart, J. (2010). Calculus: Concepts and contexts (4th ed.). South Melbourne, Australia: Cengage Learning.

Stover, C., & Weisstein, E. W. (2013). Discontinuity. Retrieved on November 1, 2015, from http://mathworld.wolfram.com/Discontinuity.html

Thistlethwaite, D. L., & Campbell, D. T. (1960). Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6), 309–317. doi: 10.1037/h0044319

Villarreal, F. (2006). Heaviside generalized functions and shock waves for a Burger kind equation. Integral Transforms and Special Functions, 17(2), 213–219. doi: 10.1080/10652460500438128

Villarreal, F. (2012). Generalized Heaviside functions in the Colombeau theory context. Electronic Journal of Differential Equations, 2012(87), 1–20. Retrieved from http://ejde.math.txstate.edu/Volumes/2012/87/villareal.pdf

Weisstein, E. W. (2002). Heaviside step function. Retrieved on November 1, 2015, from http://mathworld.wolfram.com/HeavisideStepFunction.html

Zhang, B., Konomi, B. A., Sang, H., Karagiannis, G., & Lin, G. (2015). Full scale multi-output Gaussian process emulator with nonseparable auto-covariance functions. Journal of Computational Physics, 300, 623–642. doi: 10.1016/j.jcp.2015.08.006
