Design and Analysis of Computer Experiments for Screening Input Variables

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By
Hyejung Moon, M.S.

Graduate Program in Statistics
The Ohio State University
2010

Dissertation Committee:
Thomas J. Santner, Co-Adviser
Angela M. Dean, Co-Adviser
William I. Notz

© Copyright by Hyejung Moon 2010

ABSTRACT

A computer model is a computer code that implements a mathematical model of a physical process. A computer code is often complicated and can involve a large number of inputs, so it may take hours or days to produce a single response. Screening to determine the most active inputs is critical for reducing the number of future code runs required to understand the detailed input-output relationship, since the computer model is typically complex and the exact functional form of the input-output relationship is unknown. This dissertation proposes a new screening method that identifies active inputs in a computer experiment setting. It describes a Bayesian computation of sensitivity indices as screening measures. It provides algorithms for generating desirable designs for successful screening.

The proposed screening method is called GSinCE (Group Screening in Computer Experiments). The GSinCE procedure is based on a two-stage group screening approach, in which groups of inputs are investigated in the first stage and then inputs within only those groups identified as active at the first stage are investigated individually at the second stage. Two-stage designs with desirable properties are constructed to implement the procedure. Sensitivity indices are used to measure the effects of inputs on the response. Inputs with large sensitivity indices are determined by comparison with a benchmark null distribution constructed from user-specified, low-impact inputs.
Using low-impact inputs helps screen out inputs having small effects as well as those that are totally inert. Simulated examples show that, compared with one-stage procedures, the GSinCE procedure provides accurate screening while reducing computational effort.

In this dissertation, the sensitivity indices used as screening measures are computed in a Gaussian process model framework. This approach is known to be computationally efficient because it requires only a small number of expensive computer code runs to estimate the sensitivity indices. The existing approach for quantitative inputs is extended so that sensitivity indices can be computed when inputs include a qualitative input in addition to quantitative inputs.

An orthogonal design, in which the design matrix has uncorrelated columns, is important for estimating the effects of inputs. Moreover, a space-filling design, for which design points are well spread out, is needed to explore the experimental region thoroughly. New algorithms for achieving such orthogonal space-filling designs are proposed in this dissertation.

Three kinds of software are provided: one for the proposed GSinCE procedure, one for the computation of sensitivity indices, and one for the design search algorithms.

This is dedicated to my daughter Moonyoung, son Nathan, husband Jungick, and parents.

ACKNOWLEDGMENTS

I would first like to express my gratitude to my co-advisors, Professor Thomas Santner and Professor Angela Dean. They have given me tremendous help in my professional development and great guidance in my life. They are very special teachers and mentors to me. I am truly grateful for the effort that they have put into my education and the time that they have shared with me. I would also like to thank Professor William Notz for helpful comments and support as a member of my dissertation committee.

I want to give special thanks to my parents for their love and support.
Without their help and sacrifices, my husband Jungick and I could not have finished our Ph.D. studies at the same time. I would also like to thank Jungick for his love and for every moment that we have shared during our Ph.D. studies. I am most thankful to my precious little ones, daughter Moonyoung and son Nathan. They have given me all the happiness, hope, and strength to do my best in my life.

VITA

October 1977        Korea
2000                B.S. Statistics, Korea University
2000 to 2004        Statistician, The Bank of Korea
2006                M.S. Statistics, The Ohio State University
2005 to present     Graduate Research Associate, Graduate Teaching Associate, The Ohio State University

FIELDS OF STUDY

Major Field: Statistics

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
    1.1 Computer Experiments
    1.2 Gaussian Stochastic Process Model
    1.3 Screening Procedure
        1.3.1 Screening in Computer Experiments
        1.3.2 Group Screening in Physical Experiments
    1.4 Design of Computer Experiments
    1.5 Overview of Dissertation

2. Two-stage Sensitivity-based Group Screening in Computer Experiments
    2.1 Introduction
        2.1.1 Background
        2.1.2 Overview of the Proposed Procedure
    2.2 GSinCE Initialization Stage
    2.3 GSinCE Procedure Stage 1
        2.3.1 Stage 1 Sampling Phase
        2.3.2 Stage 1 Grouping Phase
        2.3.3 Stage 1 Analysis Phase
    2.4 GSinCE Procedure Stage 2
        2.4.1 Stage 2 Sampling Phase
        2.4.2 Stage 2 Analysis Phase

3. Performance of GSinCE
    3.1 Simulation Studies to Set τ
        3.1.1 Simulations for f = 20
        3.1.2 Simulations for f = 30
        3.1.3 Simulations for f = 10
        3.1.4 Summary of Simulation Studies
    3.2 Application of GSinCE in Least Favorable Cases
        3.2.1 Small Percentage of Active Inputs
        3.2.2 Non-linear Functions
        3.2.3 Detecting Large Effects
    3.3 Properties of Two-stage Designs
        3.3.1 Augmented Design
        3.3.2 Combined Design at Stage 2

4. Application of GSinCE
    4.1 Examples from the Literature
        4.1.1 Borehole Model
        4.1.2 A Model for the Weight of an Aircraft Wing
        4.1.3 OTL Circuit Model
        4.1.4 Piston Simulator Model
        4.1.5 Summary
    4.2 A Real Computer Experiment: FRAPCON Model
        4.2.1 Description of Code
        4.2.2 Use of GSinCE
        4.2.3 Implementations

5. Computation of Sensitivity Indices
    5.1 Sensitivity Indices of Quantitative Inputs
        5.1.1 Definition of Sensitivity Indices
        5.1.2 Estimation in Gaussian Process Framework
        5.1.3 The Integrals: sgint, dbint, mxint
        5.1.4 Example
    5.2 Sensitivity Indices of Mixed Inputs
        5.2.1 Setup
        5.2.2 Correlation Function for Mixed Inputs
        5.2.3 Estimation of Sensitivity Indices for Mixed Inputs
        5.2.4 Example

6. Algorithms for Generating Maximin Latin Hypercube and Orthogonal Designs
    6.1 Introduction
    6.2 Maximin Criteria for Space-filling Designs
    6.3 Algorithms for Space-filling Latin Hypercube Designs
        6.3.1 Complete Search and Random Generation
        6.3.2 Random Swap Methods for Maximin LHDs
        6.3.3 A Smart Swap Method for Maximin LHDs
    6.4 Algorithms for Orthogonal Maximin Designs
        6.4.1 Orthogonal Maximin LHDs
        6.4.2 Orthogonal Maximin Gram-Schmidt Designs
    6.5 Comparisons
        6.5.1 Maximin LHDs
        6.5.2 Orthogonal Maximin Designs
    6.6 Summary

7. Alternative Two-stage Designs
    7.1 Orthogonal Array-based Latin Hypercube Design
    7.2 Stage 1 Design for a Two-stage Group Screening Procedure
        7.2.1 Construction
        7.2.2 Secondary Criteria
    7.3 Stage 2 Design for a Two-stage Group Screening Procedure
    7.4 Limitations
        7.4.1 Availability of OA-based LHD
        7.4.2 Group Variable Defined by Averaging

8. Software
    8.1 GSinCE Code
    8.2 Sensitivity Code
    8.3 Maximin Code

Bibliography

LIST OF TABLES

3.1 Marginal probabilities and coefficient distributions for the simulation study

3.2 Six combinations used to recommend τ

3.3 Median and IQR values of the performance measures, and average number of groups and average total runs over 200 test functions with about 25% of active inputs among f = 20 inputs for each τ in each combination; value in parentheses is the number of test functions generated with no active inputs

3.4 Modified values of qL and q×|NN. Other probabilities are as in Table 3.1 to achieve about 25% of f = 30 inputs active

3.5 Median values of the performance measures, and median/average values of true/claimed active inputs over 50 test functions with about 25% of active inputs among f = 30 inputs; value in parentheses is the number of test functions generated with no active inputs

3.6 Modified values of qL to achieve about 25% and 35% of f = 10 inputs active, while keeping other probabilities as in Table 3.1

3.7 Median values of the performance measures, and median/average values of true/claimed active inputs over 100 test functions with about 25% of active inputs among f = 10 inputs; value in parentheses is the number of test functions generated with no active inputs

3.8 Median values of the performance measures, and median/average values of true/claimed active inputs over 100 test functions with.