A Comparison of Methods for Generating Bivariate Non-Normally Distributed Random Variables
University of North Florida
UNF Digital Commons
UNF Graduate Theses and Dissertations, Student Scholarship
2009

Part of the Mathematics Commons, and the Statistics and Probability Commons

Suggested Citation
Stewart, Jaimee E., "A Comparison of Methods for Generating Bivariate Non-normally Distributed Random Variables" (2009). UNF Graduate Theses and Dissertations. 235. https://digitalcommons.unf.edu/etd/235

© 2009 All Rights Reserved

By Jaimee E. Stewart

A thesis submitted to the Department of Mathematics and Statistics in partial fulfillment of the requirements for the degree of Master of Science in Mathematical Sciences - Statistics

UNIVERSITY OF NORTH FLORIDA
COLLEGE OF ARTS AND SCIENCES
June, 2009

The thesis "A Comparison of Methods for Generating Bivariate, Non-normally Distributed Random Variables" submitted by Jaimee E. Stewart in partial fulfillment of the requirements for the degree of Master of Science in Statistics has been approved by the thesis committee:

Dr. Ping Sa, Thesis Advisor and Committee
Dr. Pali Sen

Accepted for the Department: Dr. Scott Hochwald, Chairperson
Accepted for the College: Dean of the College
Accepted for the University: Dean of the Graduate School

ACKNOWLEDGEMENT

My deepest gratitude is given to Dr.
Sa for this remarkable learning experience. Thank you for your patience, encouragement, and guidance. I could not have wished for this process to be any more rewarding. Thank you, Dr. Sen and Dr. Gleaton, for serving on my thesis committee and for sharing with me your knowledge of statistics over the last few years. Thank you to my two boys, Grady and Arthur, who bring joy and laughter to my life on a daily basis. Finally, thank you to my husband, James. There is no way this would have been completed without your support and the constant supply of chocolate you kept for me in the freezer.

CONTENTS

Abstract
Chapter 1: Introduction
Chapter 2: The Proposed Method
    2.1 The Fleishman Power Method
    2.2 The Fifth-Order Polynomial Transform Method
    2.3 The Generalized Lambda Distribution Method
Chapter 3: Simulation Study
    3.1 The Fleishman Power Method
    3.2 The Fifth-Order Polynomial Transform Method
    3.3 The Generalized Lambda Distribution Method
Chapter 4: Simulation Results
    4.1 Comparison of Accuracy
        4.1.1 Accuracy as Correlation Changes
        4.1.2 Accuracy as Skewness and Kurtoses Change
        4.1.3 Accuracy as Sample Size Changes
    4.2 Comparison of Ease of Use
    4.3 Comparison of Efficiency
Chapter 5: Conclusions
References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Appendix G
Appendix H
Appendix I
ABSTRACT

Many distributions of multivariate data in the real world follow a non-normal model, with distributions being skewed and/or heavy tailed. In studies in which multivariate non-normal distributions are needed, it is important for simulations of those variables to provide data that are close to the desired parameters while also being fast and easy to perform. Three algorithms for generating multivariate non-normal distributions are reviewed for accuracy, speed, and simplicity: the Fleishman Power Method, the Fifth-Order Polynomial Transformation Method, and the Generalized Lambda Distribution Method. Simulations were run to compare the three methods on how well they generate bivariate distributions with the desired means, variances, skewness, kurtoses, and correlation; on the simplicity of the algorithms; and on how quickly the desired distributions were calculated.

Chapter 1

Introduction

Multivariate data consist of two or more random variables, usually correlated, that are obtained as the result of an experiment. The most convenient and familiar multivariate distribution is the multivariate normal. Its popularity is likely due to its ease of simulation and its ability to allow for closed-form theoretical results. Rarely, though, do the distributions of data follow the symmetric multivariate normal model. In practice, most distributions are non-normal, with the data being skewed and/or heavy tailed. Examples of random variables with non-normal distributions include waiting times, lengths of time between malfunctions in machinery, growth data such as bacterial growth, and proportional data.

Monte Carlo simulations needing correlated normal and non-normal distributions have been used to investigate the small-sample properties of competing statistics or to compare estimation techniques.
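The ease of simulating the multivariate normal mentioned above can be illustrated with a minimal Python sketch. It is not part of the thesis: the function names `simulate_bivariate_normal` and `sample_correlation`, the seed, the sample size, and the target correlation of 0.6 are all assumptions chosen for illustration. The sketch builds a correlated standard normal pair from two independent standard normal draws and then checks the sample correlation.

```python
import math
import random


def simulate_bivariate_normal(n, rho, seed=0):
    """Draw n pairs from a standard bivariate normal with correlation rho.

    Hypothetical helper for illustration only; not taken from the thesis.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Two independent standard normal draws.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mixing z1 into the second coordinate induces correlation rho
        # while keeping unit variance: rho^2 + (1 - rho^2) = 1.
        x = z1
        y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        pairs.append((x, y))
    return pairs


def sample_correlation(pairs):
    """Pearson correlation computed from sample means and variances."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    mean_xy = sum(x * y for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return (mean_xy - mean_x * mean_y) / math.sqrt(var_x * var_y)


# Assumed illustrative settings: 50,000 pairs, target correlation 0.6.
pairs = simulate_bivariate_normal(n=50_000, rho=0.6, seed=1)
print(sample_correlation(pairs))
```

The printed sample correlation should fall close to the target, with the gap shrinking as the sample size grows. This construction relies on normality of the marginals, so it does not by itself produce skewed or heavy-tailed variables with a chosen correlation.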
Monte Carlo studies of this kind have addressed the dependent-sample t-test (Blair and Higgins, 1985), regression (Iman & Conover, 1979), and meta-analysis (Sawilowsky, Kelley, Blair, & Markman, 1994).

The simulation of multivariate data is an integral component of data analysis methodologies. Simulation studies are used in computer evaluations to verify theoretical, large-sample properties of statistical methods, estimators, and test statistics. The generation of multivariate random variables is also an important part of data analysis methodologies such as bootstrap resampling and Markov Chain Monte Carlo.

A good non-normal random number generator produces data that satisfy specified target values for certain parameters. It would be preferable that the generated random variables match all moments of the desired distribution, but matching the mean, the standard deviation, skewness, and kurtosis tends to produce adequate results.

Skewness, $\gamma_1$, is the third standardized moment and is defined as:

$$\gamma_1 = \frac{m_3}{\sigma^3}$$

where $m_3$ is the third moment about the mean of a random variable and $\sigma$ is the standard deviation. It can also be described as the standardized cumulant, which is the ratio of the third cumulant $\kappa_3$ to the third power of the square root of the second cumulant $\kappa_2$:

$$\gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}}$$

Kurtosis, $\gamma_2$, is the fourth standardized moment and is defined as:

$$\gamma_2 = \frac{m_4}{\sigma^4}$$

where $m_4$ is the fourth moment about the mean of a random variable and $\sigma$ is the standard deviation. It is also commonly defined through the standardized cumulant, which is the fourth cumulant $\kappa_4$ divided by the square of the second cumulant $\kappa_2$:

$$\frac{\kappa_4}{\kappa_2^{2}} = \frac{m_4}{\sigma^4} - 3$$

Because $\kappa_4 = m_4 - 3\sigma^4$, the cumulant form differs from the fourth standardized moment by the constant 3 and equals zero for a normal distribution.

For two correlated random variables, the correlation is defined as:

$$\rho = \frac{\mu_{12} - \mu_1 \mu_2}{\sigma_1 \sigma_2}$$

where $\mu_1$ and $\mu_2$ are the means of the two random variables considered, $\mu_{12}$ is the mean of the products of the two random variables, and $\sigma_1$ and $\sigma_2$ are their standard deviations.

A procedure for developing non-normal univariate data was developed by