Projected Multivariate Linear Models for Directional Data
PROJECTED MULTIVARIATE LINEAR MODELS FOR DIRECTIONAL DATA

By
PAVLINA RUMCHEVA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2005

Copyright 2005 by Pavlina Rumcheva

To my mother and father, Ludmila and Ignat

ACKNOWLEDGMENTS

I would like to thank Dr. Brett Presnell for giving me the chance to work with him on this very interesting topic. His guidance and suggestions helped me a lot in my research. I also wish to thank him for being my graduate advisor for all five years at the University of Florida.

I would also like to thank Ramon Littell, Wendy London, Alex Trindade, and Michael Perfit for serving on my committee and taking the time to read my dissertation. In particular, I would like to thank Dr. Littell for his helpful suggestions related to this topic, and Wendy, who has been my supervisor at the Children’s Oncology Group for the last three years, for her willingness to answer questions, her encouragement, and her understanding.

I thank Dobrin for all his support throughout my studies. His love and care have been an invaluable part of my life. I would like to thank my parents, Ludmila and Ignat, for stimulating my desire to study and helping me find my own path in life. Finally, I thank my little sister, Irina, for her love and belief in me.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
2 PROJECTED NORMAL DISTRIBUTION
  2.1 Definition
  2.2 The Mean Resultant Length of the Projected Normal Distribution
  2.3 Comparison with the Fisher Distribution
  2.4 Conditional Distribution of the Radial Part Given the Direction
3 SPML REGRESSION MODEL
  3.1 The Model
  3.2 Computation of Maximum Likelihood Estimates
4 MULTI-SAMPLE TESTS OF HYPOTHESES
  4.1 Model Specification
  4.2 A Test for Equal Mean Directions Assuming Same Concentration
  4.3 A Test for Equal Concentrations
  4.4 Approximating the Distribution of the LRT Statistics for the von Mises-Fisher Distribution with High Concentration
  4.5 An Example
  4.6 Simulation Studies
5 ONE-WAY RANDOM EFFECTS MODEL
  5.1 Model Specification
  5.2 Maximum Likelihood Estimation
    5.2.1 Maximum Likelihood Estimation Using Gauss-Hermite Quadrature
    5.2.2 Markov Chain Monte Carlo EM Algorithm
    5.2.3 Monte Carlo EM Algorithm
  5.3 Alternative Estimators of $\mu$ and $\sigma$
  5.4 An Example
6 MIXED EFFECTS MODELS
  6.1 Random Intercept Model
  6.2 General Mixed Effects Model
7 SUMMARY AND FUTURE WORK

APPENDICES
A REPEATED INTEGRALS OF $\Phi$
B POSITIVE DEFINITENESS OF THE NEGATIVE HESSIAN
C MAXIMUM LIKELIHOOD CALCULATIONS FOR THE MULTI-SAMPLE PROBLEM WITH EQUAL CONCENTRATIONS
D HUMAN, GORILLA, AND CHIMPANZEE SUPERIOR FACET DATA
E SIMULATION RESULTS

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

1-1 The roulette data
4-1 Descriptive statistics and p-values from Test 1 and Test 2. MD = mean direction; MRL = mean resultant length; Conc = concentration for the projected normal distribution
4-2 Pairwise comparisons: p-values from Test 1 and Test 2
D-1 Directions of Superior Facet Major Axis of 19 Humans
D-2 Directions of Superior Facet Major Axis of 16 Gorillas
D-3 Directions of Superior Facet Major Axis of 18 Chimpanzees
E-1 Proportion of times the Test 1 and LRMF test statistics exceeded the 0.05 upper quantile of the $F_{2,\,2(n-2)}$ distribution when testing two groups of sample sizes 10 and 10, in three dimensions; 5000 simulated values of the test statistic were used. Separ = angular separation in degrees of the true mean vectors
E-2 Proportion of times the Test 1 and LRMF test statistics exceeded the 0.05 upper quantile of the $F_{2,\,2(n-2)}$ distribution when testing two groups of sample sizes 10 and 20, in three dimensions; 5000 simulated values of the test statistic were used. Separ = angular separation in degrees of the true mean vectors
E-3 Proportion of times the Test 1 and LRMF test statistics exceeded the 0.05 upper quantile of the $F_{2,\,2(n-2)}$ distribution when testing two groups of sample sizes 20 and 20, in three dimensions; 5000 simulated values of the test statistic were used. Separ = angular separation in degrees of the true mean vectors
E-4 Proportion of times the Test 1 and LRMF test statistics exceeded the 0.05 upper quantile of the $F_{2,\,2(n-2)}$ distribution when testing two groups of sample sizes 20 and 40, in three dimensions; 5000 simulated values of the test statistic were used. Separ = angular separation in degrees of the true mean vectors
E-5 Proportion of times the Test 1 and LRMF test statistics exceeded the 0.05 upper quantile of the $F_{2,\,2(n-2)}$ distribution when testing two groups of sample sizes 40 and 40, in three dimensions; 5000 simulated values of the test statistic were used. Separ = angular separation in degrees of the true mean vectors

LIST OF FIGURES

1-1 The sample mean $U$, the mean direction $U_0$, and the mean resultant length $R$ for the roulette data
2-1 Left: the projected normal density (solid curve) and the Fisher density (dashed curve) for $\rho$ = 0.25, 0.5, 0.75, 0.9, 0.95.
Right: Mean resultant lengths of the projected normal distribution (solid curve) and Fisher distribution (dashed curve)
4-1 A human vertebra
4-2 Humans’ major axis superior facet directions plotted using an equal-area projection
4-3 Gorillas’ major axis superior facet directions plotted using an equal-area projection
4-4 Chimpanzees’ major axis superior facet directions plotted using an equal-area projection (excludes observation number 6 since it lies on the lower hemisphere)
5-1 The left and right plots show the convergence of the estimates for the first and second elements of $\mu$, respectively, using the MCMCEM algorithm, with the last 500 values of the Gibbs chain
5-2 Convergence of the estimate for $\sigma$ using the MCMCEM algorithm, with the last 500 values of the Gibbs chain
5-3 The left and right plots show the convergence of the estimates for the first and second elements of $\mu$, respectively, using the MCMCEM algorithm, with the last 2500 values of the Gibbs chain
5-4 Convergence of the estimate for $\sigma$ using the MCMCEM algorithm, with the last 2500 values of the Gibbs chain
5-5 The left and right figures plot the estimates for the first and second elements of $\mu$, respectively, found by directly maximizing the likelihood using Gauss-Hermite quadrature
5-6 The estimates for $\sigma$ found by directly maximizing the likelihood using Gauss-Hermite quadrature
E-1 P-values from Test 1 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,10)
E-2 P-values from the corrected Test 1 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,10)
E-3 P-values from Test 1 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,20)
E-4 P-values from the corrected Test 1 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,20)
E-5 P-values from Test 1 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,20)
E-6 P-values from the corrected Test 1 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,20)
E-7 P-values from Test 1 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,40)
E-8 P-values from the corrected Test 1 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,40)
E-9 P-values from Test 1 (y-axis) versus uniform quantiles (x-axis); sample sizes (40,40)
E-10 P-values from the corrected Test 1 (y-axis) versus uniform quantiles (x-axis); sample sizes (40,40)
E-11 P-values from Test 2 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,10)
E-12 P-values from the corrected Test 2 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,10)
E-13 P-values from Test 2 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,20)
E-14 P-values from the corrected Test 2 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,20)
E-15 P-values from Test 2 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,20)
E-16 P-values from the corrected Test 2 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,20)
E-17 P-values from Test 2 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,40)
E-18 P-values from the corrected Test 2 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,40)
E-19 P-values from Test 2 (y-axis) versus uniform quantiles (x-axis); sample sizes (40,40)
E-20 P-values from the corrected Test 2 (y-axis) versus uniform quantiles (x-axis); sample sizes (40,40)
E-21 P-values from Test 3 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,10)
E-22 P-values from the corrected Test 3 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,10)
E-23 P-values from Test 3 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,20)
E-24 P-values from the corrected Test 3 (y-axis) versus uniform quantiles (x-axis); sample sizes (10,20)
E-25 P-values from Test 3 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,20)
E-26 P-values from the corrected Test 3 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,20)
E-27 P-values from Test 3 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,40)
E-28 P-values from the corrected Test 3 (y-axis) versus uniform quantiles (x-axis); sample sizes (20,40)
E-29 P-values from Test 3 (y-axis) versus uniform quantiles (x-axis); sample sizes (40,40)
E-30 P-values