Canonical Correlation Analysis for Longitudinal Data
Total Page:16
File Type:pdf, Size:1020Kb
Old Dominion University ODU Digital Commons Mathematics & Statistics Theses & Dissertations Mathematics & Statistics Winter 2010 Canonical Correlation Analysis for Longitudinal Data Raymond McCollum Old Dominion University Follow this and additional works at: https://digitalcommons.odu.edu/mathstat_etds Part of the Applied Statistics Commons, and the Longitudinal Data Analysis and Time Series Commons Recommended Citation McCollum, Raymond. "Canonical Correlation Analysis for Longitudinal Data" (2010). Doctor of Philosophy (PhD), Dissertation, Mathematics & Statistics, Old Dominion University, DOI: 10.25777/sx07-de18 https://digitalcommons.odu.edu/mathstat_etds/38 This Dissertation is brought to you for free and open access by the Mathematics & Statistics at ODU Digital Commons. It has been accepted for inclusion in Mathematics & Statistics Theses & Dissertations by an authorized administrator of ODU Digital Commons. For more information, please contact [email protected]. CANONICAL CORRELATION ANALYSIS FOR LONGITUDINAL DATA by Raymond McCollum B. S. August 1996, Old Dominion University M.S. May 2000, Kansas State University A Dissertation Submitted to the Faculty of Old Dominion University in Partial Fulfillment of the Requirement for the Degree of DOCTOR OF PHILOSOPHY MATHEMATICS AND STATISTICS OLD DOMINION UNIVERSITY December 2010 Approved by: Dr. Rao Chaganty Dr. Norou Diawara Dr. Edward Markowski ABSTRACT CANONICAL CORRELATION ANALYSIS FOR LONGITUDINAL DATA Raymond McCollum Old Dominion University, 2010 Director: Dr. Dayanand Naik Data (multivariate data) on two sets of vectors commonly occur in applications. Sta- tistical analysis of these data is usually done using a canonical correlation analysis (CCA). Occurrence of these data at multiple occasions or conditions leads to longitu- dinal multivariate data for a CCA. We address the problem of canonical correlation analysis on longitudinal data when the data have a Kronecker product covariance structure. Using structured correlation matrices we model the dependency of re- peatedly observed data. Recent work of Srivastava, Nahtman, and von Rosen (2008) developed an iterative algorithm to determine the maximum likelihood estimate of the Kronecker product covariance structure for one set of variables. We implement and generalize their method to estimate the covariance parameters in the context of canonical correlation analysis. We implemented unstructured and autoregressive covariance structures for the repeated measures. However, the developed methods can be easily implemented for other covariance structure. Testing of hypothesis prob- lems using the likelihood ratio test statistics are explored. Bootstrap methods are adopted for calculating the p-values of the tests. Methods are illustrated on a data set obtained from NASA. Performance of the tests is explored using simulation ex- periments. Consequences of assuming the independence, between repeated measures and performing CCA at different time components, on the distribution of estimated canonical correlations is also explored. Certain simple tests to study the effect of repeated measures are provided here as well. ACKNOWLEDGEMENTS I am heartily thankful to my advisor Dr. Dayanand N. Naik for his support and guidance throughout the project. I would also like to thank my committee Dr. Rao Chaganty, Dr. Norou Diawara, and Dr. Edward Markowski for their time and expertise to better my work. My thanks also go to NASA Langley Research Center, Dr. Peter A. Parker for helping to find data relevant to the research topic and our many discussions about the process. I am grateful to Dr. Gao Chen for providing data and subject matter expert knowledge to help interpret and analyze the data from the Intercontinental Chemical Transport Experiment. Lastly I would like to acknowledge my family and express my appreciation for their continued support during the completion of the degree. I dedicate this thesis to my children Ciaran and Kevin McCollum who have lived their entire early childhood knowing their father as a Ph.D. student, and my wife Julie McCollum for her patience and understanding. IV ? TABLE OF CONTENTS Page List of Tables viii List of Figures ix CHAPTERS I Introduction 1 II Estimation of Covariance Matrices 7 11. 1 Canonical Correlation Analysis 7 11. 2 Repeated Canonical Correlation Analysis 9 11. 3 Estimation of Matrices in a Kronecker Product 11 11.4 Examining the SNV(2008) Solution 13 11. 5 Estimation of AR(I) Correlation Parameter 15 11. 6 Estimating for Repeated CCA 16 11. 7 Simulation Methodology 19 II. 7.1 Intuitive Estimation Results 21 11. 8 Use of Transformations to Estimate F^x Cg) S?? 27 II. 8.1 Transformation Simulation Results 30 III Hypothesis Testing in Repeated CCA 46 111.1 Five Hypothesis of Interest 46 III. 1.1 Variance Covariance Structures 47 111. 1.2 Variance Covariance Matrix II 48 111. 1.3 Variance Covariance Matrix III 49 III. 1.4 Variance Covariance Matrix IV 50 III. 1.5 Variance Covariance Matrix V 51 111. 2 Asymptotic Distribution in Testing 52 111. 3 Bootstrapping 57 111.4 NASA data 61 IV A Simplified Approach 71 IV. 1 Alternative Data Structure 71 IV. 2 Using the Unstructured Estimate 72 IV. 3 Reducing to the Time Covariance Function 73 IV. 4 Creating the Duplication Matrix 75 IV.4.1 Isolating Variables of Interest 83 IV.5 Hypothesis Test Going From I to II 84 IV. 5.1 Canonical Correlation Test Result 86 IV. 5. 2 Example Variance Covariance Matrix for t = 3 86 IV. 5. 3 Testing Specific Structures 89 vi IV. 5. 4 Multivariate Delta Theorem Application 92 APPENDICES A The Program 102 VITA 123 Vil LIST OF TABLES Page 1 Intuitive estimate convergence values from 1000 simulations 22 2 Maximum Likelihood Solution for E3,, Ex, px, py, intuitive estimate of Ex,, and F?3/, 500 Samples 22 3 Maximum Likelihood Solution for E3,, Ex, px, py, intuitive estimate of EXJ/ and F?3/, 350 samples 23 4 Maximum Likelihood Solution for E3,, Ex, px, py, intuitive estimate of EX3/ and F?3/, 200 samples 24 5 Maximum Likelihood Solution for E3,, Ex, px, py, intuitive estimate of EX3/ and F?3,, 100 samples 25 6 Maximum Likelihood Solution for E2,, Ex, px, py, intuitive estimate of Exy and F?3,, 50 samples 26 7 Maximum Likelihood Solution for E3,, Ex, px = .4, py = .2, transfor- mation estimate of pxy = .3 and E13,, 500 samples 31 8 Maximum Likelihood Solution for E3,, Ex, px = .4, P3, = .2, transfor- mation estimate of pxy = .3 and Exy, 350 samples 32 9 Maximum Likelihood Solution for E3,, Ex, px = .4, py = .2, transfor- mation estimate of pxy = .3 and E13,, 200 sample 33 10 Maximum Likelihood Solution for E3,, Ex, px = .4, P3, = .2, transfor- mation estimate of pxy = .3 and E13,, 100 samples 34 11 Maximum Likelihood Solution for E3,, Ex, px = .4, P3, = .2, transfor- mation estimate of F?3, = .3 and E13,, 50 samples 35 12 Maximum Likelihood Solution for E3,, Ex, px = .7, py = .5, transfor- mation estimate of F?3/ = .6, Ex,,, 500 samples 36 13 Maximum Likelihood Solution for E3,, Ex, px = .7, P3, = .5, transfor- mation estimate of pxy = .6, E13,, 350 samples 37 14 Maximum Likelihood Solution for E3,, Ex, px = .7, P3, = .5, transfor- mation estimate of pxy = .6, EX3/, 200 samples 38 15 Maximum Likelihood Solution for E3,, Ex, px = .7, P3, = .5, transfor- mation estimate of pxy = .6, E13,, 100 samples 39 16 Maximum Likelihood Solution for E3,, Ex, px = .7, P3, = .5, transfor- mation estimate of pX3/ = .6, Ex,,, 50 samples 40 17 Maximum Likelihood Solution for E3,, Ex, px = .9, py = .7, transfor- mation estimate of pxy = .8, E13,, 500 samples 41 18 Maximum Likelihood Solution for E3,, Ex, px = .9, py = .7, transfor- mation estimate of pxy = 8, Exy, 350 samples 42 19 Maximum Likelihood Solution for E3,, Ex, px = .9, py = .7, transfor- mation estimate of pxy = .8, EX3/, 200 samples 43 20 Maximum Likelihood Solution for E3,, Ex, px = .9, py — .7, transfor- mation estimate of pxy = .8, EX3/, 100 samples 44 21 Maximum Likelihood Solution for E3,, Ex, px = .9, py = .7, transfor- mation estimate of pxy = .8, EX3/, 50 samples 45 viii 22 Rejection rates for 1000 samples, Model II 53 23 Rejection rates for 1000 samples, Model III 55 24 Rejection rates for 1000 samples, Model IV 56 25 Rejection rates for 1000 samples, Model V 56 26 Rejection rates for 100 samples, Model II 59 27 Rejection rates for 100 samples, Model III 59 28 Rejection rates for 100 samples, Model IV 60 29 Rejection rates for 100 samples, Model V 60 30 Altitude and Molecule measurements before scaling 62 31 Testing Results for the NASA data 63 32 Covariance IV Estimates for NASA DC-8 S„ and C-130 Ex 64 33 Covariance IV Estimates for NASA DC-8 and C-130 cross covariance Tixy and correlation F 64 34 Altitude and Molecule measurements before scaling 64 35 Canonical Correlations Within Each Time Period 65 36 Raw Canonical Coefficients DC-8 65 37 Raw Canonical Coefficients C-130 66 38 Standardized Canonical Coefficients DC-8 66 39 Standardized Canonical Coefficients C-130 67 40 Correlation between DC-8 elements and their canonical Variables . 67 41 Correlation between C-130 elements and their canonical Variables . 68 42 Correlation between DC-8 elements and the Canonical Variables of the C-130 68 43 Correlation between C-130 elements and the Canonical Variables of the DC-8 68 44 Rejection Rates for Test Ai = A2 = . ¦ ¦ = Xt, Sample sizes 50 to 100 . 95 45 Rejection Rates for Test Ai = A2 = . = Xt, Sample sizes 150 to 200 . 96 46 Rejection Rates for Test A1 = A2 = . = Xt, Sample sizes 250 to 300 . 97 47 Rejection Rates for Test A1 = A2 = . = At, Sample sizes 350 to 400 . 98 48 Rejection Rates for Test A1 = A2 = . = Xt, Sample sizes 450 to 500 . 99 IX LIST OF FIGURES Page 1 INTEX-B Airtracks 69 2 INTEX-B Airtracks Altitude 70 1 CHAPTER I INTRODUCTION Studying the relationship between two sets of multivariate vectors is an important problem in statistics.