LOCAL DISTANCE CORRELATION: AN EXTENSION OF LOCAL GAUSSIAN CORRELATION

Walaa Hamdi

A Dissertation

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

August 2020

Committee:

Maria Rizzo, Advisor

Jari Willing, Graduate Faculty Representative

Wei Ning

Junfeng Shang

Copyright © August 2020 Walaa Hamdi. All rights reserved.

ABSTRACT

Maria Rizzo, Advisor

Distance correlation is a measure of the relationship between random vectors in arbitrary dimension. The sample distance covariance can be formulated as either an unbiased estimator or a biased estimator, and distance correlation is defined as the normalized coefficient of distance covariance. The jackknife empirical likelihood for a U-statistic by Jing, Yuan, and Zhou (2009) can be applied to distance correlation, since the standard empirical likelihood method fails for nonlinear functions. A Wilks' theorem for the jackknife empirical likelihood is shown to hold for distance correlation. This research shows how to construct a confidence interval for distance correlation based on the jackknife empirical likelihood for a U-statistic, where the sample distance covariance can be represented as a U-statistic. In comparing coverage probabilities of confidence intervals for distance correlation based on the jackknife empirical likelihood and the bootstrap method, the jackknife empirical likelihood intervals show more accurate coverage.

We propose the estimation and visualization of local distance correlation by using a local version of the jackknife empirical likelihood. Kernel density functional estimation is used to construct the jackknife empirical likelihood locally. The bandwidth for the kernel function is selected to minimize the distance between the true density and the estimated density. Local distance correlation has the property that it equals zero in the neighborhood of each point if and only if the two variables are independent in that neighborhood. The estimation and visualization of local distance correlation are shown to capture local dependence accurately when compared with the local Gaussian correlation in simulation studies and real data examples.

My thanks to my Mom Nawal and Dad Ahmed; my husband Motaz, for his support; and my daughters Rafif, Joanna, and Taleen.

ACKNOWLEDGMENTS

I would like to thank my advisor Dr. Rizzo for her precious advice and guidance to complete this dissertation. I appreciate her way of giving me different viewpoints and suggestions. I also appreciate her time to meet with me to discuss the results in more detail. I am honored that Dr. Rizzo is my dissertation advisor. I express my sincere thanks to committee members Dr. Junfeng Shang, Dr. Wei Ning, and Dr. Jari Willing, who supported me until I completed my degree. I also express my appreciation to all my professors in the Department of Mathematics and Statistics for their help and guidance. I am especially thankful to the Graduate Coordinator Dr. Craig Zirbel for his support of graduate students. I cannot express enough thanks to my husband Motaz and my parents for encouraging me to complete my degree with their best wishes. I will always remember my husband Motaz supporting me and always being by my side. I love my daughters and they always make me happy. I extend my gratitude to all of my family and my husband's family who are directly or indirectly supporting me to complete my degree. Finally, I am glad to be a graduate student at Bowling Green State University.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION ...... 1

CHAPTER 2 LITERATURE REVIEW ...... 5
2.1 Background on Dependence Coefficients ...... 5
2.2 Bivariate Dependence Measure ...... 5
2.3 Multivariate Dependence Measure ...... 8
2.4 Properties of Dependence Measure ...... 9
2.5 Distance Correlation ...... 12
2.6 Local Correlation ...... 13
2.7 Multiscale Graph Correlation ...... 15

CHAPTER 3 OVERVIEW OF DISTANCE CORRELATION ...... 17
3.1 Distance Correlation ...... 17
3.2 Modified Distance Correlation ...... 22
3.3 Unbiased Distance Correlation ...... 23

CHAPTER 4 CONFIDENCE INTERVAL FOR DISTANCE CORRELATION ...... 31
4.1 Confidence Intervals for Distance Correlation ...... 32
4.1.1 U-statistic Results ...... 32
4.1.2 Jackknife Empirical Likelihood for Distance Correlation ...... 39
4.2 Simulation Study ...... 47
4.3 Real Examples ...... 51

CHAPTER 5 LOCAL DISTANCE CORRELATION ...... 55
5.1 Local Gaussian Correlation ...... 55
5.1.1 Estimation of Local Gaussian Correlation ...... 56
5.1.2 Choice of Bandwidth for Kernel Function ...... 57
5.1.3 Properties of Local Gaussian Correlation ...... 58
5.1.4 Global Gaussian Correlation ...... 59
5.2 Local Distance Correlation ...... 61
5.2.1 Estimation of Local Distance Correlation ...... 61
5.2.2 Choice of Bandwidth for Kernel Function ...... 67
5.2.3 Properties of Local Distance Correlation ...... 76
5.3 Simulation Study ...... 82
5.4 Real Examples ...... 91
5.4.1 Example 1: Aircraft ...... 91
5.4.2 Example 2: Wage ...... 92
5.4.3 Example 3: PRIM7 ...... 95
5.4.4 Example 4: Olive Oils ...... 97

CHAPTER 6 SUMMARY AND FUTURE WORK ...... 104

BIBLIOGRAPHY ...... 107

APPENDIX A SELECTED R PROGRAMS ...... 113

LIST OF FIGURES

4.1 Scatterplot matrix of pairwise association of six fatty acids ...... 53

5.1 Contour plots of the true density and kernel estimate functions ...... 75
5.2 Scatter plot of X and Y ...... 77
5.3 Illustration of exchange symmetry ...... 78
5.4 Illustration of reflection symmetry ...... 79
5.5 Illustration of radial symmetry ...... 80
5.6 Scatter plots when rotated for 90° and 180° ...... 81
5.7 Illustration of rotation symmetry ...... 82
5.8 Scatter plots of different bivariate dependence structures ...... 83
5.9 Contour plots of different bivariate dependence structures ...... 84
5.10 The visualization of local Gaussian correlation and local distance correlation ...... 85
5.11 The visualization of local Gaussian correlation and local distance correlation ...... 86
5.12 The visualization of local Gaussian correlation and local distance correlation ...... 87
5.13 The visualization of local Gaussian correlation and local distance correlation ...... 88
5.14 The visualization of local Gaussian correlation and local distance correlation ...... 89
5.15 The visualization of local Gaussian correlation and local distance correlation ...... 90
5.16 Scatter and contour plots for aircraft dataset ...... 92
5.17 The visualization of local Gaussian correlation and local distance correlation for aircraft dataset ...... 93
5.18 Smooth scatter plot for Wage dataset ...... 94
5.19 The visualization of local Gaussian correlation and local distance correlation for Wage dataset ...... 95
5.20 Scatter and smooth scatter plots for PRIM7 dataset ...... 96
5.21 The visualization of local Gaussian correlation and local distance correlation for PRIM7 dataset ...... 97
5.22 Smooth scatter plot for oleic and palmitoleic fatty acids ...... 98
5.23 The visualization of local Gaussian correlation and local distance correlation for oleic and palmitoleic fatty acids ...... 99
5.24 Smooth scatter plot for palmitic and stearic fatty acids ...... 100
5.25 The visualization of local Gaussian correlation and local distance correlation for palmitic and stearic fatty acids ...... 101
5.26 Smooth scatter plot for linoleic and linolenic fatty acids ...... 102
5.27 The visualization of local Gaussian correlation and local distance correlation for linoleic and linolenic fatty acids ...... 103

LIST OF TABLES

4.1 Coverage probabilities and average interval lengths of 90% confidence interval for R2 ...... 50
4.2 Coverage probabilities and average interval lengths of 95% confidence interval for R2 ...... 50
4.3 Coverage probabilities and average interval lengths of 99% confidence interval for R2 ...... 51
4.4 of fatty acids ...... 52
4.5 The confidence intervals for bias-corrected distance correlation of bivariate variables of monounsaturated fats, saturated fats, and polyunsaturated fats ...... 54

CHAPTER 1 INTRODUCTION

Correlation is a bivariate coefficient which measures the association or relationship between two random variables. The correlation coefficient is one of the interesting topics in statistics, because statisticians have been developing different ways to quantify the relationship between variables and the properties of dependence measures. We can find a point estimate and calculate confidence intervals to estimate the population correlation. Point estimation is used to calculate a single value for estimating the population correlation coefficient, and a confidence interval is defined as a range of values that contains the population correlation coefficient. Moreover, a hypothesis test for the population correlation coefficient is used to evaluate two mutually exclusive statements about a population from sample data. Pearson correlation is the most commonly used method to study the relationship between two random variables, but it fails to capture nonlinear dependence. For non-Gaussian random variables, the correlation coefficient can be close to zero even if the variables are dependent. Székely, Rizzo, and Bakirov (2007) introduced distance correlation, a nonparametric approach, which is a new measure for testing multivariate dependence between random vectors. Distance correlation is analogous to the Pearson product-moment correlation coefficient, but distance correlation is able to detect both linear and nonlinear dependence structures. The distance correlation is defined as the normalized coefficient of distance covariance, where the sample distance covariance has both a U-statistic and a V-statistic representation.

One goal of this research is to construct the confidence interval for distance correlation of multivariate dependence between random vectors. Confidence intervals carry more information about the population correlation than the results of a hypothesis test, since the confidence interval provides a range of likely values of the population distance correlation coefficient. The bootstrap method is the most widely used nonparametric approach for the construction of confidence intervals, but the bootstrap may fail to give information about the population. The empirical likelihood method is also used to construct confidence intervals, but it fails to obtain a chi-square limit for nonlinear functions.

Jing, Yuan, and Zhou (2009) proposed the jackknife empirical likelihood method for a U-statistic that benefits from simple optimization utilizing jackknife pseudo-samples. The jackknife estimator for the parameter of interest becomes the sample mean of the jackknife pseudo-samples, and the empirical likelihood method can then be applied to the jackknife pseudo-sample mean. Székely and Rizzo (2014) considered the unbiased estimator of squared distance covariance, which is based on U-centering. A bias-corrected distance correlation is defined by normalizing the inner product statistic with the bias-corrected distance statistics. Since the unbiased distance covariance is a U-statistic, in this research we employ the jackknife empirical likelihood method to construct confidence intervals for the distance correlation. Moreover, we show that a Wilks' theorem holds for the jackknife empirical likelihood for distance correlation. The coverage probability and interval length are associated with confidence interval estimation.
We construct the jackknife empirical likelihood confidence intervals for the distance correlation with a Monte Carlo simulation that provides information about the accuracy of the coverage probability and the average interval length. The criteria for a good confidence interval estimator are a coverage probability close to the nominal level and a short interval length. In this dissertation, we compare the jackknife empirical likelihood confidence intervals and the standard normal bootstrap confidence intervals for the distance correlation by computing coverage probabilities and interval lengths.

Sometimes global measures of dependence cannot give enough information about the association between random variables, because a global measure describes the relation between the variables over the whole study area. A method of local dependence measure called local Gaussian correlation was considered by Tjøstheim and Hufthammer (2013). It was derived from a local correlation function using local likelihood, based on approximating a bivariate density locally by a family of bivariate Gaussian densities. At each point, the correlation coefficient of the approximating Gaussian distribution is taken as the local correlation. The local likelihood estimation is based on approximating a density function by a known parametric family. The bandwidth algorithm of Tjøstheim and Hufthammer is not really satisfactory in a general situation, but Berentsen and Tjøstheim (2014) introduced another method to choose the bandwidth based on the principle of likelihood cross-validation. An important application of local Gaussian correlation is the visualization of local dependence structures. Berentsen and Tjøstheim (2014) considered the global Gaussian correlation by aggregating local Gaussian correlation on subsets of R2 to obtain a global measure of dependence.

The focus of this research is to introduce a new method of estimation and visualization of a local distance correlation measure between two univariate random variables. A local distance correlation is able to capture the local dependence structure in a small region, which better describes the dependence structures. The approach of constructing local distance correlation by a local version of the jackknife empirical likelihood is extended from local Gaussian correlation. To estimate local distance correlation, we use kernel density functional estimation to construct the jackknife empirical likelihood locally. It is important to choose appropriate bandwidths for the kernel function, where the bandwidth is a window that determines how much of the data within this window is used to estimate each local estimator of distance correlation. We consider three common bandwidth selections for the kernel function to determine the appropriate one for the data. The properties of local distance correlation should remain the same as the properties of distance correlation. We have implemented a fast O(n log n) algorithm for the computation of bivariate distance correlation to build a fast visualization tool with local distance correlation that can be applied to very large data sets. We compare the local distance correlation with local Gaussian correlation in order to determine the performance of both methods on nonlinear data sets.
This dissertation is organized as follows. In Chapter 2, we discuss the most important developments in the history of the correlation coefficient for measuring bivariate and multivariate association, as well as provide details about local correlation. In Chapter 3, we provide an overview of distance correlation. In Chapter 4, we construct the confidence interval for distance correlation based on the jackknife empirical likelihood for a U-statistic and show that Wilks' theorem holds for distance correlation. In addition, we compare the performance of the confidence intervals for distance correlation from the jackknife empirical likelihood and the bootstrap method. In Chapter 5, we introduce the local distance correlation for bivariate cases and discuss the local Gaussian correlation. The properties of local distance correlation and the choice of bandwidth for the kernel function are discussed as well. We compare the local distance correlation with local Gaussian correlation in simulation studies and real-life examples. Chapter 6 presents the summary and future work of this research.

CHAPTER 2 LITERATURE REVIEW

2.1 Background on Dependence Coefficients

Correlation is one of the interesting topics in statistics related to scientific discovery and inno- vation, used to indicate the relation between two continuous variables. Galton (1888) first defined the concept of correlation in the following way: “Two variable organs are said to be co-related when the variation of the one is accompanied on the average by more or less variation of the other, and in the same direction. It is easy to see that co-relation must be the consequence of the vari- ations of the two organs being partly due to common causes. If they were in no respect due to common causes, the co-relation would be nil.” Galton (1890) developed his ideas of correlation to better understand the relationship between variables by collecting large data sets. Pearson (1896) used the word “Coefficient of Correlation” in the paper entitled “Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity and Panmixia,” where he considered Pearson’s product-moment correlation which is used to find the linear association between two variables and it can be only used on quantitative variables. Pearson’s product-moment correlation was developed after the initial mathematical formulae for correlation discovered by Bravais (1844). Spearman (1904) adapted Pearson’s idea to be applied by substituting ranks for measurements in Pearson’s product-moment formula. Spearman’s rank correlation coefficient, which is known as Spearman’s rho, is one of the oldest formulas based on ranks that measures the association between two variables. In the first part of Spearman’s paper, he determined that a good method of correlation must meet some requirements such as quantitative expression, the significance of the quantity, accuracy, and ease of application. Also, he considered the advantages and disadvantages of comparison by rank.

2.2 Bivariate Dependence Measure

Hotelling and Pabst (1936) published a paper on rank correlation which avoids the assumption of normality. They defined the most convenient formula for computing the rank correlation when Pearson's correlation is applied to ranked data. However, they concluded that the rank correlation coefficient is easier to compute for samples smaller than 40, whereas Pearson's correlation works conveniently for larger sample sizes. A new measure of rank correlation, used to measure association between two random variables, was defined by Kendall (1938). Kendall's statistic, known as Kendall's tau, is based on concordant and discordant pairs. It is mostly used in psychological experiments when the ranks are known. Hoeffding (1948b) introduced a test for independence between two data sets, called Hoeffding's D measure, by measuring the distance between the product of the marginal probability distributions and the joint distribution.

Blomqvist (1950) considered a simpler dependence measure, based on the number of sample points $n_1$ that belong to the first or third quadrant and the number $n_2$ that belong to the second or fourth quadrant. It is defined as
$$q' = \frac{n_1 - n_2}{n_1 + n_2} = \frac{2 n_1}{n_1 + n_2} - 1, \qquad (-1 \le q' \le 1).$$

This dependence measure is often called the medial correlation coefficient. The population version of the medial correlation coefficient for a pair of continuous random variables $X$ and $Y$ with medians $\tilde{x}$ and $\tilde{y}$, respectively, is given as
$$q = P\{(X - \tilde{x})(Y - \tilde{y}) > 0\} - P\{(X - \tilde{x})(Y - \tilde{y}) < 0\}.$$
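To make the sample version concrete, here is a minimal R sketch of the quadrant statistic; the function name and the simulated data are illustrative choices of ours, not part of the dissertation.

```r
# Sample medial (quadrant) correlation: count points in the quadrants formed by the medians
blomqvist_q <- function(x, y) {
  dx <- x - median(x)
  dy <- y - median(y)
  n1 <- sum(dx * dy > 0)   # first or third quadrant
  n2 <- sum(dx * dy < 0)   # second or fourth quadrant
  (n1 - n2) / (n1 + n2)
}

set.seed(1)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)
blomqvist_q(x, y)
```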

The asymptotic efficiency of a test based on the statistic $q$ is about 41 percent, and this test has Fisher's (1941) exact test of independence in the 2 × 2 table as a special case. Fisher's exact test was proposed in 1941 and is used when members of two independent groups can fall into one of two mutually exclusive categories, to determine whether the proportions of those falling into each category differ by group. An ordinal invariant measure of association for bivariate populations was discussed by Kruskal (1958). He focused on the probabilistic and operational interpretations of their population values and discussed the relationships between three measures: the quadrant measure of Blomqvist, Kendall's tau, and Spearman's rho. Blum, Kiefer, and Rosenblatt (1961) suggested a test of independence based on empirical distribution functions for the case when the dimension $d \ge 2$, via appropriate Cramér–von Mises statistics, significant for large values of

$$B_n = \int \Big(S_n(r) - \prod_{j=1}^{d} S_{nj}(r_j)\Big)^2 \, dS_n(r), \qquad (2.2.1)$$
where $S_n(r)$ is the sample distribution function of independent random $d$-variate vectors $X_1, \ldots, X_n$ with unknown distribution function $F$, and $S_{nj}$ is the marginal distribution function associated with the $j$-th component of the $X_i$. They obtained the characteristic functions of the limiting distributions of a class of such test criteria and provided a table in the bivariate case, $d = 2$, for the corresponding distribution functions. Furthermore, the tests have asymptotic normal distributions and they are equivalent to the test proposed by Hoeffding (1948b) when $d = 2$. Bhuchongkul (1964) considered the class of rank tests for independence and proposed a test statistic of the form

$$T_N = N^{-1} \sum_{i=1}^{N} E_{N,r_i}\, E'_{N,s_i}\, Z_{N,r_i}\, Z'_{N,s_i},$$
where
$$Z_{N,r_i} = \begin{cases} 1, & X_i \text{ is the } r_i\text{-th smallest of the } X\text{'s}; \\ 0, & \text{otherwise,} \end{cases} \qquad Z'_{N,s_i} = \begin{cases} 1, & Y_i \text{ is the } s_i\text{-th smallest of the } Y\text{'s}; \\ 0, & \text{otherwise,} \end{cases}$$
and $\{E_{N,r_i}\}$ and $\{E'_{N,s_i}\}$, $i = 1, \ldots, N$, are two sets of constants satisfying certain restrictions, taken as the expected values of the $r_i$-th and $s_i$-th standard normal order statistics from a sample of size $N$. He concluded that the normal scores test is the locally most powerful rank test and is asymptotically as efficient as the parametric correlation coefficient test for specified alternatives when the underlying distributions are normal. If $E_{N,r_i} = r_i$ and $E'_{N,s_i} = s_i$, then the test statistic is equivalent to the Spearman rank correlation statistic.
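As a rough illustration of a rank statistic of this form, the following R sketch computes a normal-scores correlation; the Blom approximation to the expected normal order statistics and the helper names are our assumptions for illustration, not the dissertation's.

```r
# Normal-scores (rank) correlation: replace observations by approximate expected
# values of standard normal order statistics evaluated at their ranks
normal_scores <- function(r, N) qnorm((r - 0.375) / (N + 0.25))  # Blom approximation

normal_score_corr <- function(x, y) {
  N <- length(x)
  a <- normal_scores(rank(x), N)
  b <- normal_scores(rank(y), N)
  sum(a * b) / sum(a^2)   # normalized so that perfect rank agreement gives 1
}

set.seed(2)
x <- rnorm(100)
y <- x^2 + rnorm(100)
normal_score_corr(x, y)
```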

2.3 Multivariate Dependence Measure

Wilks (1935) introduced the classic parametric test based on
$$W = \frac{|A|}{|A_{11}|\,|A_{22}|},$$
where $A_{11}$ is the $r \times r$ sample covariance matrix of the first sample, $A_{22}$ is the $s \times s$ sample covariance matrix of the second sample, and $A$ is the partitioned sample covariance matrix. The test is a likelihood ratio test, which is optimal under the multivariate normal model for testing $H_0$: $x_i^{(1)}$ and $x_i^{(2)}$ are independent, where $x_i^{(1)}$ and $x_i^{(2)}$ for $i = 1, \ldots, n$ are $r$-dimensional and $s$-dimensional continuous random vectors, respectively. Under $H_0$ with finite fourth moments, $-n \log W \xrightarrow{d} \chi^2_{rs}$, where $n$ is the sample size. A disadvantage of the likelihood ratio test is that it is not usable if the dimension is greater than the sample size or the distributional assumption does not hold.

Sinha and Wieand (1977) extended Bhuchongkul's (1964) rank statistics for testing multivariate independence. They showed that the test statistic can be expressed as a rank statistic which is easy to compute. The test statistic has an asymptotic normal distribution and can detect mutual dependence in alternatives which are pairwise independent.
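A minimal R sketch of Wilks' statistic under the definitions above; the block dimensions and simulated data are illustrative assumptions.

```r
# Wilks' W for testing independence of two sub-vectors of a multivariate sample
set.seed(3)
n <- 200; r <- 2; s <- 3
X1 <- matrix(rnorm(n * r), n, r)          # first sub-vector, r-dimensional
X2 <- matrix(rnorm(n * s), n, s)          # second sub-vector, s-dimensional
A   <- cov(cbind(X1, X2))                 # partitioned sample covariance matrix
A11 <- A[1:r, 1:r]
A22 <- A[(r + 1):(r + s), (r + 1):(r + s)]

W    <- det(A) / (det(A11) * det(A22))
stat <- -n * log(W)                       # approximately chi-squared with r*s df under H0
pval <- pchisq(stat, df = r * s, lower.tail = FALSE)
c(W = W, statistic = stat, p.value = pval)
```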

The nonparametric statistic $Q_n$ was introduced by Gieser and Randles (1997), based on interdirections, for testing whether two vector-valued quantities are independent. The interdirection count measures the angular distance between two observation vectors relative to the origin and the positions of the other observations. This statistic has an intuitive invariance property, and it reduces to the quadrant statistic when the two quantities are univariate. When each vector is elliptically symmetric, $Q_n$ has a limiting chi-squared distribution under the null hypothesis of independence. Gieser and Randles compared their test to Wilks' likelihood ratio criterion when the vectors have heavy-tailed elliptically symmetric distributions, and they gave an example in which $Q_n$ is resistant to outliers. The $Q_n$ test is better than the componentwise quadrant statistic when the vectors are spherically symmetric. In addition, they showed that $Q_n$ performs better than the other tests for heavy-tailed distributions and is competitive for distributions with moderate tail weights. Extensions of the quadrant test of Blomqvist (1950) based on spatial signs, which are easier to compute for data in common dimensions, were introduced by Taskinen, Kankainen, and Oja (2003). Their test statistic is asymptotically equivalent to the interdirection test of Gieser and Randles (1997) when the vectors are elliptically symmetric, but it is easier to compute in practice. Taskinen, Oja, and Randles (2005) provided practical, robust alternatives to normal-theory methods and discussed a sequel to the multivariate extension of the quadrant test by Gieser and Randles (1997) as well as Taskinen, Kankainen, and Oja (2003). Taskinen, Oja, and Randles presented new multivariate extensions of Kendall's tau and Spearman's rho statistics using two different approaches. First, interdirection proportions are used to estimate the cosines of angles between centered observation vectors and differences of observation vectors. Second, covariances between affine-equivariant multivariate signs and ranks are used. If each vector is elliptically symmetric, then the test statistics arising from these two approaches are asymptotically equivalent. Székely, Rizzo, and Bakirov (2007) introduced distance correlation, which is a nonparametric measure of dependence between random vectors; see Section 2.5 for more details.

2.4 Properties of Dependence Measure

Rényi (1959) introduced properties of measures of dependence, proposing seven axioms for a nonparametric measure of dependence $\delta(X,Y)$ between two random variables on a common probability space.

1. $\delta(X,Y)$ is defined for any pair of random variables $X$ and $Y$, neither of them being constant with probability 1.

2. $\delta(X,Y) = \delta(Y,X)$.

3. $0 \le \delta(X,Y) \le 1$.

4. δ(X,Y ) = 0 if and only if X and Y are independent.

5. δ(X,Y ) = 1 if there exists a strict dependence between X and Y, that is either Y = f(X) or X = g(Y ), where f(X) and g(Y ) are Borel-measurable functions.

6. If $f$ and $g$ are one-to-one Borel-measurable functions on $\mathbb{R}$, then
$$\delta(f(X), g(Y)) = \delta(X,Y).$$

7. If the joint probability distribution of X and Y is normal, then δ(X,Y ) = |ρ(X,Y )|, where ρ(X,Y ) is correlation coefficient between X and Y.

Rényi's fifth condition is not a strong condition, because a functional relationship is sufficient but not necessary, according to Li (2015). Rényi showed that the mean square contingency, the correlation coefficient, and the correlation ratios satisfy some of these properties, but the maximal correlation coefficient proposed by Gebelein (1941) satisfies all seven axioms of a dependence measure. The maximal correlation is defined as
$$\rho'(X,Y) = \sup_{f,g}\, \rho(f(X), g(Y)),$$
where $\rho$ is the Pearson product-moment correlation coefficient and $f, g$ are Borel-measurable functions. Another measure of dependence that satisfies all of Rényi's axioms is the information coefficient of correlation, introduced by Linfoot (1957), which is defined for two continuous random variables as
$$r = \sqrt{1 - \exp(-2\, I(X,Y))},$$
where $I(X,Y)$ is the mutual information of the pair of continuous random variables $X$ and $Y$, defined by Shannon (1948) as

$$I(X,Y) = \iint p(X,Y) \log \frac{p(X,Y)}{p(X)\,p(Y)}\, dX\, dY,$$
where $p(X,Y)$ is the joint density and $p(X)$ and $p(Y)$ are the marginal densities of $X$ and $Y$, respectively.

Móri and Székely (2019) proved that if a dependence measure is defined for bounded nonconstant real-valued random variables and is invariant with respect to one-to-one measurable transformations of the real line, then the dependence measure cannot be weakly continuous; that means that when the sample size increases, the empirical values of the dependence measure do not necessarily converge to the population value. They developed four axioms for dependence measures when $(X,Y) \in S$, where $S$ is a nonempty set of pairs of nondegenerate random variables $X$ and $Y$ taking values in Euclidean spaces or separable Hilbert spaces $H$, as follows:

1. δ(X,Y ) = 0 if and only if X and Y are independent.

2. δ(X,Y ) is invariant with respect to all similarity transformations of H; that is,

δ(f(X), g(Y )) = δ(X,Y ),

where f(X), g(Y ) are similarity transformations of H.

3. δ(X,Y ) = 1 if and only if Y = f(X) with probability 1, where f(X) is a similarity transformation of H.

4. $\delta(X,Y)$ is continuous; that is, if $(X_i,Y_i) \in S$, $i = 1, 2, \ldots$, are such that for some positive constant $C$ we have $E(|X_i|^2 + |Y_i|^2) \le C$, $i = 1, 2, \ldots$, and $(X_i,Y_i)$ converges in distribution to $(X,Y)$, then $\delta(X_i,Y_i) \to \delta(X,Y)$.

The first condition is equivalent to Rényi's fourth condition, but the main difference from Rényi's axioms is that one-to-one invariance is replaced by similarity invariance. Móri and Székely showed that maximal correlation satisfies the first three axioms; however, it cannot satisfy the fourth axiom because the maximal correlation coefficient is invariant with respect to all one-to-one Borel functions on the real line and therefore cannot be continuous.

2.5 Distance Correlation

Distance correlation $R$, introduced by Székely, Rizzo, and Bakirov (2007), is a powerful measure of dependence that quantifies the association of random vectors in arbitrary dimensions. Distance correlation is the standardized distance covariance, which is defined as a weighted $L_2$ distance between the joint characteristic function and the product of the marginal characteristic functions. In describing bivariate and multivariate associations, distance correlation has sufficient power to detect linear and nonlinear dependence structures. In the bivariate normal case, distance correlation is less than the absolute value of the Pearson product-moment correlation coefficient and coincides with the absolute value of the correlation in the Bernoulli case. Advantages of using distance correlation include that no distributional assumptions and no computation of the inverse of the covariance matrix are required. Furthermore, the corresponding statistic applies to random vectors of arbitrary, not necessarily equal, dimensions. Important properties of distance correlation are that the coefficient takes values between 0 and 1, the coefficient equals zero if and only if $X$ and $Y$ are independent, and the coefficient is invariant under general invertible affine transformations. Aspiras-Paler (2015) examined distance correlation with respect to Rényi's axioms:

1*. Distance correlation $R$ is defined for any pair of random variables $X$ and $Y$ with finite first moments, i.e., $E|X|_p < \infty$ and $E|Y|_q < \infty$.

5*. Székely, Rizzo, and Bakirov (2007) stated that if $R(X,Y) = 1$, then there exist a vector $a$, a nonzero real number $b$, and an orthogonal matrix $R$ such that $Y = a + bXR$ for the data matrices $X$ and $Y$. Aspiras-Paler gave a counterexample in which $X$ has a standard normal distribution and $Y = X^3$. Then $R(X,Y) < 1$ because $Y$ is not a linear transformation of $X$. Therefore, $R(X,Y) < 1$ when $Y$ is a function of $X$.

6*. If $X$ and $Y$ are standard normal and $Y = X$, then $R(X,Y) = 1$. Let $f(X) := X$ and $g(Y) := Y^3$; then both $f$ and $g$ are one-to-one functions that map $\mathbb{R}$ onto itself. In the bivariate case, $R(X,Y) = 1$ only if there is a linear relation $aX + bY = c$ between $X$ and $Y$ for some constants $a, b, c \in \mathbb{R}$. Hence $R(f(X), g(Y)) = R(X,Y)$ does not hold in general.

7*. Székely, Rizzo, and Bakirov (2007) showed that if $p = q = 1$ with a Gaussian distribution, then $R \le |\rho|$ and
$$R^2(X,Y) = \frac{\rho \arcsin \rho + \sqrt{1-\rho^2} - \rho \arcsin(\rho/2) - \sqrt{4-\rho^2} + 1}{1 + \pi/3 - \sqrt{3}}. \qquad (2.5.1)$$
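For reference, (2.5.1) can be evaluated directly; a small R helper (the function name is ours):

```r
# Squared distance correlation of a bivariate normal pair as a function of rho, eq. (2.5.1)
dcor2_bvnorm <- function(rho) {
  (rho * asin(rho) + sqrt(1 - rho^2) - rho * asin(rho / 2) - sqrt(4 - rho^2) + 1) /
    (1 + pi / 3 - sqrt(3))
}
dcor2_bvnorm(c(0, 0.3, 0.6, 0.9, 1))  # increases from 0 to 1; its square root is <= |rho|
```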

Distance correlation formally satisfies all four axioms of Móri and Székely (2019) discussed in Section 2.4, because distance correlation is invariant under the transformations of its associated Euclidean group and has the continuity property. There are several statistical tests based on empirical characteristic functions, and the test of distance correlation is one of them. Feuerverger and Mureika (1977) considered the asymptotic behavior of the empirical characteristic function. Based on empirical characteristic functions, Csörgő (1985) studied mutual tests of independence, and Feuerverger (1993) proposed a rank test for bivariate dependence. There has been much work recently to extend the distance correlation. Lyons (2013) extended the distance correlation to two random variables on separable Hilbert spaces of negative type. Zhu et al. (2017) proposed a test of independence based on random projection and distance correlation.

2.6 Local Correlation

Bjerve and Doksum (1993) proposed a local nonparametric dependence function to measure the association between two random variables $X$ and $Y$. The local linear correlation satisfies some desirable properties of correlation: it is between −1 and 1, it is equal to 0 when $X$ and $Y$ are independent, and it is invariant with respect to location and scale changes in $X$ and $Y$. It does not, however, characterize independence in the non-Gaussian case. Doksum et al. (1994) introduced estimates of the local correlation based on nearest-neighbor estimates of the residual and the regression slope function. Jones (1996) defined a local dependence measure based on a nonparametric bivariate distribution and described the properties of the local dependence function, which satisfies some of Rényi's (1959) axioms. A new dependence measure for two real, not necessarily linearly related, random variables was proposed by Delicado and Smrekar (2009). They expressed their measure by using principal curves and the covariance and the linear correlation along the curve. Furthermore, they showed that their measure satisfies desirable properties, including modified versions of Rényi's axioms. They modified three of Rényi's axioms as follows:

1*. δ(X,Y ) is defined for any pair of random variables (X,Y ) distributed along a curve.

5*. For $(X,Y)$ distributed along a curve $c$ with generating variables $(S,T)$, $\delta(X,Y) = 1$ if and only if there is a strict dependence between $(X,Y)$ and $S$, that is, $X = c_1(S)$ and $Y = c_2(S)$, or equivalently $T$ is identically 0.

6*. If f(X) and g(Y ) are strictly monotone almost surely on ranges of X and Y, respectively, then δ{f(X), g(Y )} = δ(X,Y ).

In general, the local correlation measures above are suitable for Gaussian data, but they fail to characterize independence in non-Gaussian cases. Tjøstheim and Hufthammer (2013) introduced estimation and visualization for a new local measure of dependence called local Gaussian correlation. It is derived from a local correlation function, based on approximating a bivariate density locally by a family of bivariate Gaussian densities using local likelihood. The local parameters and estimation using local likelihood were first described in Hjort and Jones (1996). Tjøstheim and Hufthammer (2013) included in their paper the limiting behavior of a bandwidth algorithm that aims to balance the variance of the estimated local parameters against the bias of the resulting density estimate, and the estimation of standard errors. The bandwidth algorithm of Tjøstheim and Hufthammer is not really satisfactory in a general situation. Berentsen and Tjøstheim (2014) introduced another method to choose the bandwidth based on the principle of likelihood cross-validation. They constructed a global measure of dependence by aggregating the local correlations on R2 for linear and nonlinear dependence structures in bivariate data. An advantage of the local Gaussian correlation is that it can distinguish between negative and positive local dependences for bivariate variables. Berentsen and Tjøstheim (2014) also proposed an alternative method for copula goodness-of-fit testing for bivariate variables.

2.7 Multiscale Graph Correlation

Shen, Priebe, and Vogelstein (2018) considered a local distance correlation computed by utilizing K-nearest-neighbor graphs, which they named multiscale graph correlation. They gave the definition of the population multiscale graph correlation in terms of the characteristic functions of the underlying random variables and K-nearest-neighbor graphs, and they described properties of the multiscale graph correlation which are related to those of distance correlation.

Given $n$ pairs of sample data $(X_i, Y_i)$ that are i.i.d., that is, $(X_i, Y_i) \in \mathbb{R}^p \times \mathbb{R}^q$ for $i = 1, 2, \ldots, n$, Shen et al. computed $A_{ij}$ and $B_{ij}$ as column-centered distance matrices with diagonals excluded:
$$A_{ij} = \begin{cases} \tilde{A}_{ij} - \frac{1}{n-1}\sum_{m=1}^{n} \tilde{A}_{mj}, & i \ne j; \\ 0, & i = j, \end{cases} \qquad B_{ij} = \begin{cases} \tilde{B}_{ij} - \frac{1}{n-1}\sum_{m=1}^{n} \tilde{B}_{mj}, & i \ne j; \\ 0, & i = j, \end{cases}$$

where $\tilde{A}_{ij} = \|X_i - X_j\|$ and $\tilde{B}_{ij} = \|Y_i - Y_j\|$ for $i, j = 1, 2, \ldots, n$, respectively. The sample local distance covariance is defined by
$$\mathcal{V}_n^{kl}(X,Y) = \frac{1}{n(n-1)} \sum_{i \ne j} C_{ij}^{kl} - \left(\frac{1}{n(n-1)} \sum_{i \ne j} A_{ij}^{k}\right) \left(\frac{1}{n(n-1)} \sum_{i \ne j} B_{ij}^{l}\right),$$
where $A_{ij}^{k} = A_{ij}\, I(R_{ij}^{A} \le k)$ and $B_{ij}^{l}$ is defined similarly; $I(\cdot)$ is the indicator function and $R_{ij}^{A}$ is a rank function of $x_i$ relative to $x_j$, that is, $R_{ij}^{A} = k$ if $x_i$ is the $k$-th nearest neighbor of $x_j$. $C_{ij}^{kl}$ is the joint distance matrix, $C_{ij}^{kl} = A_{ij}^{k} \times B_{ji}^{l}$. The sample local distance correlation is defined as the normalization of the local distance covariance, i.e.,
$$R_n^{kl}(X,Y) = \frac{\mathcal{V}_n^{kl}(X,Y)}{\sqrt{\mathcal{V}_n^{k}(X,X)\,\mathcal{V}_n^{l}(Y,Y)}},$$
where $\mathcal{V}_n^{k}(X,X)$ and $\mathcal{V}_n^{l}(Y,Y)$ are the sample local distance variances of $X$ and $Y$, respectively. The multiscale graph correlation is defined as the maximum local correlation over the largest connected region $\mathcal{R}$ of significant local correlations, $R_n^{kl^*}(X,Y)$, where $kl^* = \arg\max_{kl \in \mathcal{R}} S(R_n^{kl}(X,Y))$ and $S(\cdot)$ is an operation that filters out all insignificant local correlations. This method will not be used in our research because we introduce a new method to compute distance correlation locally based on the jackknife empirical likelihood.

CHAPTER 3 OVERVIEW OF DISTANCE CORRELATION

3.1 Distance Correlation

Distance correlation $R$, introduced by Székely, Rizzo, and Bakirov (2007), is a measure of dependence and a test of joint independence between random vectors $X$ and $Y$ in arbitrary dimensions. For all distributions with finite first moments, distance correlation $R$ generalizes the idea of correlation in at least two fundamental ways:

1. R(X,Y ) is defined for X and Y in arbitrary dimension.

2. R(X,Y ) = 0 characterizes the independence of X and Y .

Distance correlation R has properties of a true dependence measure, analogous to product-moment correlation ρ, but it generalizes and extends classical measures of dependence. Empirical distance dependence measures are based on functions of Euclidean distances between sample elements instead of sample moments.

Definition 3.1.1. The distance covariance between random vectors $X$ and $Y$ with finite first moments is a nonnegative number, for $X$ and $Y$ taking values in $\mathbb{R}^p$ and $\mathbb{R}^q$, respectively. This coefficient is defined by a weighted $L_2$ norm measuring the distance between the joint characteristic function (c.f.) $\phi_{X,Y}$ of $X$ and $Y$ and the product $\phi_X \phi_Y$ of the marginal c.f.s of $X$ and $Y$. Distance covariance $\mathcal{V}(X,Y)$ is the nonnegative square root of
$$\mathcal{V}^2(X,Y) = \|\phi_{X,Y}(t,s) - \phi_X(t)\phi_Y(s)\|_w^2 = \int_{\mathbb{R}^{p+q}} |\phi_{X,Y}(t,s) - \phi_X(t)\phi_Y(s)|^2\, w(t,s)\, dt\, ds,$$
where $w(t,s) = (c_p c_q |t|_p^{1+p} |s|_q^{1+q})^{-1}$, $c_d = \frac{\pi^{(1+d)/2}}{\Gamma\left(\frac{1+d}{2}\right)}$, and $\Gamma(\cdot)$ is the complete gamma function. Similarly, the distance variances are defined as the square roots of
$$\mathcal{V}^2(X) = \mathcal{V}^2(X,X) = \|\phi_{X,X}(t,s) - \phi_X(t)\phi_X(s)\|_w^2 = \int_{\mathbb{R}^{2p}} |\phi_{X,X}(t,s) - \phi_X(t)\phi_X(s)|^2\, w(t,s)\, dt\, ds,$$
$$\mathcal{V}^2(Y) = \mathcal{V}^2(Y,Y) = \|\phi_{Y,Y}(t,s) - \phi_Y(t)\phi_Y(s)\|_w^2 = \int_{\mathbb{R}^{2q}} |\phi_{Y,Y}(t,s) - \phi_Y(t)\phi_Y(s)|^2\, w(t,s)\, dt\, ds.$$

Definition 3.1.2. The distance correlation between random vectors $X$ and $Y$ with finite first moments is the nonnegative number $R(X,Y)$ defined by
$$R^2(X,Y) = \begin{cases} \dfrac{\mathcal{V}^2(X,Y)}{\sqrt{\mathcal{V}^2(X)\,\mathcal{V}^2(Y)}}, & \mathcal{V}^2(X)\mathcal{V}^2(Y) > 0; \\[2mm] 0, & \mathcal{V}^2(X)\mathcal{V}^2(Y) = 0. \end{cases}$$

Distance correlation satisfies $0 \le R \le 1$, and $R = 0$ if and only if $X$ and $Y$ are independent. In the bivariate normal case, $R$ is a function of $\rho$, and $R(X,Y) \le |\rho(X,Y)|$ with equality when $\rho = \pm 1$.

For an observed random sample $\{(x_i, y_i) : i = 1, \ldots, n\}$ from the joint distribution of random vectors $X$ and $Y$, $a_{ij} = \|X_i - X_j\|$ denotes the pairwise distances of the $X$ observations and $b_{ij} = \|Y_i - Y_j\|$ denotes the pairwise distances of the $Y$ observations, for $i, j = 1, \ldots, n$. The corresponding double-centered distance matrices $(A_{ij})_{i,j=1}^{n}$ and $(B_{ij})_{i,j=1}^{n}$ are defined by
$$A_{ij} = a_{ij} - \frac{1}{n}\sum_{l=1}^{n} a_{il} - \frac{1}{n}\sum_{k=1}^{n} a_{kj} + \frac{1}{n^2}\sum_{k,l=1}^{n} a_{kl}, \qquad (3.1.3)$$
$$B_{ij} = b_{ij} - \frac{1}{n}\sum_{l=1}^{n} b_{il} - \frac{1}{n}\sum_{k=1}^{n} b_{kj} + \frac{1}{n^2}\sum_{k,l=1}^{n} b_{kl}. \qquad (3.1.4)$$

Definition 3.1.5. The sample distance covariance $\mathcal{V}_n(X,Y)$ and sample distance correlation $R_n(X,Y)$ are defined by
$$\mathcal{V}_n^2(X,Y) = \frac{1}{n^2}\sum_{i,j=1}^{n} A_{ij} B_{ij}, \qquad (3.1.6)$$
and
$$R_n^2(X,Y) = \begin{cases} \dfrac{\mathcal{V}_n^2(X,Y)}{\sqrt{\mathcal{V}_n^2(X)\,\mathcal{V}_n^2(Y)}}, & \mathcal{V}_n^2(X)\mathcal{V}_n^2(Y) > 0; \\[2mm] 0, & \mathcal{V}_n^2(X)\mathcal{V}_n^2(Y) = 0, \end{cases}$$
respectively, where the squared sample distance variances are
$$\mathcal{V}_n^2(X) = \mathcal{V}_n^2(X,X) = \frac{1}{n^2}\sum_{i,j=1}^{n} A_{ij}^2, \qquad \mathcal{V}_n^2(Y) = \mathcal{V}_n^2(Y,Y) = \frac{1}{n^2}\sum_{i,j=1}^{n} B_{ij}^2.$$
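A direct R translation of (3.1.3), (3.1.4), and (3.1.6), written from the definitions above; this O(n²) sketch and its function name are ours, not the dissertation's implementation.

```r
# Sample distance covariance and squared distance correlation via double centering
dcor_sample <- function(x, y) {
  a <- as.matrix(dist(x))                                          # pairwise distances a_ij
  b <- as.matrix(dist(y))                                          # pairwise distances b_ij
  A <- sweep(sweep(a, 1, rowMeans(a)), 2, colMeans(a)) + mean(a)   # eq. (3.1.3)
  B <- sweep(sweep(b, 1, rowMeans(b)), 2, colMeans(b)) + mean(b)   # eq. (3.1.4)
  V2xy <- mean(A * B)                                              # eq. (3.1.6)
  V2x  <- mean(A * A)
  V2y  <- mean(B * B)
  if (V2x * V2y > 0) V2xy / sqrt(V2x * V2y) else 0                 # squared sample distance correlation
}

set.seed(4)
x <- rnorm(100)
y <- x^2 + rnorm(100, sd = 0.5)
dcor_sample(x, y)
```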

The squared sample distance covariance $\mathcal{V}_n^2(X,Y)$ equals zero if and only if every sample observation is identical. The distance covariance statistic is nonnegative; therefore, its expected value is positive except in degenerate cases. Hence, the distance covariance is biased for the population coefficient, and this bias increases with the dimensions. The sample distance correlation coefficient $R_n^2$ is able to capture nonlinear associations because it is based on a characterization of independence. Moreover, it is sensitive to all types of departures from independence, including nonlinear or nonmonotone dependence structures that can be detected in ever more complex associations.

Since the coefficient $\mathcal{V}^2$ is defined in terms of the difference between the joint characteristic function and the product of the marginal characteristic functions, $\mathcal{V}_n^2$ is defined by replacing the characteristic functions with empirical characteristic functions. The joint empirical characteristic function of the sample $(X_1,Y_1), \ldots, (X_n,Y_n)$, for $i = \sqrt{-1}$, $t \in \mathbb{R}^p$, $s \in \mathbb{R}^q$, is

$$\hat{\phi}_{X,Y}(t,s) = \frac{1}{n}\sum_{k=1}^{n} \exp\{i\langle t, X_k\rangle + i\langle s, Y_k\rangle\},$$
and the marginal empirical characteristic functions of the sample $X$ and sample $Y$ are
$$\hat{\phi}_X(t) = \frac{1}{n}\sum_{k=1}^{n} \exp\{i\langle t, X_k\rangle\}, \qquad \hat{\phi}_Y(s) = \frac{1}{n}\sum_{k=1}^{n} \exp\{i\langle s, Y_k\rangle\},$$
respectively. An empirical version of distance covariance can be defined as
$$\mathcal{V}_n^2(X,Y) = \|\hat{\phi}_{X,Y}(t,s) - \hat{\phi}_X(t)\hat{\phi}_Y(s)\|_w^2.$$

Székely and Rizzo (2009) proved that an equivalent definition of distance covariance is

$$\mathcal{V}^2(X,Y) = E[|X - X'||Y - Y'|] + E[|X - X'|]\,E[|Y - Y'|] - 2E[|X - X'||Y - Y''|], \qquad (3.1.7)$$
where $X'$ is an independent copy of $X$; $Y', Y''$ are independent copies of $Y$; and $|X - X'|$ and $|Y - Y'|$ are Euclidean distances. Some of the properties of distance correlation and distance covariance include:

1. $\mathcal{V}_n(X,Y)$ and $R_n(X,Y)$ converge almost surely to $\mathcal{V}(X,Y)$ and $R(X,Y)$ as $n \to \infty$; that is,
$$\lim_{n\to\infty} \mathcal{V}_n(X,Y) = \mathcal{V}(X,Y), \qquad \lim_{n\to\infty} R_n(X,Y) = R(X,Y),$$
with probability 1.

2. Vn(X,Y ) ≥ 0 and Vn(X) = 0 if and only if every sample observation is identical.

3. 0 ≤ Rn(X,Y ) ≤ 1.

4. If $R_n(X,Y) = 1$, then there exist a vector $a$, a nonzero real number $b$, and an orthogonal matrix $R$ such that $Y = a + bXR$, for the data matrices $X$ and $Y$.

5. For the bivariate Gaussian distribution, $R \le |\rho|$ and
$$R^2(X,Y) = \frac{\rho \arcsin \rho + \sqrt{1-\rho^2} - \rho \arcsin(\rho/2) - \sqrt{4-\rho^2} + 1}{1 + \pi/3 - \sqrt{3}}. \qquad (3.1.8)$$

A test of independence based on $n\mathcal{V}_n^2$ or $\dfrac{n\mathcal{V}_n^2}{S_2}$, where $S_2 = \frac{1}{n^2}\sum_{k,l=1}^{n}|X_k - X_l| \cdot \frac{1}{n^2}\sum_{k,l=1}^{n}|Y_k - Y_l|$, can be used to test independence between the random vectors $X$ and $Y$. Under the independence hypothesis, the normalized test statistic $\dfrac{n\mathcal{V}_n^2}{S_2}$ converges in distribution to a quadratic form
$$Q = \sum_{j=1}^{\infty} \lambda_j Z_j^2, \qquad (3.1.9)$$
where the $Z_j$ are independent standard normal random variables and the $\lambda_j$ are nonnegative constants that depend on the distribution of $(X,Y)$. The expected value of $Q$ is equal to 1. A test that rejects the null hypothesis of independence of $X$ and $Y$ when
$$\sqrt{\frac{n\mathcal{V}_n^2}{S_2}} \ge \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)$$
has an asymptotic significance level of at most $\alpha$. If $E(|X|_p + |Y|_q) < \infty$, then

1. If $X$ and $Y$ are independent, then $\dfrac{n\mathcal{V}_n^2}{S_2} \xrightarrow{d} Q$ as $n \to \infty$, where $Q$ is a nonnegative quadratic form of centered Gaussian random variables (3.1.9) and $E[Q] = 1$.

2. If $X$ and $Y$ are independent, then $n\mathcal{V}_n^2 \xrightarrow{d} Q_1$ as $n \to \infty$, where $Q_1$ is a nonnegative quadratic form of centered Gaussian random variables and $E[Q_1] = E|X - X'|\,E|Y - Y'|$.

3. If $X$ and $Y$ are dependent, then $\dfrac{n\mathcal{V}_n^2}{S_2} \xrightarrow{p} \infty$ and $n\mathcal{V}_n^2 \xrightarrow{p} \infty$ as $n \to \infty$.
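In practice, this test is available in the energy package discussed in the next paragraph; a minimal usage sketch, where the simulated data and the permutation replicate count are arbitrary illustrative choices:

```r
# Distance correlation test of independence using the energy package
library(energy)
set.seed(5)
x <- rnorm(200)
y <- x^2 + rnorm(200, sd = 0.5)     # nonlinear dependence
dcor(x, y)                          # sample distance correlation R_n
dcor.test(x, y, R = 999)            # permutation test of independence
```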

Rizzo and Székely (2013) developed the R package energy with the functions dcor and dcor.test to compute the distance correlation coefficient and test the significance of distance correlation.

3.2 Modified Distance Correlation

Székely and Rizzo (2013) considered a modified version of the squared distance covariance that results in a t-test of multivariate independence applicable in high dimension. The resulting t-test is unbiased for all significance levels and for every sample size greater than three. Under independence, a transformation of the distance correlation statistic converges to a Student's t-distribution as $p, q \to \infty$. The Student t-distributed statistic is easily interpretable for high-dimensional data.

A modified version of the statistic $\mathcal{V}_n^2(X,Y)$ avoids the problem of the bias in the sample distance correlation as $p, q \to \infty$. The modified versions $A_{i,j}^{*}$ of $A_{i,j}$ and $B_{i,j}^{*}$ of $B_{i,j}$ are defined as follows:
$$A_{ij}^{*} = \begin{cases} \frac{n}{n-1}\left(A_{i,j} - \frac{a_{ij}}{n}\right), & i \ne j; \\[1mm] \frac{n}{n-1}\left(\bar{a}_i - \bar{a}\right), & i = j, \end{cases} \qquad B_{ij}^{*} = \begin{cases} \frac{n}{n-1}\left(B_{i,j} - \frac{b_{ij}}{n}\right), & i \ne j; \\[1mm] \frac{n}{n-1}\left(\bar{b}_i - \bar{b}\right), & i = j. \end{cases}$$

Definition 3.2.1. The modified distance covariance statistic is
$$\mathcal{V}_n^{*}(X,Y) = \frac{U_n^{*}(X,Y)}{n(n-3)} = \frac{1}{n(n-3)}\left\{\sum_{i,j=1}^{n} A_{i,j}^{*} B_{i,j}^{*} - \frac{n}{n-2}\sum_{i=1}^{n} A_{i,i}^{*} B_{i,i}^{*}\right\}, \qquad (3.2.2)$$
where
$$U_n^{*}(X,Y) = \sum_{i \ne j} A_{i,j}^{*} B_{i,j}^{*} - \frac{2}{n-2}\sum_{i=1}^{n} A_{i,i}^{*} B_{i,i}^{*}.$$

$\mathcal{V}_n^{*}(X,Y)$ is an unbiased estimator of the squared population distance covariance $\mathcal{V}^2(X,Y)$.

Definition 3.2.3. The modified distance correlation statistic is
$$R_n^{*}(X,Y) = \begin{cases} \dfrac{\mathcal{V}_n^{*}(X,Y)}{\sqrt{\mathcal{V}_n^{*}(X)\,\mathcal{V}_n^{*}(Y)}}, & \mathcal{V}_n^{*}(X)\mathcal{V}_n^{*}(Y) > 0; \\[2mm] 0, & \mathcal{V}_n^{*}(X)\mathcal{V}_n^{*}(Y) = 0. \end{cases}$$

The original $R_n$ statistic is between 0 and 1, but $R_n^{*}$ can take on negative values. The $R_n^{*}$ statistic converges stochastically to the square of the population distance correlation $R^2$. The test statistic for independence in high dimension is
$$T_n = \sqrt{v - 1}\; \frac{R_n^{*}}{\sqrt{1 - (R_n^{*})^2}},$$
where $v = \frac{n(n-3)}{2}$. As $p$ and $q$ tend to infinity, under the independence hypothesis, $T_n$ converges in distribution to a Student t-distribution with $v - 1$ degrees of freedom. Székely and Rizzo (2013) obtained an asymptotic Z-test of independence in high dimension. Under independence of $X$ and $Y$, if $X$ and $Y$ are i.i.d. with positive finite variance, the limiting distribution of $(1 + R_n^{*}(X,Y))/2$ is a symmetric beta distribution with shape parameter $(v - 1)/2$. In high dimension, the large-sample distribution of $\sqrt{v-1}\,R_n^{*}(X,Y)$ is approximately standard normal. Rizzo and Székely developed the R package energy with the function dcort.test to implement the unbiased test for independence.
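A sketch of the statistic $T_n$ above in R. We use the bias-corrected distance correlation from the energy package (bcdcor) as a stand-in for $R_n^{*}$, which is an assumption on our part, and the simulated high-dimensional data are purely illustrative.

```r
# High-dimensional t-test statistic T_n built from a bias-corrected distance correlation
library(energy)
set.seed(6)
n <- 50; p <- 100; q <- 100
X <- matrix(rnorm(n * p), n, p)
Y <- matrix(rnorm(n * q), n, q)                 # independent case
Rstar <- bcdcor(X, Y)                           # assumed stand-in for R_n^*
v     <- n * (n - 3) / 2
Tn    <- sqrt(v - 1) * Rstar / sqrt(1 - Rstar^2)
pval  <- pt(Tn, df = v - 1, lower.tail = FALSE) # compare with Student t, v - 1 df
c(Tn = Tn, p.value = pval)
```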

3.3 Unbiased Distance Correlation

Székely and Rizzo (2014) considered the unbiased estimator of the squared distance covariance $\mathcal{V}^2(X,Y)$, which is based on U-centering instead of the classical method of double centering. The U-centered distance covariance is the inner product in the Hilbert space $\mathcal{H}_n$ of U-centered distance matrices for samples of size $n$, and it is unbiased for the squared population distance covariance.

The definition of distance covariance $\mathcal{V}_n^2(X,Y)$ uses double centering, with matrices $A_{ij}$ and $B_{ij}$ that have the property that all rows and columns sum to zero. Another type of centering is U-centering, with matrices denoted by $\tilde{A}_{ij}$ and $\tilde{B}_{ij}$, which have the additional property that all expectations are zero; that is, $E[\tilde{A}_{ij}] = 0$ and $E[\tilde{B}_{ij}] = 0$ for all $i, j$. Let $\tilde{A}$ be a U-centered distance matrix. Then

1. Rows and columns of A˜ sum to zero.

2. $\widetilde{(\tilde{A})} = \tilde{A}$; that is, if $B$ is the matrix obtained by U-centering an element $\tilde{A} \in \mathcal{H}_n$, then $B = \tilde{A}$.

3. A˜ is invariant to double centering. If B is the matrix obtained by double centering the matrix A˜, then B = A˜.

4. If c is a constant and B denotes the matrix obtained by adding c to the off-diagonal elements of A˜, then B˜ = A˜.

Definition 3.3.1. Let $A = (a_{ij})$ and $B = (b_{ij})$ be symmetric, real-valued $n \times n$ matrices with zero diagonal, $n > 2$. Define the U-centered matrices $\tilde{A}$ and $\tilde{B}$ as follows. Let the $(i,j)$-th entries of $\tilde{A}$ and $\tilde{B}$ be
$$\tilde{A}_{ij} = \begin{cases} a_{ij} - \frac{1}{n-2}\sum_{l=1}^{n} a_{il} - \frac{1}{n-2}\sum_{k=1}^{n} a_{kj} + \frac{1}{(n-1)(n-2)}\sum_{k,l=1}^{n} a_{kl}, & i \ne j; \\ 0, & i = j, \end{cases} \qquad (3.3.2)$$
and
$$\tilde{B}_{ij} = \begin{cases} b_{ij} - \frac{1}{n-2}\sum_{l=1}^{n} b_{il} - \frac{1}{n-2}\sum_{k=1}^{n} b_{kj} + \frac{1}{(n-1)(n-2)}\sum_{k,l=1}^{n} b_{kl}, & i \ne j; \\ 0, & i = j, \end{cases} \qquad (3.3.3)$$
respectively.

Proposition 3.3.4. Let $(x_i, y_i)$, $i = 1, \ldots, n$, denote a sample of observations from the joint distribution $(X,Y)$ of random vectors $X$ and $Y$. Let $A = (a_{ij})$ be the Euclidean distance matrix of the sample $x_1, \ldots, x_n$ from the distribution of $X$, and $B = (b_{ij})$ be the Euclidean distance matrix of the sample $y_1, \ldots, y_n$ from the distribution of $Y$. Then if $E(|X| + |Y|) < \infty$, for $n > 3$, the following

1 X U 2(X,Y ) = (A,˜ B˜) = A˜ B˜ (3.3.5) n n(n − 3) ij ij i6=j 25 is an unbiased estimator of the squared population distance covariance V2(X,Y ).

2 2 Proof. We have to show that Un(X,Y ) is an unbiased estimator of V (X,Y ) based on the proof in Szekely´ and Rizzo (2014). Since the population coefficient V2(X,Y ) statistic in (3.1.7) is linear combination of distances, we define the expected values as

0 0 α := E[aij] = E[|X − X |], β := E[bij] = E[|Y − Y |], i 6= j,

0 00 δ := E[aijbil] = E[|X − X ||Y − Y |], i, j, l distinct,

0 0 γ := E[aijbij] = E[|X − X ||Y − Y |], i 6= j,

where (X,Y ), (X0,Y 0), (X00,Y 00) are i.i.d. We can write the population coefficient V2(X,Y ) in linear combination of α, β, δ, and γ as

V2(X,Y ) = E[|X − X0||Y − Y 0|] + E[|X − X0|]E[|Y − Y 0|] − 2E[|X − X0||Y − Y 00|]

= γ + αβ − 2δ.

a := ai. a := a.j a := a.. a = Pn a Let’s denote the notation ei. n−2 , e.j n−2 , and e.. (n−1)(n−2) , where i. j=1 ij, Pn Pn a.j = j=1 aij, and a.. = i,j=1 aij. Define ebi.,eb.j, and eb.. similarly. We further have

   aijbij −aijebi. −aijeb.j +aijeb..       −ai.bij +ai.bi. +ai.b.j −ai.b..  ∗ P e e e e e e e n(n − 3) Un(X,Y ) = i6=j  −a b +a b +a b −a b   e.j ij e.jei. e.je.j e.je..       +ea..bij −ea..ebi. −ea..eb.j +ea..eb..  26 P P P = aijbij − ai.ebi. − a. jeb. j +a..eb.. i6=j i j P P P P − eai.bi. +(n − 1) eai.ebi. + eai.eb. j −(n − 1) ai.eb.. i i i6=j i P P P P − ea. jb. j + ea. jebi. +(n − 1) ea. jeb. j −(n − 1) ea. jeb.. j i6=j j j P P +ea..b.. −(n − 1) ea..ebi. −(n − 1) ea..eb. j +n(n − 1)ea..eb... i j

Let’s obtain X X T1 = aijbij,T2 = a..b..,T3 = ai.bi.. i6=j i Then

  T3 T3 T2  T1 − n−2 − n−2 + (n−1)(n−2)     (n−1)T   − T3 + 3 + T2−T3 − T2  2  n−2 (n−2)2 (n−2)2 (n−2)2  n(n − 3)Un(X,Y ) =  − T3 + T2−T3 + (n−1)T3 − T2   n−2 (n−2)2 (n−2)2 (n−2)2     T2 T2 T2 nT2   + (n−1)(n−2) − (n−2)2 − (n−2)2 + (n−1)(n−2)2 

After simplification, we further have

T 2T n(n − 3)U 2(X,Y ) = T + 2 − 3 . (3.3.6) n 1 (n − 1)(n − 2) n − 2

It is obvious that E[T1] = n(n − 1)γ. When we expand the terms of T2 and T3 and combine terms that have equal expected values, we can obtain

E[T2] = n(n − 1){(n − 2)(n − 3)αβ + 2γ + 4(n − 2)δ}, 27 and

E[T3] = n(n − 1){(n − 2)δ + γ}.

Therefore,

1  T 2T  E[U 2(X,Y )] = E T + 2 − 3 n n(n − 3) 1 (n − 1)(n − 2) n − 2 1 n3 − 5n2 + 6n  = γ + n(n − 3)αβ + (6n − 2n2)δ n(n − 3) n − 2 = γ + αβ − 2δ = V2(X,Y ).

2 The statistic Un(X,Y ) is an inner product in the Hilbert space Hn of U-centered distance matrices, and the corresponding inner product (3.3.5) defines an unbiased estimator of the squared distance covariance. Hence, A˜ = 0 if and only if the n sample observations are equally distant or

2 at least n − 1 of the n sample observations are identical. A bias-corrected Rn(X,Y ) is defined by normalizing the inner product statistic with the bias-corrected distance variance statistics.

 2  √ Un(X,Y ) , U 2(X)U 2(Y ) > 0;  2 2 n n ∗∗ Un(X)Un(Y ) Rn (X,Y ) = (3.3.7)  2 2  0, Un(X)Un(Y ) = 0,

2 where Un(X,Y ), defined in (3.3.5), is unbiased estimator of distance covariance of X and Y , and the unbiased squared sample distance variances of X, and Y, respectively, are

1 X U 2(X) = (A,˜ A˜) = A˜ A˜ , n n n(n − 3) ij ij i6=j (3.3.8) 1 X U 2(Y ) = (B,˜ B˜) = B˜ B˜ , n n n(n − 3) ij ij i6=j

∗∗ ∗∗ The Rn can take negative values; hence, we cannot define the bias-corrected Rn (X,Y ) statistic to be the square root of it. Notice that the original distance covariance, which is defined in (3.1.6), is a V -statistic, and its unbiased versions are U-statistics. 28 ∗∗ The bias-corrected distance correlation statistic, Rn , and the unbiased estimator of distance

2 covariance, Un, are implemented in the R energy package by the bcdcor and dcovU functions.

2 ∗∗ The computation of Un and Rn can be implemented directly from its definitions but that time complexity is O(n2) which is high as a constant times n2 for sample size n. A fast formula for a biased estimator of V2(X,Y ) can be derived by combining the double

centered distance matrix Aij, which is defined in (3.1.3) and Bij, which is defined in (3.1.4). After simplification, the corresponding V -statistic is

n n 1 X 2 X a..b.. V2(X,Y ) = a b − a b + , (3.3.9) n n2 ij ij n3 i. i. n4 i,j=1 i=1

where the row i sum, column j sum, and grand sum of the distance matrix (aij) and (bij) are defined as: n n X X ai. = ail, a.j = akj, l=1 k=1

n n X X bi. = bil, b.j = bkj, l=1 k=1

n n X X a.. = akl, b.. = akl. k,l=1 k,l=1

In addition, a faster computing formula for an unbiased estimator of V2(X,Y ) can be derived by ˜ ˜ combining the U-centered matrix Aij and Bij, defined in Definition 3.3.1, and can be simplified to

n 1 X 2 X a..b.. U 2(X,Y ) = a b − a b + , n n(n − 3) ij ij n(n − 2)(n − 3) i. i. n(n − 1)(n − 2)(n − 3) i6=j i=1 (3.3.10)

2 2 Huo and Szekely´ (2016) showed that Un is a U-statistic. Since the fast formula Un(X,Y ) is a P Pn linear combination, we have to compute the three terms i6=j aijbij, i=1 ai.bi., and a..b... 29

In order to compute ai., we have the following

n X X ai. = ail = |Xi − Xl| l=1 i X X = (Xi − Xl) + (Xl − Xi) X

Xl

i=1 Xi≤Xl where P X is a partial sum and r is a rank of X . We compute b similarly. Therefore, Xi≤Xl i i i i. Pn the second term is obtained by taking the summation of ai.bi.. It is clear that a.. = i=1 ai. and Pn b.. = i=1 bi.; hence, the third term is a..b... As we see above, the computing of the second and the third terms are easy and fast. But when we compute the first term in (3.3.10), it is not straightforward, but the dyadic approach can be applied using binary search algorithm. The first term follows as

X X aijbij = |Xi − Xj||Yi − Yj| i6=j i6=j n (3.3.12) X X X X X = XiYi γij − Xi Yjγij − Yj Xjγij + XjYjγij, i=1 i6=j i6=j i6=j i6=j

where $\gamma_{ij}$ is a sign function; for all $1 \le i, j \le n$,
$$\gamma_{ij} = \begin{cases} +1, & \text{if } (X_i - X_j)(Y_i - Y_j) > 0; \\ -1, & \text{otherwise.} \end{cases} \qquad (3.3.13)$$

We have implemented a fast O(n log n) algorithm for the computation of sample distance covariances in the bivariate case for both versions, the U-statistic and the V-statistic, and the function dcov2d in the R energy package can be applied to very large datasets. This implementation is fast because it does not store the distance matrices. Since the sample distance correlation is a normalized version of the sample distance covariance, the companion function dcor2d is provided as well; both dcov2d and dcor2d can compute either the unbiased sample statistic, which is a U-statistic, or the original sample statistic, which is a V-statistic.
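A usage sketch of the fast bivariate functions mentioned here; the argument names follow our reading of the energy documentation and should be checked against the installed version.

```r
# O(n log n) bivariate distance covariance/correlation on a large sample
library(energy)
set.seed(7)
n <- 1e5
x <- rnorm(n)
y <- x + rnorm(n)
dcov2d(x, y, type = "U")   # unbiased (U-statistic) sample distance covariance
dcor2d(x, y, type = "U")   # bias-corrected sample distance correlation
dcor2d(x, y, type = "V")   # original (V-statistic) sample distance correlation
```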

CHAPTER 4 CONFIDENCE INTERVAL FOR DISTANCE CORRELATION

Distance correlation measures both linear and nonlinear dependence between random vectors. Székely, Rizzo, and Bakirov (2007) proposed the sample distance correlation, which is the point estimate of the population distance correlation coefficient $R^2$, together with a distance correlation test of the hypothesis of independence. However, an alternative way to estimate distance correlation is to calculate a confidence interval, which is a range of values that is likely to contain the population distance correlation $R^2$.

Let $(X_1,Y_1), \ldots, (X_n,Y_n)$ be drawn i.i.d. from the joint distribution $F_{XY}$. Then we compute an interval that likely contains the true value of the distance correlation $R^2$. If $I(X,Y)$ is a confidence interval for $R^2$ with confidence level $\alpha$, then
$$P\big(R^2 \in I((X_1,Y_1), \ldots, (X_n,Y_n))\big) = \alpha.$$

The bootstrap confidence interval is the most common nonparametric method used for interval estimation. Bootstrap methods, first introduced by Efron (1979), approximate the sampling distribution of a statistic from the observed data. However, a bootstrap confidence interval may fail to include the true parameter value in some cases, according to Carpenter and Bithell (2000). The empirical likelihood method introduced by Owen (1990) is also used to construct confidence intervals. This method is widely used in statistics with linear functionals; however, the method of Lagrange multipliers for the optimization cannot be solved easily for a nonlinear functional. Therefore, a direct application of empirical likelihood to calculate the confidence interval fails to obtain a chi-square limit. We are not able to use this method directly, since distance correlation is a nonlinear statistical functional. A jackknife empirical likelihood method proposed by Jing, Yuan, and Zhou (2009) can be applied to a nonlinear U-statistic. This method is a modified version of the empirical likelihood method and is simple to use. In this chapter, we introduce a confidence interval for distance correlation based on the jackknife empirical likelihood method. To date, no research has been found on the construction of confidence intervals for distance correlation.

4.1 Confidence Intervals for Distance Correlation

2 Distance covariance Un which was defined in Proposition 3.3.4 is an unbiased estimator of the squared population distance covariance V2. Proposition 3.3.4 has been proven by Szekely´ and

2 Rizzo (2014), where the expected value of Un is equal to the squared population distance covari- ance V2. We also had considered the proof of Proposition 3.3.4 in Chapter 3, because we focus on an unbiased estimator of sample distance covariance. First, we show that squared distance covari-

2 ∗∗ ance Un can be represented as a U-statistic, because distance correlation Rn , which was defined

2 in (3.3.7), is the standardized version of Un. Second, we construct the jackknife pseudo-samples

∗∗ defined by Quenouille (1956) for distance correlation Rn as a sample of asymptotically indepen- dent observations; the jackknife estimator for distance correlation becomes the sample mean of jackknife pseudo-samples. Then, the empirical likelihood method can be applied to construct a confidence interval for the mean of jackknife pseudo-samples of distance correlation.

4.1.1 U-statistic Results

A U-statistic is an alternative way to construct an unbiased estimator. The basic theory of U- statistics was introduced by Hoeffding (1948a). However, we adapt a definition of a U-statistic as

stated in Serfling (2009, Ch.5) for distance covariance: Let a sample (X1,Y1), ..., (Xn,Yn) be i.i.d.

2 from FXY . The U-statistic for estimation of V on the basis of a sample (X1,Y1), ..., (Xn,Yn) is obtained by averaging a symmetric the kernel h over the observations of a sample size 4, which is

2 the smallest sample size for estimating Un. That is,

−1 n X U 2(X,Y ) = h((X ,Y ), ..., (X ,Y )), (4.1.1) n 4 k1 k1 k4 k4 1≤k1<...

n where the summation over all combinations 4 of 4 distinct elements 1 ≤ k1 < ... < k4 ≤ n is drawn without replacement from the set {1, ..., n}. 33 2 The U-statistic of V uses h((X1,Y1), ..., (X4,Y4)) for which there is an unbiased estimator, when V2 is represented as

2 V (X,Y ) = E(h((X1,Y1), ..., (X4,Y4))), (4.1.2) where h is a kernel of the estimator of distance covariance and it is symmetrical in its 4 arguments. We can obtain a characterization of a U-statistic for distance covariance. It is easy to show that the U-statistic of distance covariance is jackknife invariant. First, we will state the theorem of jackknife invariance of order m, which is due to Lenth (1983), and prove it in general.

Theorem 4.1.3. (Jackknife invariance theorem) Let Un(X1,...,Xn) be a statistic of a sample

−k X1,...,Xn. Let Un−1(X1,...,Xn), k = 1, 2, . . . , n, be a statistic of a reduced sample

−k X1,...,Xk−1,Xk+1,...,Xn; that is, Un−1(X1,...,Xn) is the statistic after removing the obser- vation Xk. This reduced statistic is called jackknife statistic. A necessary and sufficient condition for Un(X1,...,Xn) to be a U-statistic of order m is that Un(X1,...,Xn) has the jackknife invari- ance property: the identity

n X −k n · Un(X1,...,Xn) = Un−1(X1,...,Xn) (4.1.4) k=1 holds for all n > m.

Proof. Suppose n = m+1. Assume that it is true for n, and Un is a U-statistic with kernel function h, then we have

m+1 1 X U (X ,...,X ) = U −k (X ,...,X ) m+1 1 m+1 m + 1 m+1 1 m+1 k=1 m+1 1 X = U (X ,...,X ,X ,...,X ) m + 1 m+1−1 1 k−1 k+1 m+1 k=1 m+1 1 X = U (X ,...,X ,X ,...,X ), m + 1 m 1 k−1 k+1 m+1 k=1 34 where Um(X1,...,Xk−1,Xk+1,...,Xm+1) is the statistic after removing the element Xk. We must have Um+1 which equals

m+1 X (m + 1)Um+1 = h(X1,...,Xk−1,Xk+1,...,Xm+1). k=1

Suppose for any n + 1, Un+1 is a U-statistic with kernel h, then

n+1 1 X U (X ,...,X ) = U −k (X ,...,X ) n+1 1 n+1 n + 1 n+1 1 n+1 k=1 n+1 1 X = U (X ,...,X ,X ,...,X ) n + 1 n+1−1 1 k−1 k+1 n+1 k=1 n+1 1 X = U (X ,...,X ,X ,...,X ) n + 1 n 1 k−1 k+1 n+1 k=1 n+1 −1 1 X  n  X = h(X ,...,X ,X ,...,X ), n + 1 m 1 k−1 k+1 n+1 k=1 1≤k1<...

Un+1 is not contained in exactly n + 1 − m of 1 ≤ k1 < ... < km ≤ n + 1, so we have Un+1 as

−1 n+1 (n + 1 − m) n  X U (X ,...,X ) = h(X ,...,X ,X ,...,X ) n+1 1 n+1 n + 1 m 1 k−1 k+1 n+1 k=1 −1 n+1 n + 1 X = h(X ,...,X ,X ,...,X ). m 1 k−1 k+1 n+1 k=1

Applying Theorem 4.1.3 to show that distance covariance is a U-statistic. We can use a simpler

2 ˜ ˜ and faster computing formula for Un by combining the two U-centered matrices Aij and Bij, which 35 are defined in (3.3.2 ) and (3.3.3), respectively. Then

n n n ! 1 X 1 X 1 X 1 X U 2(X,Y ) = a − a − a + a n n(n − 3) ij n − 2 ij n − 2 ij (n − 1)(n − 2) ij i6=j j=1 i=1 i,j=1 n n n ! 1 X 1 X 1 X b − b − b + b , ij n − 2 ij n − 2 ij (n − 1)(n − 2) ij j=1 i=1 i,j=1 (4.1.5)

Pn Pn Pn where ai.bi. = j=1 aijbij and a..b.. = i,j=1 aij i,j=1 bij and after simplification, we obtain

n 1 X 2 X a..b.. U 2(X,Y ) = a b − a b + . (4.1.6) n n(n − 3) ij ij n(n − 2)(n − 3) i. i. n(n − 1)(n − 2)(n − 3) i6=j i=1

The unbiased squared sample distance variances of X and Y , respectively, are

n 1 X 2 X a..a.. U 2(X) = a a − a a + , (4.1.7) n n(n − 3) ij ij n(n − 2)(n − 3) i. i. n(n − 1)(n − 2)(n − 3) i6=j i=1

and

n 1 X 2 X b..b.. U 2(Y ) = b b − b b + . (4.1.8) n n(n − 3) ij ij n(n − 2)(n − 3) i. i. n(n − 1)(n − 2)(n − 3) i6=j i=1

2 We will use Un as defined in (4.1.6) to verify the jackknife invariance condition of Theorem 4.1.3. 36

When we delete the observations Xk and Yk we have

2(−k) 1 X U (X,Y ) = a b n−1 (n − 1)(n − 2) ij ij i6=j,i6=k,j6=k

1 X (−k) (−k) (−k) − 2 a (b − b ) (n − 1)(n − 2)(n − 3) ij i. ij i6=j

1 X (−k) (−k) (−k) + a (b(−k) − 4b + 2b ) (n − 1)(n − 2)(n − 3)(n − 4) ij .. i. ij i6=j (n − 3)(n − 4) + 2(n − 4) + 2 X = a b (n − 1)(n − 2)(n − 3)(n − 4) ij ij i6=j,i6=k,j6=k   n (n − 4) + 2 X (−k) (−k) − 2 a b (n − 1)(n − 2)(n − 3)(n − 4) i. i. i=1 a(−k)b(−k) + .. .. . (n − 1)(n − 2)(n − 3)(n − 4)

Therefore,

n 2(−k) 1 X 1 X (−k) (−k) U (X,Y ) = a b − 2 a b n−1 (n − 1)(n − 4) ij ij (n − 1)(n − 3)(n − 4) i. i. i6=j,i6=k,j6=k i=1 a(−k)b(−k) + .. .. . (n − 1)(n − 2)(n − 3)(n − 4) (4.1.9)

−k −k −k −k For i 6= k, we define ai. , bi. , a.. , and b.. as

−k −k ai. = ai. − aik, bi. = bi. − bik,

−k −k a.. = a.. − 2ak., b.. = b.. − 2bk..

2(−k) −k −k −k −k 2(−k) Then, we take the summation of Un−1 after we substitute ai. , bi. , a.. , and b.. into Un−1 . 37 Pn 2(−k) Therefore, k=1 Un−1 is as follows:

n Pn P ! X 2(−k) (n − 2) X (n − 3) i=1 ai.bi. + i6=k aikbik U (X,Y ) = a b − 2 n−1 (n − 1)(n − 4) ij ij (n − 1)(n − 3)(n − 4) k=1 i6=j,i6=k,j6=k (n − 4)a b + 4 Pn a b + .. .. k=1 k. k. . (n − 1)(n − 2)(n − 3)(n − 4) (4.1.10)

2 From the above calculations, we can conclude that the summation of Un after removing observation

2 Xk and Yk equals nUn; that is

n n X 2(−k) 1 X 2 X a..b.. U (X,Y ) = a b − a b + n−1 (n − 3) ij ij (n − 2)(n − 3) i. i. (n − 1)(n − 2)(n − 3) k=1 i6=j i=1

2 = n ·Un(X,Y ). (4.1.11)

2 2 Also Un(X) and Un(Y ) have the jackknife invariance property, such as

n 2 X 2(−k) n ·Un(X) = Un−1 (X), k=1

and n 2 X 2(−k) n ·Un(Y ) = Un−1 (Y ), k=1

2 2 2(−k) where Un(X) and Un(Y ) are defined in (4.1.7) and (4.1.8), respectively. In addition, Un−1 (X) 2(−k) and Un−1 (Y ) are defined as follows:

n 2(−k) 1 X 1 X (−k) (−k) U (X) = a a − 2 a a n−1 (n − 1)(n − 4) ij ij (n − 1)(n − 3)(n − 4) i. i. i6=j,i6=k,j6=k i=1 a(−k)a(−k) + .. .. , (n − 1)(n − 2)(n − 3)(n − 4) (4.1.12) 38 and

n (−k) 1 X 1 X (−k) (−k) U 2 (Y ) = b b − 2 b b n−1 (n − 1)(n − 4) ij ij (n − 1)(n − 3)(n − 4) i. i. i6=j,i6=k,j6=k i=1 (4.1.13) b(−k)b(−k) + .. .. . (n − 1)(n − 2)(n − 3)(n − 4)

2 When a distance correlation is the normalized coefficient of distance covariance Un, it is called a bias-corrected distance correlation,

n ∗∗ X ∗∗(−k) nRn := Rn−1 , (4.1.14) k=1

where bias-corrected distance correlation after removing the observations Xk and Yk is defined as

 2(−k) Un−1 (X,Y ) 2(−k) 2(−k)  q , U (X)U (Y ) > 0; ∗∗(−k)  2(−k) 2(−k) n−1 n−1 Un−1 (X)Un−1 (Y ) Rn−1 = (4.1.15)  2(−k) 2(−k)  0, Un−1 (X)Un−1 (Y ) = 0,

the unbiased squared sample covariance of X and Y when we delete the observations Xk and

2(−k) 2(−k) 2(−k) Yk, and Un−1 (X,Y ) is defined in (4.1.9). Un−1 (X) and Un−1 (Y ) are defined respectively in

∗∗ (4.1.12) and (4.1.13). We can write the bias-corrected distance correlation Rn with kernel function h(·) of order 4, as

U 2(X,Y ) R∗∗ = n n p 2 2 Un(X)Un(Y ) n−1 P h((X ,Y ), ..., (X ,Y )) (4.1.16) 4 1≤k1<...

2 ∗∗ and R = Eh ((X1,Y1), ..., (X4,Y4)), where

∗∗ h((Xk1,Yk1), ..., (Xk4,Yk4)) h ((Xk1,Yk1), ..., (Xk4,Yk4)) = p . (4.1.17) h(Xk1, ..., Xk4).h(Yk1, ..., Yk4) 39 4.1.2 Jackknife Empirical Likelihood for Distance Correlation

We construct the jackknife pseudo-values of sample distance correlation based on sample size

∗∗ n, and the jackknife pseudo-values of Rn as

∗∗ ∗∗(−k) Zk = nRn − (n − 1)Rn−1 , k = 1, ..., n,

∗∗(−k) 2 where Rn−1 is defined in (4.1.15). The jackknife estimator of R is the average of the pseudo- values n 1 X R2 = Z . n,jack n k k=1

2 ∗∗ The condition Rn,jack = Rn is equivalent to Equation (4.1.14) since we further have

n 1 X ∗∗(−k) R2 − R∗∗ (n − 1)[R∗∗ − R ], (4.1.18) n,jack n u n n n−1 k=1

∗∗ where Zk is the jackknife pseudo-value of Rn . The observations are asymptotically independent

so we are able to apply the empirical likelihood method to Zk.

Remark 4.1.19. The jackknife empirical likelihood method applies for the U-statistic estimator of squared distance covariance. Using the jackknife empirical likelihood, we can derive a confidence

interval estimate for the true population distance covariance U 2. However, the numerical values

2 2 of U and Un are affected by the scale of the random variables. We seek a confidence interval for the squared distance correlation R2, because it is normalized by the distance variances to values in [0, 1]. The bias-corrected distance correlation statistic is not a U-statistic, so the jackknife empirical likelihood method does not directly apply. However, we observe that if we normalize the data prior to computing the distance covariance, we obtain the bias-corrected distance correlation.

That is, if we transform samples X and Y by

Xi Yi Xi = , Yi = , i = 1, . . . , n, e p 2 e p 2 Un(X) Un(Y ) 40 ∗∗ ∗∗ then Un(X,e Ye) = Rn (X,Y ). Asymptotically, the denominator of Rn (X,Y ) converges to the constant pU 2(X)U 2(Y ), and the jackknife empirical likelihood method empirically performs the same for the normalized and unnormalized statistics.

Let p = (p1, ..., pn) be a probability vector with probability pk for Zk. Then the jackknife empirical likelihood for the distance correlation R2 can be defined as follows:

( n n n ) 2 Y X X 2 L(R ) = max pk : p1 ≥ ... ≥ pn, pk = 1, pkZk = R , k=1 k=1 k=1

Qn Pn 2 where k=1 pk is subject to k=1 pk = 1, pk ≥ 0, k = 1, 2, ..., n. The likelihood L(R ) attains its

−n −1 2 maximum n at pk = n . The jackknife empirical likelihood ratio at R is defined as

( n n n ) L(R2) Y X X R(R2) := = max (np ): p ≥ ... ≥ p , p = 1, p Z = R2 . (4.1.20) n−n k 1 n k k k k=1 k=1 k=1

By using the method of Lagrange multiplier to solve a constrained optimization problem (4.1.20), we find n n n ! 2 X X 2 X G(R ) = log npk − nλ pk(Zk − R ) + γ pk − 1 , (4.1.21) k=1 k=1 k=1

where the function G is called the Lagrangian; λ ∈ Rq are Lagrange multipliers for the second set of constraints, γ ∈ R is Lagrange multiplier for the third set of constraints, and n is sample size. Finding the root of ∂G = 0, k = 1, ..., n, we further have ∂pk

∂G 1 2 = − nλ(Zk − R ) + γ = 0. ∂pk npk

Pn 2 It is straightforwardly verified that the solution to (4.1.21), where the constraint k=1 pkZk = R Pn Pn ∂G and pk = 1, is pk = γ + n = 0. Hence, it can be represented in the form γ = −n. k=1 k=1 ∂pk We have that

∂G 1 2 = − nλ(Zk − R ) − n = 0. ∂pk npk 41 2 2 We maximize R(R ), when min1≤k≤n Zk < R < max1≤k≤n Zk, as

1 pk = 2 , (4.1.22) n(1 + λ(Zk − R ))

where λ is the solution to n 2 X (Zk − R ) f(λ) = 2 = 0. (4.1.23) 1 + λ(Zk − R ) k=1

2 Substituting pk in (4.1.22) back into R(R ) with solution λ from (4.1.23), the jackknife empir- ical likelihood ratio for R2 can be defined as

n n 2 Y Y 2 −1 R(R ) = (npk) = {1 + λ(Zk − R )} , k=1 k=1

and taking the logarithm of R(R2), which gives the nonparametric jackknife empirical log-likelihood ratio for R2, is n 2 X 2 −2 log R(R ) = 2 log{1 + λ(Zk − R )}. k=1 Therefore, Wilks’ theorem holds when −2 log R(R2) converges in distribution to a chi-square distribution.

Theorem 4.1.24. Define

2 h1(x, y) = Eh((x, y), (Xk2,Yk2), (Xk3,Yk3), (Xk4,Yk4)) − V (X,Y ), (4.1.25)

2 and σxy1 = V ar(h1(X1,Y1));

∗∗ ∗∗ 2 h1 (x, y) = Eh ((x, y), (Xk2,Yk2), (Xk3,Yk3), (Xk4,Yk4)) − R , (4.1.26)

2 ∗∗ and σ1 = V ar(h1 (X1,Y1)).

2 2 2 2 Assume that Eh ((X1,Y1), (X2,Y2), (X3,Y3), (X4,Y4)) < ∞, σxy1 > 0, σx1 > 0, and σy1 >

∗∗2 2 0. Also assume that Eh ((X1,Y1), (X2,Y2), (X3,Y3), (X4,Y4)) < ∞ and σ1 > 0. 42 Then −2 log R(R2) converges in distribution to a chi-square distribution with one degree of freedom as n → ∞. That is,

2 d 2 −2 log R(R ) −→ χ1.

Based on the above Theorem 4.1.24, a jackknife empirical likelihood confidence interval for R2 with level (1 − α) can be defined as

2 2 2 Iα = {R : −2 log R(R ) ≤ χ1,1−α},

2 2 where χ1,1−α is the (1 − α) quantile of χ1. To prove Theorem 4.1.24, first we provide several lemmas.

Lemma 4.1.27. Under the conditions of Theorem 4.1.24, we have

√ 2 2 d 2 2 n(Un − V ) −→ N(0, 4 σxy1),

and

∗∗ 2 d 2 (Rn − R ) −→ N(0, σ1),

2 2 ∗∗ where σxy1 = V ar(h1(X1,Y1)) and σ1 = V ar(h1 (X1,Y1)).

2 Proof. First, we have to show that unbiased estimator of the squared distance covariance Un given in Equation (4.1.6) converges in distribution as n → ∞ to a Gaussian distribution with mean

2 2 2 zero and variance 4 σxy1. The unbiased estimator of the squared distance covariance Un can be alternatively formulated using a U-statistic as given in (4.1.1). Let n 4 X U˜2 = h (X ,Y ). (4.1.28) n n 1 k k k=1

2 Since 4h1(Xk,Yk) are i.i.d. with mean zero and variance σxy1, the Central Limit Theorem implies √ ˜2 2 2 that nUn → N(0, 4 σxy1). √ √ ˜2 2 2 We have to show that nUn is asymptotically equivalent to n(Un − V ) and they have the 43 same limiting distribution. Since the mean is zero, it suffices to show that the variance converges

˜2 2 2 2 to zero. Hence, we need only to show that E(Un − (Un − V )) → 0.

˜2 2 2 2 ˜2 ˜2 2 2 E(Un − (Un − V )) = V ar(Un) − 2Cov(Un, Un) + V ar(Un), (4.1.29)

42σ2 42σ2 ˜2 xy1 xy1 where V ar(Un) = n and the last term on the right converges to n , based on Hoeffding ˜2 2 42 2 (1948a). To show that Cov(Un, Un) = n σxy1, we derive

−1 n 4 n X X Cov(U˜2, U 2) = Cov(h (X ,Y ), h((X ,Y ), ..., (X ,Y ))) n n n 4 1 k k k1 k1 k4 k4 k=1 1≤k1<...

42σ2 2 ˜2 2 xy1 and since V ar(h1(X1,Y1)) = σxy1, we further have that Cov(Un, Un) is equal to n . Therefore, √ ˜2 2 2 2 2 2 2 2 it is clear that E(Un − (Un − V )) = 0. We conclude that n(Un − V ) → N(0, 4 σxy1). A bias-

∗∗ corrected estimator of the squared distance correlation Rn is the normalized version of distance

2 ∗∗ 2 d 2 covariance Un. Hence (Rn − R ) −→ N(0, σ1).

In what follows, the notation f(x) = O(g(x)) indicates that the function f(x) is eventually bounded by a multiple of function g(x) and f(x) = o(g(x)) indicates that the function f(x)/g(x)

converges to zero. In addition, the notation f(x) = Op(g(x)) means the function f(x) is bounded

in probability by a multiple of function g(x) and f(x) = op(g(x)) means the function f(x)/g(x) converges in probability to zero.

Lemma 4.1.31. Under the conditions of Theorem 4.1.24, we have

n 1 X p S = (Z − R2)2 −→ nσ2. n n k 1 k=1 44 Proof.

n n 1 X 1 X (Z − R2)2 = (Z − R∗∗ + R∗∗ − R2)2 n k n k k=1 k=1 n 1 X = (Z − R∗∗)2 + (R∗∗ − R2)2 n k n n k=1

= I1 + I2,

1 Pn ∗∗ 2 ∗∗ 2 2 2 where I1 = n k=1(Zk − Rn ) , and I2 = (Rn − R ) . Since V arjack = V ar(Rn,jack) = 1 Pn 2 2 ∗∗ 2 n(n−1) k=1(Zk − Rn,jack) and Rn = Rn,jack, we have that

n 1 X I = (Z − R∗∗)2 = (n − 1)V ar . 1 n k n jack k=1

∗∗ It follows from Sen (1977) that V arjack is asymptotically equivalent to V ar(Rn ). Hence

∗∗ 2 I1 = (n − 1)V ar(Rn ) = nσ1 + o(1). (4.1.32)

∗∗ 2 2 ∗∗ I2 = (Rn − R ) , and by the Strong Law of Large Numbers for a U-statistic, we have Rn =

2 R + o(1). Combining I1 and I2, we obtain

2 Sn = nσ1 + o(1). a.s.

1 Pn 2 2 p 2 That is, Sn = n k=1(Zk − R ) −→ nσ1.

2 Lemma 4.1.33. Let Wn = max1≤k≤n |Zk −R |. Under the conditions of Theorem 4.1.24, we have

1 Wn = o(n 2 ) a.s.

2 2 ˜2 Proof. From Lemma 4.1.27, we have that (Un − V ) has the same asymptotic behavior as Un in 45 Equation (4.1.28). By Markov’s inequality, we derive for any  > 0,

n 1 X 1 P ( max |4h1(Xk,Yk)| ≥ n 2 ) ≤ P (|4h1(Xk,Yk)| ≥ n 2 ) 1≤k≤n k=1 1 (4.1.34) = nP (|4h1(X1,Y1)| ≥ n 2 ) 2 E|4h1(X1,Y1)| ≤ 1 . n 2 2

Then we obtain

1 lim P ( max |4h1(Xk,Yk)| ≥ n 2 ) = 0. n→∞ 1≤k≤n

1 ∗∗ This shows that max1≤k≤n |4h1(Xk,Yk)| = o(n 2 ). It follows that max1≤k≤n |h1 (Xk,Yk)| =

1 2 ∗∗ − 1 o(n 2 ). From Peng and Tan (2018), we have that Zk − R = h1 (Xk,Yk) + Op(n 2 ). Hence,

2 1 Wn = max1≤k≤n |Zk − R | = o(n 2 ) a.s.

With the help of the Lemma 4.1.27, Lemma 4.1.31, and Lemma 4.1.33, we can prove the main result of Theorem 4.1.24.

Proof of Theorem 4.1.24.

Pn 2 1 k=1(Zk − R ) |f(λ)| = 2 n 1 + λ(Zk − R )

1 n n (Z − R2)2 X 2 X k = (Zk − R ) − λ 2 n 1 + λ(Zk − R ) k=1 k=1

λ n (Z − R2)2 1 n X k X 2 ≥ 2 − (Zk − R ) n 1 + λ(Zk − R ) n k=1 k=1 n |λ| Sn 1 X 2 ≥ − (Zk − R ) , 1 + |λ| Wn n k=1

1 Pn 2 2 2 where Sn = n k=1(Zk − R ) = nσ1 + o(1) a.s. by Lemma 4.1.31 and Wn = max1≤k≤n |Zk − 46 2 1 R | = o(n 2 ) a.s. by Lemma 4.1.33. From Lemma 4.1.27, we have that

n 1 X 2 ∗∗ 2 − 1 (Zk − R ) = R − R = Op(n 2 ). n n k=1

It follows that

|λ| 1 − 2 2 −1 = Op(n )(nσ1 + o(1)) . 1 + |λ| Wn

Thus,

− 1 |λ| = Op(n 2 ). (4.1.35)

2 Let γk = λ(Zk − R ), then

− 1 1 max |γk| = |λ| Wn = Op(n 2 )o(n 2 ) = op(1). (4.1.36) 1≤k≤n

When we plug the estimator of γk back into (4.1.23), we have

n 2 1 X 2 γk 0 = f(λ) = (Zk − R )(1 − γk + ) n 1 + γk k=1 (4.1.37) n n 2 2 1 X 2 1 X (Zk − R )γk = (Zk − R ) − λSn + . n n 1 + γk k=1 k=1

The last term on the right in (4.1.37) can be bounded by

n 2 2 2 n 2 3 1 X (Zk − R )γ λ X (Zk − R ) k = n 1 + γk n 1 + γk k=1 k=1 λ2W S ≤ n n 1 + γk

1 −1 2 2 = Op(n )o(n )(nσ1 + o(1))Op(1)

− 1 = op(n 2 ).

Therefore, we find λ as ∗∗ 2 (Rn − R ) − 1 λ = + op(n 2 ). Sn 47 2 γk By Taylor expansion, we have log(1 + γk) = γk − 2 + ηk, where for a number 0 < B < ∞, we have that as n → ∞,

3 P (|ηk| ≤ B|γk| , 1 ≤ k ≤ n) → 1.

The jackknife empirical log likelihood ratio at R2 becomes

n 2 X −2 log R(R ) = −2 log(npk) k=1 n X = 2 log(1 + γk) k=1 n n n X X 2 X = 2 γk − γk + 2 ηk k=1 k=1 k=1 n n 1 X X = 2nλ (Z − R2) − nS λ2 + 2 η n k n k k=1 k=1 ∗∗ 2 2 n n(Rn − R ) −1 X = − nSnop(n ) + 2 ηk, Sn k=1

−1 2 −1 where | − nSnop(n )| = n(nσ1 + o(1))op(n ) = op(1) and

n n n X X 3 X 2 3 3 −3/2 3/2 | ηk| ≤ |ηk| ≤ B|λ| |Zk − R | ≤ B|λ| nWnSn = Op(n )o(n ) = op(1). k=1 k=1 k=1

By Lemma 4.1.27 and Lemma 4.1.31, we have that

∗∗ 2 2 n(Rn − R ) d 2 −→ χ1. Sn

d 2 Hence, from Slutsky’s theorem, we have −2 log R(θ) −→ χ1.

4.2 Simulation Study

We evaluate the jackknife empirical likelihood confidence interval for distance correlation by simulation studies and compare the results with bootstrap method. The bootstrap is commonly used to calculate confidence interval even if the underlying distribution is not normal and it is a simple resampling method that can approximate the population. We use the standard normal 48 bootstrap confidence interval with B = 999 replications. An approximate 100(1−α)% confidence interval for R2 is defined by

∗∗ R ± Z α SE, n 1− 2 c

α where SEc is the bootstrap estimate of the standard error and Z is the upper 2 critical value from a standard normal distribution. The boot.ci function in R package boot creates confidence intervals for bootstrap and we specify the type of confidence interval as “norm” to obtain the normal boot- strap interval. (This implementation first applies a bias correction to the point estimate using the bootstrap estimate of bias.) In the simulation studies, we draw 10, 000 random samples of different sample sizes n = 25, 50, 100 from a bivariate standard normal distribution with correlation ρ. According to Szekely,´ Rizzo, and Bakirov (2007), the distance correlation R2 for a bivariate standard normal distribution can be computed by using the formula

ρ arcsin ρ + p1 − ρ2 − ρ arcsin ρ/2 − p4 − ρ2 + 1 R2(X,Y ) = √ . 1 + π/3 − 3

We calculate the confidence interval for distance correlation Iα based on jackknife empirical like- lihood and standard normal bootstrap at different nominal significance levels 90%, 95%, 99% for ρ = 0, 0.2, 0.5, 0.8 which correspond to distance correlation R = 0, 0.179, 0.454, 0.755, respec- tively. In order to compare the two confidence interval methods, we report the performance of cov- erage probabilities and average interval lengths. The Monte Carlo approximation to the coverage probability and average interval length for the proposed confidence intervals are calculated and are defined respectively as B B 1 X 2 1 X Iα{R ∈ CIc j} and |CIc j|. B B j=1 j=1

The simulation results of the coverage probability and average interval length of 90%, 95%, and 99% confidence intervals are illustrated in Table 4.1, Table 4.2, and Table 4.3. From Tables 49 4.1-4.3, we observe that the jackknife empirical likelihood method confidence intervals typically have better coverage rates than the bootstrap normal confidence intervals, while average interval lengths of both methods are approximately the same. One could alternately first normalize the samples using the distance variances, then determine the jackknife empirical likelihood confidence intervals for the distance covariance. The coverage rates for the distance covariance the jackknife empirical likelihood confidence intervals were essentially equivalent to those in Tables 4.1-4.3. 50 Table 4.1 Coverage probabilities and average interval lengths of 90% confidence interval for R2

Jackknife empirical likelihood Standard normal bootstrap n ρ R Coverage probability Interval length Coverage probability Interval length 0 0 91.16 0.2729 53.32 0.3000 0.2 0.179 84.77 0.3286 55.03 0.3321 25 0.5 0.454 83.69 0.5059 67.69 0.4394 0.8 0.753 87.79 0.4847 79.21 0.4181 0 0 92.08 0.1261 51.66 0.1525 0.2 0.179 81.86 0.1851 58.28 0.1940 50 0.5 0.454 86.83 0.3382 76.58 0.3122 0.8 0.753 89.68 0.3210 84.82 0.2967 0 0 91.90 0.0617 52.19 0.0774 0.2 0.179 82.28 0.1160 65.32 0.1205 100 0.5 0.454 88.37 0.2324 81.83 0.2228 0.8 0.753 90.38 0.2198 87.47 0.2109

Table 4.2 Coverage probabilities and average interval lengths of 95% confidence interval for R2

Jackknife empirical likelihood Standard normal bootstrap n ρ R Coverage probability Interval length Coverage probability Interval length 0 0 94.30 0.3288 68.32 0.3574 0.2 0.179 88.67 0.3958 65.64 0.3957 25 0.5 0.454 87.97 0.6093 75.22 0.5236 0.8 0.753 92.41 0.5840 86.26 0.4982 0 0 95.03 0.1516 66.66 0.1817 0.2 0.179 86.23 0.2224 66.47 0.2312 50 0.5 0.454 91.52 0.4063 82.78 0.3720 0.8 0.753 94.29 0.3861 90.99 0.3535 0 0 95.02 0.0739 65.51 0.0922 0.2 0.179 85.98 0.1389 72.02 0.1436 100 0.5 0.454 93.63 0.2782 88.17 0.2654 0.8 0.753 95.19 0.2636 93.14 0.2513 51 Table 4.3 Coverage probabilities and average interval lengths of 99% confidence interval for R2

Jackknife empirical likelihood Standard normal bootstrap n ρ R Coverage probability Interval length Coverage probability Interval length 0 0 97.63 0.4419 92.50 0.4697 0.2 0.179 94.25 0.5318 84.86 0.5201 25 0.5 0.454 93.12 0.8183 85.62 0.6882 0.8 0.753 96.69 0.7849 93.74 0.6548 0 0 98.07 0.2034 91.34 0.2388 0.2 0.179 91.86 0.2979 81.14 0.3038 50 0.5 0.454 96.35 0.5442 91.32 0.4889 0.8 0.753 98.39 0.5183 96.86 0.4646 0 0 98.20 0.0985 89.64 0.1212 0.2 0.179 90.75 0.1846 82.21 0.1888 100 0.5 0.454 97.70 0.3699 95.33 0.3488 0.8 0.753 98.79 0.3516 97.83 0.3302

4.3 Real Examples

The best olive oils are available in Italy, where an olive oil contains mainly monounsaturated fats, saturated fats, and polyunsaturated fats. While monounsaturated fatty acids have a single carbon-to-carbon double bond, saturated fatty acids have no double bonds, and polyunsaturated fatty acids have many double bonds. We consider the dataset from a study of Italian olive oils in three different areas in Italy by Forina and Tiscornia (1982). This data consists of the percent- age composition of eight different fatty acids found in the lipid fraction of 572 Italian olive oils depending on geographic areas (South, North, and Sardinia). We use the olive oil data to com- pute confidence intervals for the bivariate variables. We consider only six fatty acids: palmitic, palmitoleic, stearic, oleic, linoleic, and linolenic. Oleic and palmitoleic fatty acids are the types of the monounsaturated fats, palmitic and stearic fatty acids are the types of the saturated fats, and linoleic and linolenic fatty acids are the types of the polyunsaturated fats. The scatterplot matrix contains the pairwise scatter plots of six fatty acids as shown in Figure 4.1, and we observe that there are some linear and nonlinear relations between fatty acids. Table 4.4 is the summary statistics of six fatty acids and we observe that the olive oils contain higher percentage of oleic acid and slightly higher palmitic and linoleic acids content. But the olive oils 52 contain low or no linolenic acid.

Table 4.4 Summary statistics of fatty acids

Monounsaturated Saturated Polyunsaturated fats oleic palmitoleic palmitic stearic linoleic linolenic min. 6300 15 610 152 448 0 median 7302 110 1201 223 1030 26 mean 7312 126 1232 228 980.5 31.89 max. 8410 280 1753 375 1470 74 53

Figure 4.1 Scatterplot matrix of pairwise association of six fatty acids 54 To analyze the pairwise relations between oleic and palmitoleic fatty acids, palmitic and stearic fatty acids, and linoleic and linolenic fatty acids. In statistical hypothesis testing, there exists the significant associations between two fatty acids of the monounsaturated fats, two fatty acids of the saturated fats, and two fatty acids of the polyunsaturated fats at the both significant levels of 0.05 and 0.10. The corresponding p-values determine using function dcor.test with 999 replicates in R package energy. We use the confidence interval for distance correlation to test whether there is an association between the two fatty acids at a signicance level of 95% and 90%, as displayed in Table 4.5. There are the significant associations between oleic and palmitoleic fatty acids of the monounsaturated fats and also between linoleic and linolenic fatty acids of the polyunsaturated fats at the both 0.05 and 0.10 levels. We notice that the lower intervals of the saturated fats at 0.05 and 0.10 level are close to zero. There is no significant association between palmitic and stearic fatty acids of the saturated fats at 0.05 level, which means there is a weak relation between palmitic and stearic fatty acids.

Table 4.5 The confidence intervals for bias-corrected distance correlation of bivariate variables of monounsaturated fats, saturated fats, and polyunsaturated fats

95% confidence interval 90% confidence interval fats lower interval upper interval lower interval upper interval Monounsaturated 0.6360 0.7274 0.6442 0.7207 Saturated -0.000415 0.0398 0.00284 0.0366 Polyunsaturated 0.04356 0.0878 0.04724 0.0844 55

CHAPTER 5 LOCAL DISTANCE CORRELATION

In this Chapter, we propose a new method to compute the local distance correlation which is an extension of local Gaussian correlation. We will discuss the estimation of local distance correlation after local Gaussian correlation. In addition, we will consider the visualization of local distance correlation and compare it with the visualization of local Gaussian correlation in simulation studies and real examples.

5.1 Local Gaussian Correlation

A measure of local dependence based on localizing the correlation coefficient was introduced by Jones (1996). The localization version of correlation coefficient by taking correlation around

(x0, y0) is applied by using the weight function

−1 −1 K(b1 (x − x0), b2 (y − y0)) w0(x, y) = , b1b2

where K is a kernel of a bivariate density estimator and b1, b2 are the bandwidths. The localization of Pearson correlation ρ to a neighborhood of the point (x0, y0) is defined as

M0(x, y) γ0 = p , M0(x, x)M0(y, y)

where

E(w0(x, y)X)E(w0(x, y)Y ) M0(x, y) = E(w0(x, y)XY ) − , E(w0(x, y))

M0(x, x) and M0(y, y) are defined similarly. When w0(x, y) = 1 for all (x, y), then we get the correlation coefficient ρ. Measuring dependence between X and Y is a fundamental problem in statistics. Pearson cor- relation ρ works nicely to measure linear association of two univariate random variables but it is not able to capture nonlinear dependence structures in bivariate data. Tjøstheim and Hufthammer (2013) developed a local measure of dependence derived from a local correlation function. It is 56 based on approximating a bivariate density locally from a family of bivariate Gaussian densities using local likelihood. Local likelihood was introduced by Hjort and Jones (1996) and they give a possible connection between the parametric and nonparametric worlds. Unlike linear dependence measures, local Gaussian correlation describes nonlinear structure in dependence and does not have the bias problem as the conditional correlation. It is useful to search for a local dependence measure that characterizes dependence at a point. Berentsen and Tjøstheim (2014) considered vi- sualization of local dependence by plots of the local Gaussian correlation and adapted methodology of visualization of the local Gaussian correlation from Jones and Koch (2003).

5.1.1 Estimation of Local Gaussian Correlation

At each point, a Gaussian distribution is approximated; the correlation coefficient of the ap- proximated Gaussian distribution is taken as the local correlation. Locally in a neighborhood of each point z = (x, y), the Gaussian bivariate density is defined as

1 φ(v, θ(z)) = × p 2 2πσ1(z)σ2(z) 1 − ρ (z)   2 2  1 (v1 − µ1(z)) (v1 − µ1(z))(v2 − µ2(z)) (v2 − µ2(z)) exp − 2 2 − 2ρ(z) + 2 , 2(1 − ρ (z)) σ1(z) σ1(z)σ2(z) σ2(z) (5.1.1)

T T where v = (v1, v2) is the running variable in the Gaussian distribution, and θ(z) = [µ(z), Σ(z), ρ(z)]

T with µ(z) = [µ1(z), µ2(z)] is local mean vector, Σ(z) = [σij(z)] is local covariance matrix, and ρ(z) is local correlation. The population values of θ(z) are obtained by minimizing a local penalty function, given by

Z L = Kb(v − z)[φ(z, θ(z)) − log φ(v, θ(z))f(v)]dv,

−1 −1 −1 where Kb(v − z) = (b1b2) Kb1 (b1 (v1 − x)Kb2 (b2 (v2 − y) is a product kernel with bandwidth

b = (b1, b2), and L is the penalty function used by Hjort and Jones (1996) for density estimation 57 purposes. They argued that L can be interpreted as a locally weighted Kullback-Leibler criterion for measuring the distance between f(.) and φ(., θ(z)). Then, θ(z) could be chosen to minimize L, so that it would satisfy

Z ∂ Kb(v − z) {log φ(v, θ(z))}[f(v) − φ(v, θ(z))]dv = 0, j = 1, ..., 5. ∂θj

The corresponding estimates of θ(z) are obtained by maximizing the local log-likelihood function

n 1 X Z L(Z , ..., Z , θ(z)) = K (Z − z) log φ(Z , θ(z)) − K (v − z)φ(v, θ(z))dv. (5.1.2) 1 n n b i i b i

Let ∂ w(., θ) = {log φ(., θ)}, ∂θj

so that

n ∂L 1 X Z = K (Z − z)w (Z , θ(z)) − K (v − z)w (v, θ(z))φ(v, θ(z))dv, ∂θ n b i j i b j j i

which produces an estimate for local correlation ρ(z), estimates for local means µ(z), and estimates

for local variances Σ(z). As n → ∞ for fixed b, assuming that E{Kb(Zi − z)wj(Zi, θ(z))} < ∞, and using the law of large numbers, we have almost surely that

∂L Z → Kb(Zi − z)wj(v, θ(z))[f(v) − φ(v, θ(z))]dv. ∂θj

Berentsen and Tjøstheim (2014) used a bootstrap test assuming Zi are i.i.d random variables. However, they observed that the computational time increases, as for each bootstrap realization the local likelihood function had to be optimized numerically.

5.1.2 Choice of Bandwidth for Kernel Function

The bandwidth depends largely on the purpose of the user. If the user’s goal is to investigate the local dependence structure in the data, one can compute the local correlation for several band- 58 widths. It is a way to know the dependence structure on different scales of locality. Berentsen and Tjøstheim (2014) indicated that it would be better to have a data-driven choice of bandwidth like the bandwidth choice for density kernel estimation. Tjøstheim and Hufthammer (2013) discussed the choice of the bandwidth as a compromise between optimizing the bias reduction for a density estimate and the choice of the degree of the variance for a local correlation estimate. However, their bandwidth algorithm is not really satisfactory in a general situation, according to Berentsen and Tjøstheim (2014). Likelihood cross-validation to select appropriate bandwidth has been used by Berentsen and Tjøstheim as the optimizer of

−1 X −i CV (b) = n log φ(Zi, θ(Zi, θ (Zi)), i

−i where θ (Zi) is the leave-one-out estimate of θ(Zi), and φ is defined in (5.1.1).

5.1.3 Properties of Local Gaussian Correlation

The properties of local Gaussian correlation are the following:

1. Range: It can take values between −1 and 1, then it gives an interpretation to positive and negative local correlations in terms of the approximating Gaussian.

2. Independence: If X and Y are independent, then ρb(x, y) ≡ 0. For Gaussian variables only,

ρb(x, y) ≡ 0 implies independence.

3. In the Gaussian case, ρb(x, y) = ρ is constant.

4. Functional independence: If Y = f(X) ⇔ ρ(x, y) is +1 if f 0(x, y) > 0 and ρ(x, y) is −1 if f 0(x, y) < 0.

5. Symmetry: It is assumed that µ = E(X) = 0.

• Radial symmetry: ρb(−x) = ρb(x).

• Reflection symmetry: ρb(−x, y) = −ρb(x, y) and/or ρb(x, −y) = −ρb(x, y). 59

• Exchange symmetry: ρb(x, y) = ρb(y, x).

Symmetry properties of µ(x) and Σ(x) can conceivably be used to obtain more precise estimates, they increase the power of independence tests, according to Berentsen and Tjøstheim (2014).

5.1.4 Global Gaussian Correlation

Berentsen and Tjøstheim (2014) considered the global Gaussian correlation by aggregating local Gaussian correlation on subsets of R2 to get a global measure of dependence. The local Gaussian correlation can take negative and positive values. Thus for a nonlinear dependence struc- ture, Berentsen and Tjøstheim considered ρ2(x, y) to avoid the problem that the local correlation at different points canceled out. The global measure of dependence is

Z 1/2 2 1/2 2 τ = EF ρ (X,Y ) = ρ (x, y)dF (x, y) , (5.1.3)

where F (x, y) is the joint distribution function of X and Y . The properties of the global Gaussian correlation are:

1. Range: 0 ≤ τ ≤ 1.

2. Independence: If X and Y are independent, then τ = 0.

3. Functional dependence: If Y = f(X) ⇔ τ = 1.

4. Gaussian case: If X and Y are the joint Gaussian distribution with correlation coefficient ρ, then τ ≡ |ρ|.

The sample version of the global measure of dependence τ is defined as

Z 1/2 21/2 2 τn,b = EF n(ρn,b(X,Y )) = ρn,b(x, y)dFn(x, y) , (5.1.4)

1 Pn where Fn(x, y) = n i=1 I(Xi ≤ x, Yi ≤ y) with I(.) denoting the indicator function. The sample global measure of dependence that screens outliers outside some subset S of R2 is given 60 by R 2 1/2 ρn,b(x, y)IS(x, y)dFn(x, y) τn,b(S) = R , IS(x, y)dFn(x, y) R where the scaling IS(x, y)dFn(x, y) is done to sure 0 ≤ τn,b(S) ≤ 1. The asymptotic properties as follows:

d 1. θn,b(x, y) −→ θ(x, y), when b is fixed and n → ∞.

1/2 −1/2 d 2. (nb1b2) JbMb [θn,b − θ] −→ N(0,I), where I is the identity matrix of dimension 5 and

Z T Jb = Kb(v − z)w(v, θb(z))w (v, θb(z))φ(v, θb(z))dv Z (5.1.5) − Kb(v − z)∇w(v, θb(z))(f(v) − φ(v, θb(z)))dv,

where w(z, θ) = ∇ log φ(z, θ), and

Z 2 T Mb = b1b2 Kb (v − z)w(v, θb(z))w (v, θb(z))f(v)dv Z Z (5.1.6) T − b1b2 Kb(v − z)w(v, θb(z))f(v)dv Kb(v − z)w (v, θb(z))f(v)dv.

. R 3. The test statistic Tn,b = S f(ρn,b(x, y))dFn(x, y) which estimates the functional . R 2 T = S f(ρ(x, y))dF (x, y), depends only on f(x) = x and f(x) = x . a.s Therefore, Tn,b −→ T.

1/2 d R 2 RR  4. n (Tn,b − Tb) −→ N 0, Ab(x) dF (x) − Ab(x)Ab(y)dF (x)dF (y) , where

Z 5 0 X j Ab(x) = f (ρb(v)) ab(v)Kb(x − v)wj(x, θb(v))dF (v) + f(ρb(x))IS(x), S j=1

j −1 and ab are the elements in the fifth row of Jb where Jb is defined in (5.1.5).

To test the hypothesis

Ho : Xand Y are independent, vs H1 : X and Y are not independent. 61 Berentsen and Tjøstheim (2014) used the bootstrap method and permutation test since the asymp- totic theory for functionals of type Tn,b is not accurate unless n is very large. They developed the R package localgauss (2014) to compute the local likelihood estimates θn,b(x, y) including

ρn,b(x, y).

5.2 Local Distance Correlation

Distance correlation, a nonparametric approach to measure dependence between random vec- tors, is one of the interesting topics in the statistical community. An extended concept of distance correlation, local distance correlation, presented in this research will be able to capture a local measure of dependence for nonlinear complex and nonmonotone types of dependence in certain regions.

5.2.1 Estimation of Local Distance Correlation

In order to estimate local distance correlation, one can use empirical likelihood method, in- troduced by Owen (1990), which provides a way to find efficient estimates, construct confidence

intervals, and test hypotheses by a nonparametric approach. Let X1, ..., Xn and Y1, ..., Yn be i.i.d.

from the underlying distributions FX and FY with pk = FX,Y (xk, yk), k = 1, 2, ..., n, where Pn pk ≥ 0, and k=1 pk = 1. The fundamental concept of the empirical likelihood at (p1, ..., pn) is defined as n Y L(p1, ..., pn) = pk, k=1 and the empirical log-likelihood is

n X l(p1, ..., pn) = log pk k=1

Pn where (p1, ..., pn) is subject to p1 ≥ ... ≥ pn, k=1 pk = 1, k = 1, 2, ..., n. It is clear that the 1 maximum of the log-likelihood function is attained at pk = n . Kitamura, Tripathi, and Ahn (2004) used a kernel function to calculate local empirical log-likelihood to estimate a regression model. 62 For our purpose, we define positive weights as

−1 −1 K(b1 (x0 − X), b2 (y0 − Y )) wb(x0, y0) = , (5.2.1) b1b2

Kb(./b) where Kb(.) = b is a product kernel function, and b1, b2 represent the window size of the local neighborhood. The kernel weight is used to carry out the localization for observations close

to (x0, y0). We start an estimation procedure by using the weight function wb to obtain the log- likelihood as n X l(p1, ..., pn) = wb log pk. k=1 The nonparametric maximum likelihood method shares some properties with conventional para- metric likelihood when we apply it to the mean functions. Empirical likelihood is a useful method with linear statistical functional when we use the method of Lagrange multipliers for the optimiza- tion problem, but it becomes a major problem in a nonlinear functional. Jing, Yuan, and Zhou (2009) extended empirical likelihood, called jackknife empirical likeli- hood, which can be used for nonlinear statistical functionals such as a U-statistic. The central idea of the jackknife empirical likelihood method is to use the jackknife pseudo-sample, which is de- fined by Quenouille (1956) as a sample of asymptotically independent observations; the jackknife estimator for parameter of interest becomes the sample mean of jackknife pseudo-samples. The empirical likelihood method can be easily applied for the mean of jackknife pseudo-samples, since empirical likelihood can be applied to a sample mean. In Chapter 4, we showed that we are able use directly the method of jackknife empirical likelihood for a U-statistic on distance correlation and proved that Wilks theorem holds. Our approach is to first rely on the local estimator of distance correlation via a local version of the empirical likelihood on the mean of jackknife pseudo-samples.

2 In Chapter 4, we showed that the squared distance covariance Un(X,Y ) is a U-statistic by applying the jackknife invariance of Theorem 4.1.3. That is,

n 2 X 2(−k) n ·Un(X,Y ) = Un−1 (X,Y ), (5.2.2) k=1 63 where

n 1 X 2 X a..b.. U 2(X,Y ) = a b − a b + . (5.2.3) n n(n − 3) ij ij n(n − 2)(n − 3) i. i. n(n − 1)(n − 2)(n − 3) i6=j i=1

An unbiased estimator of the squared distance covariance after removing observations Xk and Yk is defined as

n 2(−k) 1 X 2 X (−k) (−k) U (X,Y ) = a b − a b n−1 (n − 1)(n − 4) ij ij (n − 1)(n − 3)(n − 4) i. i. i6=j,i6=k,j6=k i=1 a(−k)b(−k) + .. .. . (n − 1)(n − 2)(n − 3)(n − 4) (5.2.4)

2 2 The unbiased estimators of the squared distance variances Un(X) and Un(Y ) also have the corre- sponding jackknife invariance property

n 2 X 2(−k) n ·Un(X) = Un−1 (X), k=1 and n 2 X 2(−k) n ·Un(Y ) = Un−1 (Y ), k=1

2 2 where Un(X) and Un(Y ) are defined as

n 1 X 2 X a..a.. U 2(X) = a a − a a + , (5.2.5) n n(n − 3) ij ij n(n − 2)(n − 3) i. i. n(n − 1)(n − 2)(n − 3) i6=j i=1

n 1 X 2 X b..b.. U 2(Y ) = b b − b b + . (5.2.6) n n(n − 3) ij ij n(n − 2)(n − 3) i. i. n(n − 1)(n − 2)(n − 3) i6=j i=1 64

An unbiased estimator of the squared distance variance after removing observation Xk is

n 2(−k) 1 X 2 X (−k) (−k) U (X) = a a − a a n−1 (n − 1)(n − 4) ij ij (n − 1)(n − 3)(n − 4) i. i. i6=j,i6=k,j6=k i=1 (5.2.7) a(−k)a(−k) + .. .. . (n − 1)(n − 2)(n − 3)(n − 4)

An unbiased estimator of the squared distance variance after removing observation Yk is

n 2(−k) 1 X 2 X (−k) (−k) U (Y ) = b b − b b n−1 (n − 1)(n − 4) ij ij (n − 1)(n − 3)(n − 4) i. i. i6=j,i6=k,j6=k i=1 (5.2.8) b(−k)b(−k) + .. .. . (n − 1)(n − 2)(n − 3)(n − 4)

∗∗ 2 ∗∗ Since Rn is defined by the standardized version of Un, Rn is a bias-corrected distance correlation.

∗∗ We constructed jackknife pseudo-values of sample distance correlation Rn based on sample size n as

∗∗ ∗∗(−k) Zk = nRn − (n − 1)Rn−1 , k = 1, ..., n,

where

 2  √ Un(X,Y ) , U 2(X)U 2(Y ) > 0;  2 2 n n ∗∗ Un(X)Un(Y ) Rn = (5.2.9)  2 2  0, Un(X)Un(Y ) = 0,

2 2 2 with the unbiased squared sample covariance Un(X,Y ) as defined in (5.2.3), Un(X) and Un(Y ) as ∗∗(−k) defined respectively in (5.2.5) and (5.2.6). Rn−1 is the bias-corrected distance correlation after removing the observations Xk and Yk, defined as

 2(−k) Un−1 (X,Y ) 2(−k) 2(−k)  q , U (X)U (Y ) > 0; ∗∗(−k)  2(−k) 2(−k) n−1 n−1 Un−1 (X)Un−1 (Y ) Rn−1 = (5.2.10)  2(−k) 2(−k)  0, Un−1 (X)Un−1 (Y ) = 0,

with the unbiased squared sample covariance of X and Y when we delete the observations Xk and

2(−k) 2(−k) 2(−k) Yk, Un−1 (X,Y ), defined in (5.2.4), Un−1 (X) and Un−1 (Y ) defined respectively in (5.2.7) and 65 (5.2.8). The jackknife estimator of R2 is the average of the pseudo-values

n 1 X R2 = Z , n,jack n k k=1

2 ∗∗ 1 Pn ∗∗(−k) and one can verify that the condition Rn,jack = Rn is equivalent to n k=1 Rn−1 . We further have n 1 X ∗∗(−k) R2 − R∗∗ (n − 1)[R∗∗ − R ]. (5.2.11) n,jack n u n n n−1 k=1 See Remark 4.1.19 in Section 4.1.2. Applying the idea of local empirical likelihood approach to the jackknife pseudo-sample, we concentrate the jackknife empirical likelihood locally for R2 to be defined as follows by using weight function wb in

( n n n ) 2 X X X 2 l(R ) = max wb log pk : p1 ≥ ... ≥ pn, pk = 1, pkZk = R , (5.2.12) k=1 k=1 k=1

where wb is defined in (5.2.1). It assigns the largest weights to data near point (x0, y0). To esti- mate local distance correlation, we maximize the profile likelihood l(R2) only on the observations

within a certain window around point (x0, y0). The mean functional for this window is

n 2 X R = pkZk, k=1

Pn which satisfies that k=1 pk = 1 and pk ≥ 0, for k = 1, ..., n. We use Lagrange multipliers where the Lagrangian is a way to solve a constrained optimization problem (5.2.12). That is,

n n n 2 X X 2 X L(R ) = wb log pk − Nλ pk(Zk − R ) + γ( pk − 1), (5.2.13) k=1 k=1 k=1

where the function L is called the Lagrangian; λ ∈ Rq are Lagrange multipliers for the second set of constraints, γ ∈ R is Lagrange multiplier for the third set of constraints, and N is summation of 66 weights. Finding the root of ∂L = 0, k = 1, ..., n, we further have ∂pk

∂L wb 2 = − Nλ(Zk − R ) + γ = 0. ∂pk pk

Pn 2 Pn It is easily verified that the solution to (5.2.13), with the constraint k=1 pkZk = R and k=1 pk = Pn ∂L 1, is pk = γ + N = 0. Hence, it can be represented in the form γ = −N. We have that k=1 ∂pk

∂L wb 2 = − Nλ(Zk − R ) − N = 0. ∂pk pk

We obtain the optimal pk as

1 wb pk = 2 , (5.2.14) N 1 + λ(Zk − R )

where λ is the solution to

n 2 X wb(Zk − R ) f(λ) = 2 = 0. (5.2.15) 1 + λ(Zk − R ) k=1

Using (5.2.14) and the solution λ from (5.2.15), the local jackknife empirical likelihood at R2 is defined as

n n 2 X X wb l(R ) = wb log pk = wb log 2 N{1 + λ(Zk − R )} k=1 k=1 n n (5.2.16) X wb X = w log − w log{1 + λ(Z − R2)}. b N b k k=1 k=1

The estimator of R2 is defined by the maximized profile likelihood l(R2)

2 2 Rc = argR2 max l(R ). (5.2.17)

Thus the local estimator of distance correlation can be viewed as a maximum local empirical 67 likelihood estimator based on the jackknife pseudo-sample.

5.2.2 Choice of Bandwidth for Kernel Function

Let X1,X2, ..., Xn be i.i.d observations and f(x) be the true probability density function of the sampled population. The kernel density estimate of f(x) with given bandwidth b is

n   1 X x − Xk fˆ(x) = K , (5.2.18) b nb b k=1

where K is a kernel function that satisfies:

1. K(u) > 0 and R K(u)du = 1.

2. R uK(u)du = 0.

3. 0 < R u2K(u)du < ∞.

The kernel function’s shape and width is determined by choosing bandwidth b, and here we are using a Gaussian kernel function, which is defined as

 u2  K(u) = (2π)−1/2 exp − . 2

An important problem for estimating local distance correlation Rc2 is the choice of bandwidth b, where b is a window taken in order to determine how much of the data within this window are

used to estimate each Rc2. Indeed, a large bandwidth gives more bias but less variable estimates unlike a small bandwidth which gives less bias but more variable estimates. To evaluate bandwidth selection performance, a commonly used error criteria is the mean integrated squared error (MISE). ˆ A discrepancy measure between f(x) and fb(x) at a point is the mean squared error, which can be written as ˆ 2 MSEb = E(fb(x) − f(x)) , 68 and the function expressed above can be stated in terms of the squared bias and variance as

ˆ ˆ 2 MSEb = V ar(fb(x)) + (E(fb(x)) − f(x)) . (5.2.19)

ˆ We consider an error criterion which is the average of the distance between the functions fb(x) and f(x) when we take the integral over the real line, defined by

Z  ˆ 2 MISEb = E (fb(x) − f(x)) dx . (5.2.20)

By Fubini’s theorem, MISEb is an equivalent to the following in terms of the squared bias and variance: Z Z ˆ ˆ 2 MISEb = V ar(fb(x))dx + (E(fb(x)) − f(x)) dx. (5.2.21)

ˆ If we take the expectation of fb(x), we further have

n  !    1 X x − Xk x − X E(fˆ(x)) = E K = E b−1K = E(K (x − X)), (5.2.22) b nb b b b k=1

R ˆ using the fact that E(g(x)) = g(x)f(x)dx and the definition of convolution. Thus E(fb(x)) can be written as Z ˆ E(fb(x)) = (Kb ∗ f)(x) = Kb(x − y)f(y)dy. (5.2.23)

The variance can be written as

1 V ar(fˆ(x)) = ((K2 ∗ f)(x) − (K ∗ f)2(x)), (5.2.24) b n b b

and the bias is ˆ E(fb(x)) − f(x) = (Kb ∗ f)(x) − f(x). (5.2.25)

If we combine variance in (5.2.24) and the square of bias in (5.2.25), then MISEb in (5.2.21) can 69 be written as

1 Z Z MISE = ((K2 ∗ f)(x) − (K ∗ f)2(x))dx + ((K ∗ f)(x) − f(x))2dx. (5.2.26) b n b b b

We can write MISEb as in Wand and Jones (1995), where after some straightforward manipula- tions, we have

1 Z 1 Z Z Z MISE = K2(x)dx + (K ∗ f)2(x)dx − 2 (K ∗ f)(x)f(x)dx + f 2(x)dx. b nb 1 − n−1 b b (5.2.27) We first consider the estimation of f(x) when z = (x − y)/b and the Jacobian of z is b; therefore, ˆ E(fb(x)) in (5.2.23) can be written as

Z ˆ E(fb(x)) = K(z)f(x − bz)dz. (5.2.28)

We use a second order Taylor series about x for f(x − bz), which is

1 00 f(x − bz) = f(x) − bzf 0(x) + b2z2f (x) + o(b2), (5.2.29) 2

and this leads to

Z 1 00 E(fˆ(x)) = K(z)(f(x) − bzf 0(x) + b2z2f (x) + o(b2))dz. (5.2.30) b 2

When R K(z)dz = 1, R zK(z)dz = 0, and R z2K(z)dz < ∞, we obtain

1 00 Z E(fˆ(x)) = f(x) + b2f (x) z2K(z)dz + o(b2). (5.2.31) b 2

R 2 We denote µ2 = z K(z)dz and the bias expression is

1 00 E(fˆ(x)) − f(x) = b2µ f (x) + o(b2). (5.2.32) b 2 2 70 The variance expression is

1 Z 1 Z V ar(fˆ(x)) = K2(z)f(x − bz)dz − K(z)f(x − bz)dz b nb n 1 Z 1 = K2(z)(f(x) + o(1))dz − (f(x) + o(1))2 (5.2.33) nb n 1 Z  1  = K2(z)dzf(x) + o . nb nb

We combine the variance in (5.2.33) and the square of bias in (5.2.32) to get

  1 1 00 1 MISE = R(K)f(x) + b4µ2(f (x))2 + o + b4 , (5.2.34) b nb 4 2 nb

where R(K) = R K2(z)dz. If we integrate the expression (5.2.34) where we take integral of probability density f(x) with respect to x, which is 1, and where we take integral of squared fˆ00 (x) denoted by R(f 00 (x)), then we have

  1 1 00 1 MISE = R(K) + b4µ2R(f (x)) + o + b4 b nb 4 2 nb (5.2.35)  1  = AMISE + o + b4 , b nb

where AMISEb is an asymptotic mean integrated squared error. The variance-bias trade-off is illustrated by AMISE with the terms of the squared bias and the variance. Note that the only unknown parameter in (5.2.35) is R(f 00 (x)). For a given random sample, we select a bandwidth which can be reduced to the optimization problem of finding b for a probability density f and kernel function K. We can minimize AMISE as defined in (5.2.35) as (R(K))1/5 bAMISE = 2 00 1/5 . (5.2.36) (nµ2R(f ))

Unfortunately, bAMISE depends on a probability density f which is unknown and must be es- timated. We discuss the most common bandwidth selections of the kernel density estimator as normal reference, unbiased cross-validation, and biased cross-validation. 71 1. Normal reference bandwidth selection A simple solution to estimate R(f 00 ), was suggested by Scott (1992), when we assume that f is the density of N(0, σ2) and we obtain

00 3 R(f ) = . (5.2.37) 8π1/2σ5

If we plug R(f 00 ) from (5.2.37) in (5.2.36), then the optimal value of bandwidth is

8π1/2R(K)1/5 bAMISE = 2 σ. (5.2.38) 3nµ2

Since a normal scale bandwidth selector is obtained from (5.2.38) by replacing σ with σ,ˆ we further have 8π1/2R(K)1/5 bAMISE = 2 σ,ˆ (5.2.39) 3nµ2

where the choice of σˆ here is smaller than the sample s and the sample in- terquartile range IQR of the standard normal density divided by 1.34. When we combine Silver-

−1/2 man’s (1986) rule-of-thumb with a normal kernel, where µ2 = 1 and R(K) = π /2, we have

41/5 b = n−1/5σˆ ≈ 1.06n−1/5σ,ˆ (5.2.40) nrd 3 where σˆ = min (s, IQR/1.34). The function bw.nrd in R implements the optimal bandwidth for the normal reference. 2. Unbiased cross-validation We consider the most popular method for cross-validation bandwidth selections to be Rudemo’s (1982) and Bowman’s (1984). This method is based on integrated squared error of the kernel ˆ function fb which is Z 2  ˆ  ISEb = fb(x) − f(x) dx, (5.2.41) 72 ˆ 1 Pn x−Xk where f(x) = nb k=1 K( b ). The integrated squared error can be expanded as

Z Z Z ˆ 2 ˆ 2 ISEb = (fb(x)) dx − 2 fb(x)f(x)dx + (f(x)) dx.

Since the first two terms are dependent on b and the third term is not a function of b, we can omit

it. The optimization of ISEb is

Z Z ˆ 2 ˆ ucvb = (fb(x)) dx − 2 fb(x)f(x)dx. (5.2.42)

Expression (5.2.42) is referred to as least squared cross-validation criterion. In order to find the optimal bandwidth b, Equation (5.2.42) has to be constructed from the data, then minimized with ˆ−i respect to b. Let fb (xi) be the leave-one-out density estimate

  1 X xi − Xk fˆ (x ) = K . (5.2.43) −i i (n − 1)b b i6=k

Rudemo (1982) pointed out that the second integral in (5.2.42) can be written as

 ! 1 X xi − Xk E(fˆ−i(x )) = E K b i (n − 1)b b i6=k Z n   ! 1 X x − Xk = E K f(x)dx (5.2.44) nb b k=1 Z  ˆ = E fb(x)f(x)dx .

ˆ−i When we substitute the second term in (5.2.42) with E(fb (xi)) in (5.2.44), we have the following function of the least squared cross-validation

n Z 2 X ucv = (fˆ(x))2dx − fˆ−i(x ). (5.2.45) b b n b i i=1

Scott and Terrell (1987) referred to this method as an unbiased cross-validation criterion because 73 its expectation is

 Z  Z 2 2 E(ucvb) = E ISEb − (f(x)) dx = MISEb − (f(x)) dx, (5.2.46)

where the mean integrated squared error can be written as

 1 1  MISE = AMISE + O + . b b n b5

The first term is the asymptotic mean integrated squared error given by

4 1 b Z 00 AMISE = R(K) + µ2 (f (x))2dx, (5.2.47) b nb 4 2 b

R 2 ˆ00 with µ2 = x K(x)dx, which is the second moment of the kernel K, and f (x) which is the second derivative of the kernel density. The aim of this method is to estimate b by minimizing

ucvb.

The unbiased cross-validation presents several local minima, and to select bucv we take the value that corresponds to the largest local minimum. Park and Marron (1990) found that unbiased cross-validation is the most poorly estimated density compared to other bandwidth selectors be- cause it presents a high amount of sampling variability. There is a function bw.ucv in R to find the

optimal value bucv for unbiased cross-validation. 3. Biased cross-validation Scott and Terrell (1987) introduced another method for cross-validation bandwidth selection called biased cross-validation, which is based on an asymptotic mean integrated squared error. This method is similar to unbiased cross-validation but it has a lower amount of sampling variability. Scott and Terrell improved the estimate of R (f 00 (x))2dx as

Z 00 Z 00 1 (f (x))2dx ≈ (fˆ (x))2dx − R(K), (5.2.48) b nb5 74 where fˆ00 (x) is the second derivative of the kernel density and the bias estimator is subtracted in

(5.2.48). Therefore, substituting (5.2.48) in AMISEb (5.2.47) gives a biased cross validation

4   1 b Z 00 1 bcv = R(K) + µ2 (fˆ (x))2dx − R(K) , (5.2.49) b nb 4 2 b nb5

2 where µ is the second moment of the kernel function K. To estimate bbcv, we have to mini- mize biased cross-validation. Note that the asymptotic variance of the bandwidth in biased cross- validation is lower than in unbiased cross-validation; hence, biased cross-validation tends to over- smooth a density. The function bw.bcv in R computes the biased cross-validation for bandwidth selection.

Empirical Example

We concentrate on a simulation of a mixture of normal distributions and compare three different bandwidth selections to determine which one is appropriate for the data. First, we illustrate a kernel density estimate when we generate a sample size of 1000 from a mixture of normal distributions, since we want to increase the order of estimation difficulty, as

1 1 f(x, y) = N(µ , Σ ) + N(−µ , Σ ), 2 1 1 2 1 2

      2 2 0 1 0       where µ1 =   , Σ1 =   , and Σ2 =   . 2 0 2 0 1 Contour plots of the true density and kernel estimate functions for three different bandwidth se- lections which are discussed above are presented in Figure 5.1. The purpose of looking at contour plots is to choose the bandwidth selection via minimizing the distance between the true density and estimated density. Therefore, we observe the normal reference results in a smoother estimate. 75

Figure 5.1 Contour plots of the true density and kernel estimate functions 76 5.2.3 Properties of Local Distance Correlation

In this section, we will discuss the properties of local distance correlation. While properties of distance correlation considered in Chapter 3, the properties of local distance correlation should remain unchanged. Here are the properties of the local distance correlation when p = q = 1:

2 1. Local distance correlation Rcb (X,Y ) is defined for any pair of random variables X and Y with finite expected values in each neighborhood of point.

2 2. The bias-corrected estimation of local distance correlation Rcb can be negative in lower tail, so we do not take the square root.

2 3. Rcb (X,Y ) = 0 if and only if X and Y are independent.

4. If Y is linear transformation of X, that is, Y = cX + a, then b2 = cb1, where b1, b2 are bandwidth values.

5. Symmetry:

2 2 • Exchange symmetry: Rcb (X,Y ) = Rcb (Y,X).

• Reflection symmetry:

2 2 Rcb (−X,Y ) = Rcb (X,Y ),

2 2 Rcb (X, −Y ) = Rcb (X,Y ).

• Radial symmetry:

2 2 Rcb (−X, −Y ) = Rcb (X,Y ).

2 • Rotations: Rcb (X,Y ) is a rotation invariant.

2 We know that distance correlation is between 0 and 1, but the U-statistic Un(X,Y ) can be negative

2 in the lower tail so we cannot take the square root of the U-statistic. Therefore, Rcb can take negative values so we cannot take the square root as well. Local distance correlation equals zero 77 in the neighborhood of each point if and only if X and Y are independent in that neighborhood.

For example, Y = 2X + 1 is a linear transformation of X. If X1, ..., X500 are sampled from a standard uniform distribution, the bandwidth for X is b1 = 0.087, and the bandwidth for Y is b2 = 2 × b1 = 0.174. To check the properties of symmetry, we illustrate them with an example. We consider the relationship between X and Y for a sample of size n = 500, where X was generated from 0 to 4 in equal steps and Y = sin(X) + , where  has a standard uniform distribution. A scatter plot in Figure 5.2 displays the relationship between two variables X and Y.

Figure 5.2 Scatter plot of X and Y

Figure 5.3 shows that the local distance correlation estimates remain the same under exchange symmetry. The local distance correlation has reflection and radial symmetry, as illustrated in Figure 5.4 and Figure 5.5.

Figure 5.3 Illustration of exchange symmetry

Figure 5.4 Illustration of reflection symmetry

Figure 5.5 Illustration of radial symmetry

The rotation matrix in the bivariate case is

\[
R = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix},
\]
and $R\,(X, Y)^{\top}$ gives the data rotated counter-clockwise through an angle $\theta$. Figure 5.6 shows the data rotated by 90° and 180°. Figure 5.7 shows that the local distance correlation estimates remain the same when we rotate the data by angles θ = π and θ = π/2.
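A minimal sketch of this rotation in R is given below; the angle and the data used are placeholders for illustration only, and the rotated data can then be passed to the same local correlation routines.

rotate_xy <- function(x, y, theta) {
  # Rotation matrix for a counter-clockwise rotation through angle theta
  R <- matrix(c(cos(theta), sin(theta),
                -sin(theta), cos(theta)), nrow = 2)
  rotated <- t(R %*% rbind(x, y))
  list(x = rotated[, 1], y = rotated[, 2])
}

# Example: rotate the sine-curve data by 90 degrees (theta = pi / 2)
x <- seq(0, 4, length = 500)
y <- sin(x) + runif(500)
xy90 <- rotate_xy(x, y, pi / 2)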

Figure 5.6 Scatter plots of the data rotated by 90° and 180°

Figure 5.7 Illustration of rotation symmetry

5.3 Simulation Study

The simulation study is conducted over different nonlinear dependence structures with n = 1000 observations. We consider six cases of nonlinear dependence structure to explore the local distance correlation between two variables X and Y, and we compare the visualization of local distance correlation with local Gaussian correlation for each of them. Figure 5.8 shows the scatter plot of X and Y for each of the six dependence structures. In addition, Figure 5.9 displays contour plots of a density estimate over the (X, Y) plane, which indicate the regions where the variables are related. The nonlinear relation between X and Y is evident from the scatter plots and contour plots. The Pearson correlation sometimes gives a negative or zero value, even though the dependence is visible in the plots. Since the choice of bandwidth for local Gaussian correlation depends on the user's goals, bandwidth selection is a challenge for users. In this simulation, we use the normal reference rule, discussed in Section 5.2.2, to select the bandwidths of X and Y for both local distance correlation and local Gaussian correlation.

Figure 5.8 Scatter plots of different bivariate dependence structures

Figure 5.9 Contour plots of different bivariate dependence structures

1. Biquadratic curve

We generate X from a uniform distribution on (−1, 1), and
\[
Y_i = 4\Big(X_i^2 - \frac{1}{2}\Big)^2 + \frac{\epsilon_i}{10},
\]
where the $\epsilon_i$ are i.i.d. uniform on (−1, 1) and independent of X. The scatter plot of the biquadratic curve in Figure 5.8(1) and the contour plot in Figure 5.9(1) show that most observations are concentrated in the middle.

The visualization of local Gaussian correlation and local distance correlation is shown in Figure 5.10 with bandwidths b1 = 0.15 and b2 = 0.11. The plot of local Gaussian correlation indicates a pattern of positive and negative dependence, while the plot of local distance correlation shows local dependence in the center of the biquadratic curve.

Figure 5.10 The visualization of local Gaussian correlation and local distance correlation

2. Sine curve

We generate X from 0 to 4 in equal steps and $Y_i = \sin(X_i) + \epsilon_i$, where the $\epsilon_i$ are i.i.d. standard uniform and independent of X. Figure 5.8(2) displays the scatter plot, and Figure 5.9(2) presents the contour plot of the sine curve, showing that the variables are related to each other.

The visualization of local Gaussian correlation with bandwidth values b1 = 0.31 and b2 = 0.16, shown in Figure 5.11, starts with positive dependence on the left side and turns into negative dependence. The visualization of local distance correlation in Figure 5.11, with the same bandwidth values, shows the local dependence between the variables.

Figure 5.11 The visualization of local Gaussian correlation and local distance correlation

3. Circle problem

The circle problem is simulated as follows:

\[
X_i = \sin(\pi U_i) + \frac{\epsilon_i}{8}, \qquad Y_i = \cos(\pi U_i) + \frac{\epsilon_i'}{8},
\]

where the $U_i$ are uniformly distributed on (−1, 1). From the scatter plot in Figure 5.8(3) and the contour plot in Figure 5.9(3), X and Y are clearly related. The normal reference rule gives the bandwidth values b1 = b2 = 0.19. The visualization of local Gaussian correlation in Figure 5.12 shows positive and negative dependence between the variables, even though the scatter and contour plots exhibit a nonlinear relationship between the observations. Figure 5.12 also displays the visualization of local distance correlation, which accurately shows the local dependence where it exists and independence otherwise.

Figure 5.12 The visualization of local Gaussian correlation and local distance correlation

4. The X curve

The X curve data is taken from Newton (2009), which provides a good example of a nonlinear relationship. We generate X from −1 to 1 in equal steps and
\[
Y_i = U_i\Big(X_i^2 + \frac{\epsilon_i}{2}\Big),
\]
where $U_i$ is a random sample of −1 or 1, and $\epsilon_i$ has the standard uniform distribution and is independent of X and U. See Figure 5.8(4) for the scatter plot of the X curve and Figure 5.9(4) for the contour plot, which show how the variables are related.

The normal reference rule gives the bandwidth values b1 = 0.15 and b2 = 0.18. The visualization of local Gaussian correlation in Figure 5.13 shows positive and negative dependence in the tails. In contrast, the visualization of local distance correlation in Figure 5.13 shows the local dependence between the two variables where it exists.

Figure 5.13 The visualization of local Gaussian correlation and local distance correlation

5. Bivariate mixed model

We generate a 50 percent mixture of bivariate normal components as

\[
f(x, y) = \frac{1}{2}N(\mu_1, \Sigma_1) + \frac{1}{2}N(-\mu_1, \Sigma_2),
\]

      2 2 0 1.5 0       where µ1 =   , Σ1 =   , and Σ2 =   . 2 0 2 0 1.5 The scatter plot in Figure 5.8(5) and the contour plot in Figure 5.9(5) appear to be bimodel. The Pearson correlation and bias-corrected distance correlation show that the two variables are dependent. The visualization of local Gaussian correlation in Figure 5.14 does not show local de- pendence clearly while the visualization of local distance correlation in Figure 5.14 shows clearly where the local dependence exists in the two regions. The normal reference rule results in the 89 bandwidths values as b1 = 0.64 and b2 = 0.63, but these bandwidth values do not smooth out the plot of the local Gaussian correlation.

Figure 5.14 The visualization of local Gaussian correlation and local distance correlation

6. Fan shape

We consider an interesting example of a fan shape, where we generate X from 0 to 4 in equal steps and $Y_i = X_i\,\epsilon_i$, with the $\epsilon_i$ i.i.d. normal with mean 2 and standard deviation 2 and independent of X. The fan shape is visible in the scatter plot in Figure 5.8(6) and the contour plot in Figure 5.9(6). The plots show that the two variables are closely related on the leftmost side of the fan shape, while there is less relationship on the right side. The visualization of local Gaussian correlation and local distance correlation is shown in Figure 5.15 with bandwidths b1 = 0.31 and b2 = 1.22. While the plot of local Gaussian correlation shows positive dependence on the leftmost side and negative dependence in some regions, the visualization of local distance correlation shows that the two variables are dependent on the leftmost side and less dependent on the right side.

Figure 5.15 The visualization of local Gaussian correlation and local distance correlation

5.4 Real Examples

In this section, we consider applications of local distance correlation and compare it with local Gaussian correlation. The first example is the aircraft data, which shows how local Gaussian correlation and local distance correlation perform on nonlinear data. The second example, the Wage dataset, has a larger number of observations. The third example is the PRIM7 dataset, which has a nonlinear relationship. The fourth example is the olive oils data discussed in Section 4.3.

5.4.1 Example 1: Aircraft

We consider the aircraft data from Bowman and Azzalini (1997). Székely and Rizzo (2009) and Jones and Koch (2003) considered the logarithms of wing span (in meters) and maximum speed (in km/h) for 230 aircraft built in the third period, 1956 to 1984. The scatter plot in Figure 5.16 displays a nonlinear relation between wing span and speed. A contour plot of a nonparametric density estimate, with wing span and maximum speed on the axes, is also displayed in Figure 5.16.

Figure 5.16 Scatter and contour plots for aircraft dataset

Berentsen and Tjøstheim (2014) considered this example and used their likelihood cross-validation algorithm to find bandwidths. This algorithm gave the small bandwidths b1 = 0.21 and b2 = 0.19. Because small bandwidths have large variability, they increased the bandwidths to b1 = 0.25 and b2 = 0.30 in order to obtain a smoother version of the visualization of local Gaussian correlation. We use the normal reference bandwidth selection, which gives the bandwidth values b1 = 0.20 and b2 = 0.26. In Figure 5.17, we present the visualization of local Gaussian correlation, but it is not a smooth plot. The visualization of local distance correlation in Figure 5.17 shows that local dependence between wing span and maximum speed exists in two regions.
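A sketch of how the transformed data and the normal reference bandwidths could be obtained is given below; it assumes the aircraft data frame distributed with the sm package accompanying Bowman and Azzalini (1997), with columns Period, Span, and Speed, which is an assumption of this illustration rather than a statement of the original computation.

library(sm)                                    # assumed source of the aircraft data

dat    <- subset(aircraft, Period == 3)        # third period, 1956-1984
lspan  <- log(dat$Span)                        # log wing span
lspeed <- log(dat$Speed)                       # log maximum speed

# Normal reference bandwidths for the two variables
c(b1 = bw.nrd(lspan), b2 = bw.nrd(lspeed))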

5.4.2 Example 2: Wage

We consider another example, the Wage data in the R package ISLR (2017), which has 3,000 observations on male workers in the Mid-Atlantic region of the United States. We study the association between wages (in thousands of dollars) and employees' age.

Figure 5.17 The visualization of local Gaussian correlation and local distance correlation for the aircraft dataset

The smoothed scatter plot in Figure 5.18 shows that wages increase with age between 20 and 60 and then decrease after age 60. The bias-corrected squared distance correlation is 0.058, and the squared Pearson correlation is 0.047.
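These summary statistics could be reproduced along the following lines; the sketch assumes the Wage data frame from the ISLR package and the bcdcor function from the energy package for the bias-corrected distance correlation statistic.

library(ISLR)       # provides the Wage data frame
library(energy)     # provides bcdcor, the bias-corrected distance correlation statistic

age  <- Wage$age
wage <- Wage$wage

bcdcor(age, wage)   # bias-corrected (squared) distance correlation
cor(age, wage)^2    # squared Pearson correlation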

Figure 5.18 Smooth scatter plot for Wage dataset

The bandwidth values from the normal reference method used in this example are b1 = 2.47 and b2 = 0.0654. We present the visualization of local Gaussian correlation in Figure 5.19 and see that the plot does not give much information beyond distinguishing between positive and negative local dependence. We obtain a clearer graphical interpretation of the dependence between the variables using local distance correlation in Figure 5.19.

Figure 5.19 The visualization of local Gaussian correlation and local distance correlation for Wage dataset

5.4.3 Example 3: PRIM7

The PRIM7 dataset presented by Friedman and Tukey (1974) contains 7 variables with 500 observations each. The PRIM7 data come from a high-energy particle physics scattering experiment. We consider the scatter plot of two of the seven variables, X6 and X5, in Figure 5.20. From the plots, we see that the observations concentrate on the right side and exhibit nonlinear dependence. The bias-corrected squared distance correlation is 0.3049, and the Pearson correlation shows negative dependence between the variables, $\hat{\rho} = -0.435$.

The normal reference method is used to find the bandwidths b1 = 0.414 and b2 = 2.167. For this example, we examine whether the local Gaussian correlation can work for this type of observations. The visualization of local Gaussian correlation in Figure 5.21 shows that the plot is not smooth. In Figure 5.21, the visualization of local distance correlation clearly shows the strong local dependence on the right side.

Figure 5.20 Scatter and smooth scatter plots for PRIM7 dataset

Figure 5.21 The visualization of local Gaussian correlation and local distance correlation for PRIM7 dataset

5.4.4 Example 4: Olive Oils

In Chapter 4, we considered the olive oils example with 6 fatty acids measured on 572 observations. We divide the fatty acids into pairs as follows: oleic and palmitoleic among the monounsaturated fats, palmitic and stearic among the saturated fats, and linoleic and linolenic among the polyunsaturated fats. We first study the local relation between oleic and palmitoleic of the monounsaturated fats. The smooth scatter plot is displayed in Figure 5.22 and shows a linear relation between the two fatty acids. The bias-corrected squared distance correlation is 0.6838, and the Pearson correlation shows a strong negative correlation, $\hat{\rho} = -0.852$.
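As with the Wage example, these summary statistics could be computed as sketched below; the olive oil data frame used here (for instance, the olive data shipped with the tourr package, with columns oleic and palmitoleic) is an assumption of this sketch rather than the data source used in Chapter 4.

library(energy)     # bcdcor: bias-corrected distance correlation statistic
library(tourr)      # assumed source of the Italian olive oil data (572 rows)

x <- olive$oleic
y <- olive$palmitoleic

bcdcor(x, y)        # bias-corrected (squared) distance correlation
cor(x, y)           # Pearson correlation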

Figure 5.22 Smooth scatter plot for oleic and palmitoleic fatty acids

The normal reference rule gives the bandwidths b1 = 0.1562 and b2 = 1.208. In Figure 5.23, the visualization of local distance correlation clearly shows the linear local dependence, and the visualization of local Gaussian correlation shows a negative linear relation between the oleic and palmitoleic fatty acids. Therefore, both the local distance correlation and the local Gaussian correlation show a strong relation between the oleic and palmitoleic fatty acids.

Figure 5.23 The visualization of local Gaussian correlation and local distance correlation for oleic and palmitoleic fatty acids

We also study the local relation between the palmitic and stearic fatty acids of the saturated fats. The smooth scatter plot in Figure 5.24 shows a nonlinear relation between the two fatty acids. The bias-corrected squared distance correlation is 0.0198, and the Pearson correlation is negative, $\hat{\rho} = -0.1703$.

Figure 5.24 Smooth scatter plot for palmitic and stearic fatty acids

The normal reference rule gives b1 = 0.50, but we increase the bandwidth b2 from 0.098 to 0.50 to smooth out the plots. In Figure 5.25, the visualization of local distance correlation shows that local dependence occurs in small regions. The visualization of local Gaussian correlation in Figure 5.25 shows local independence even though there is clearly a relation between the variables.

Figure 5.25 The visualization of local Gaussian correlation and local distance correlation for palmitic and stearic fatty acids

To study the local relation between linoleic and linolenic of the polyunsaturated fats, we consider the smooth scatter plot of the two fatty acids in Figure 5.26. We observe no strong linear relation between the linoleic and linolenic fatty acids. The bias-corrected squared distance correlation is 0.066, and the Pearson correlation gives a low negative correlation, $\hat{\rho} = -0.057$.

Figure 5.26 Smooth scatter plot for linoleic and linolenic fatty acids

The normal reference bandwidths are b1 = 0.723 and b2 = 0.031. The visualization of local Gaussian correlation in Figure 5.27 shows negative and positive dependence but does not carry enough information about the local dependence. In Figure 5.27, the visualization of local distance correlation shows where the local relation between the two fatty acids exists.

Figure 5.27 The visualization of local Gaussian correlation and local distance correlation for linoleic and linolenic fatty acids

CHAPTER 6 SUMMARY AND FUTURE WORK

In this dissertation, we apply the jackknife empirical likelihood method for a U-statistic to construct a confidence interval for distance correlation. The unbiased version of distance covariance is defined as an inner product in the Hilbert space H_n of U-centered distance matrices, and it is a U-statistic. The normalized coefficient of the unbiased distance covariance is called the bias-corrected distance correlation. The jackknife pseudo-sample of distance correlation becomes a sample of asymptotically independent observations, so the jackknife empirical likelihood method can be applied to construct a confidence interval for the mean of the jackknife pseudo-sample of distance correlation. We prove that a Wilks' theorem for jackknife empirical likelihood still holds for distance correlation: minus two times the jackknife empirical log-likelihood ratio converges to a chi-squared distribution.

The simulation study reports the coverage probability and average length of the confidence interval for distance correlation based on jackknife empirical likelihood and the standard normal bootstrap method for the three nominal confidence levels of 90%, 95%, and 99%. We observe that the coverage probabilities of the jackknife empirical likelihood for distance correlation are quite accurate compared with the bootstrap method, and the average lengths of the two methods are close. The jackknife empirical likelihood confidence intervals for distance correlation are also illustrated with real data examples.

We show that the jackknife empirical likelihood method for a U-statistic can be computed locally to estimate and visualize the local distance correlation between two univariate random variables. The local distance correlation estimates are computed in small regions, which better describe the dependence structure. We use a Gaussian kernel estimator in the jackknife empirical log-likelihood to estimate distance correlation locally. The choice of bandwidth is important because a large bandwidth gives more biased but less variable estimates, unlike a small bandwidth, which gives less biased but more variable estimates. Three common bandwidth selection methods for the kernel function, the normal reference rule, unbiased cross-validation, and biased cross-validation, are discussed in detail. We consider the example of generating data from a mixture of normal distributions to help determine how to choose the bandwidth, and we examine the contour plots of the true density and the three kernel density estimates. We use the normal reference bandwidth selection in this dissertation because it minimizes the distance between the true density and its kernel estimate in the normal mixture example. Since many distributions can be approximated by normal mixture models, the normal reference rule may perform well as a default; the two cross-validation methods did not perform as well on the normal mixture example.

The estimate of local distance correlation can be negative in the lower tail, so we do not take the square root. The population local distance correlation equals zero in the neighborhood of each point if and only if the two univariate variables are independent in that neighborhood. When we translate, rotate, exchange, or reflect the data points, the local distance correlations do not change. Therefore, local distance correlation has the same properties as distance correlation.
In comparing the visualization of local distance correlation and local Gaussian correlation for six different nonlinear dependence structures, we observe that local distance correlation performs well in capturing local dependence. The four real data examples show that local distance correlation describes the local measure of dependence better than local Gaussian correlation. It is difficult to determine optimal bandwidth values for local Gaussian correlation; in our comparison we applied the normal reference rule for both methods.

We have implemented confidence intervals for distance correlation and local distance correlation via jackknife empirical likelihood in R, and the methods can also be implemented in other languages such as Python or Matlab.

Further research could compute the local distance correlation for multivariate data and improve the computational complexity of distance correlation for multivariate variables to handle large datasets. However, in high dimension we cannot reduce the $O(n^2)$ computational complexity of the statistics. Because for bivariate data we use a fast $O(n \log n)$ algorithm for the computation of bias-corrected distance correlation, we have been able to build a fast visualization tool for local distance correlation. Further research could also construct the confidence interval for distance correlation when the distance covariance is a V-statistic, as in the original definition of distance covariance given in Székely, Rizzo, and Bakirov (2007).

BIBLIOGRAPHY

Aspiras-Paler, M. (2015). On modern measures and tests of multivariate independence. Dissertation, Bowling Green State University.

Berentsen, G., B. Støve, D. Tjøstheim, and T. Nordbø (2014). Recognizing and visualizing copulas: An approach using local Gaussian approximation. Insurance: Mathematics and Economics, 57 90–103.

Berentsen, G. and D. Tjøstheim (2014). Recognizing and visualizing departures from independence in bivariate data using local Gaussian correlation. Statistics and Computing, 24(5) 785–801.

Bhuchongkul, S. K. (1964). A class of nonparametric tests for independence in bivariate populations. Annals of Mathematical Statistics, 35 138–149.

Bjerve, S. and K. Doksum (1993). Correlation curves: Measures of association as functions of covariate values. The Annals of Statistics, 21 890–902.

Blomqvist, N. (1950). On a measure of dependence between two random variables. Annals of Mathematical Statistics, 21 593–600.

Blum, J., J. Kiefer, and M. Rosenblatt (1961). Distribution free tests of independence based on the sample distribution function. Annals of Mathematical Statistics, 32 485–498.

Bowman, A. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71 353–360.

Bowman, A. and A. Azzalini (1997). Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations. Oxford: Oxford University Press.

Bravais, A. (1844). Analyse mathématique sur les probabilités des erreurs de situation d'un point. Impr. Royale.

Carpenter, J. and J. Bithell (2000). Bootstrap confidence intervals: When, which, what? A practical guide for medical statisticians. Statistics in Medicine, 19 1141–1164.

Csörgő, S. (1985). Testing for independence by the empirical characteristic function. Journal of Multivariate Analysis, 16 290–299.

Delicado, P. and M. Smrekar (2009). Measuring non-linear dependence for two random variables distributed along a curve. Statistics and Computing, 19 255–269.

Doksum, K., S. Blyth, E. Bradlow, X. Meng, and H. Zhao (1994). Correlation curves as local measures of variance explained by regression. Journal of the American Statistical Association, 89 571–582.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7 1–26.

Feuerverger, A. (1993). A consistent test for bivariate dependence. International Statistical Review, 61 419–433.

Feuerverger, A. and R. Mureika (1977). The empirical characteristic function and its applications. The Annals of Statistics, 5 88–97.

Fisher, R. (1941). Statistical methods for research workers (8th ed.). Y. E. Stechert.

Forina, M. and E. Tiscornia (1982). Pattern recognition methods in the prediction of Italian olive oil origin by their fatty acid content. Annali di Chimica, 72 143–155.

Friedman, J. and J. Tukey (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, C-23 881–890.

Galton, F. (1888). Co-relations and their measurement, chiefly from anthropometric data. Proceedings of the Royal Society of London, 45 135–145.

Galton, F. (1890). Kinship and correlation. North American Review, 150 419–431.

Gareth, J., D. Witten, T. Hastie, and R. Tibshirani (2017). ISLR: Data for an Introduction to Statistical Learning with Applications in R. R package version 1.2.

Gebelein, H. (1941). Das statistische Problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung. Z. Angew. Math. Mech., 21 364–379.

Geiser, P. and R. Randles (1997). A nonparametric test of independence between two vectors. Journal of American Statistical Association, 92 561–567.

Hjort, N. and M. Jones (1996). Locally parametric nonparametric density estimation. Annals of Statistics, 24(4) 1619–1647.

Hoeffding, W. (1948a). A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics, 19(3) 293–325.

Hoeffding, W. (1948b). A nonparametric test of independence. The Annals of Mathematical Statistics, 19(4) 546–557.

Hotelling, H. and M. Pabst (1936). Rank correlation and tests of significance involving no assumption of normality. The Annals of Mathematical Statistics, 7 29–43.

Huo, X. and G. J. Székely (2016). Fast computing for distance covariance. Technometrics, 58 435–447.

Jing, B., J. Yuan, and W. Zhou (2009). Jackknife empirical likelihood. Journal of the American Statistical Association, 104 1224–1232.

Jones, C. and I. Koch (2003). Dependence maps: Local dependence in practice. Statistics and Computing, 13 241–255.

Jones, M. (1996). The local dependence function. Biometrika, 83 899–904.

Kendall, M. (1938). A new measure of rank correlation. Biometrika, 30 81–93.

Kitamura, Y., G. Tripathi, and H. Ahn (2004). Empirical likelihood based inference in conditional moment restriction models. Econometrica, 72 1667–1714.

Kruskal, W. (1958). Ordinal measures of dependence. Journal of American Statistical Association, 53 814–861.

Lenth, R. (1983). Some properties of U statistics. The American Statistician, 37 311–313.

Li, H. (2015). On nonsymmetric nonparametric measures of dependence. arXiv, 1–18.

Linfoot, E. H. (1957). An informational measure of correlation. Information and Control, 1 85–89.

Lyons, R. (2013). Distance covariance in metric spaces. Annals of Probability, 41 3284–3305.

Móri, T. and G. Székely (2019). Four simple axioms of dependence measures. Metrika, 82 1–16.

Newton, M. (2009). Introducing the discussion paper by Székely and Rizzo. The Annals of Applied Statistics, 3 1233–1235.

Owen, A. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics, 18 90–120.

Park, B. and J. Marron (1990). Comparison of data-driven bandwidth selectors. Journal of the American Statistical Association, 85 66–72.

Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philosophical Transactions of the Royal Society A, 187 253–318.

Peng, H. and F. Tan (2018). Jackknife empirical likelihood goodness-of-fit tests for U-statistics based general estimating equations. Bernoulli, 24 449–464.

Quenouille, M. H. (1956). Notes on bias in estimation. Biometrika, 43 353–360.

R Core Team (2014). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Rényi, A. (1959). On measures of dependence. Acta Mathematica Academiae Scientiarum Hungaricae, 10 441–451.

Rizzo, M. and G. Székely (2013). energy: E-Statistics: Multivariate Inference via the Energy of Data. R package version 1.7.

Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9 65–78.

Scott, D. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. New York: John Wiley and Sons.

Scott, D. and G. Terrell (1987). Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 82 1131–1146.

Sen, P. K. (1977). Some invariance principles relating to jackknifing and their role in sequential analysis. The Annals of Statistics, 5 316–329.

Serfling, R. J. (2009). Approximation Theorems of Mathematical Statistics. New York, NY: John Wiley and Sons.

Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal, 27 379–423.

Shen, C., C. Priebe, and J. Vogelstein (2018). From distance correlation to multiscale graph correlation. Journal of the American Statistical Association, 1–22.

Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall/CRC.

Sinha, B. and H. Wieand (1977). Multivariate nonparametric tests for independence. Journal of Multivariate Analysis, 7(4) 572–583.

Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15 72–101.

Székely, G. and M. Rizzo (2009). Brownian distance covariance. The Annals of Applied Statistics, 3 1236–1265.

Székely, G. and M. Rizzo (2013). The distance correlation t-test of independence in high dimension. Journal of Multivariate Analysis, 117 193–213.

Székely, G. J. and M. L. Rizzo (2014). Partial distance correlation with methods for dissimilarities. The Annals of Statistics, 42 2382–2412.

Székely, G. J., M. L. Rizzo, and N. K. Bakirov (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35 2769–2794.

Taskinen, S., A. Kankainen, and H. Oja (2003). Sign test of independence between two random vectors. Statistics and Probability Letters, 62 9–21.

Taskinen, S., H. Oja, and R. Randles (2005). Multivariate nonparametric tests of independence. Journal of the American Statistical Association, 100 916–925.

Tjøstheim, D. and K. Hufthammer (2013). Local Gaussian correlation: a new measure of depen- dence. Journal of Econometrics, 172(1) 33–48.

Wand, M. and M. Jones (1995). Kernel Smoothing. London: Chapman and Hall/CRC.

Wilks, S. (1935). On the independence of k sets of normally distributed statistical variables. Econometrica, 3 309–326.

Zhu, L., K. Xu, R. Li, and W. Zhong (2017). Projection correlation between two random vectors. Biometrika, 104 829–843.

APPENDIX A SELECTED R PROGRAMS

The following are used to simulate six different dependence structures:
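The snippets below fill the i-th columns of pre-allocated matrices x and y; a minimal setup, assuming n = 1000 and the six dependence structures of Section 5.3 (this setup is not shown in the original listing), would be:

# Assumed setup for the simulation snippets below
n <- 1000                     # sample size used in Section 5.3
x <- matrix(0, n, 6)          # one column per dependence structure
y <- matrix(0, n, 6)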

• Biquadratic curve

i <- 1
xx <- runif(n, -1, 1)
yy <- 4 * ((xx^2 - 1/2)^2 + runif(n, -1, 1) / 10)
x[, i] <- xx
y[, i] <- yy

• Sine Curve

i <- 2
xx <- seq(0, 4, length = n)
yy <- sin(xx) + runif(n, 0, 1)
x[, i] <- xx
y[, i] <- yy

• Circle Problem

i <- 3
u <- runif(n, -1, 1)
xx <- sin(u * pi) + rnorm(n) / 8
yy <- cos(u * pi) + rnorm(n) / 8
x[, i] <- xx
y[, i] <- yy

• The X curve

i <- 4
xx <- seq(-1, 1, length = n)
yy <- (xx^2 + runif(n) / 2) * sample(c(-1, 1), size = n, replace = TRUE)
x[, i] <- xx
y[, i] <- yy

• Bivariate Mixed Normal

i <- 5
mu <- c(2, 2)
Sigma1 <- diag(c(2, 2))
Sigma2 <- diag(c(1.5, 1.5))
samp <- rbind(mvtnorm::rmvnorm(n = n / 2, mean = mu, sigma = Sigma1),
              mvtnorm::rmvnorm(n = n / 2, mean = -mu, sigma = Sigma2))
x[, i] <- samp[, 1]
y[, i] <- samp[, 2]

• Fan Shape

i <- 6
xx <- seq(0, 4, length = n)
yy <- xx * rnorm(n, 2, 2)
x[, i] <- xx
y[, i] <- yy

The following are used to compute local Gaussian correlation and local distance correlation:

• R function to compute local distance correlation

# Note: myfun is the kernel weight function (a Gaussian kernel) defined elsewhere;
# dcor2(x, y, type = "U") returns the bias-corrected squared distance correlation.
local.dcor <- function(x, y, h1, h2) {
  library(energy)
  if (!is.vector(x) || !is.vector(y)) {
    if (NCOL(x) > 1 || NCOL(y) > 1)
      stop("This method is only for univariate x and y")
  }
  n <- length(x)
  # Grid of evaluation points over the range of the data
  x0 <- seq(min(x), max(x), length = 15)
  y0 <- seq(min(y), max(y), length = 15)
  xy.mat <- data.frame(x0, y0)
  # Kernel weights of each observation at each grid point
  ax <- outer(xy.mat[, 1], x, "-") / h1
  ay <- outer(xy.mat[, 2], y, "-") / h2
  wt <- tcrossprod(matrix(myfun(ax), , n),
                   matrix(myfun(ay), , n)) / (n * h1 * h2)
  wt[wt < 0.001] <- 0          # truncate negligible weights to zero
  nw <- nrow(wt)
  # Jackknife pseudo-values of the bias-corrected squared distance correlation
  z <- numeric(n)
  Un <- dcor2(x, y, type = "U")
  for (i in 1:n)
    z[i] <- n * Un - (n - 1) * dcor2(x[-i], y[-i], type = "U")
  # Weighted jackknife empirical log-likelihood evaluated at a candidate value theta
  thetaF <- function(theta, z, n, wt) {
    wtt <- wt
    u <- z - theta
    B <- 0.02 / max(abs(u))
    lamF <- function(lam, u, wtt) { sum(wtt * u / (1 + lam * u)) }
    # Bracket and solve for the Lagrange multiplier lambda
    if (lamF(0, u, wtt) == 0) lam0 <- 0
    else {
      if (lamF(0, u, wtt) > 0) {
        lo <- 0
        up <- B
        while (lamF(up, u, wtt) > 0)
          up <- up + B
      }
      else {
        up <- 0
        lo <- -B
        while (lamF(lo, u, wtt) < 0)
          lo <- lo - B
      }
      lam0 <- optimize(lamF, lower = lo, upper = up,
                       tol = .Machine$double.eps^0.25,
                       maximum = TRUE, u = u, wtt = wtt)$maximum
    }
    pk <- (1 + lam0 * u)
    logpk <- sum(wtt * log(pk))
    return(logpk)
  }
  # Local estimate at each grid point via optimization of the weighted log-likelihood
  est.wt <- matrix(0, nw, nw)
  for (j in 1:nw) {
    for (i in 1:nw) {
      if (wt[i, j] == 0) {
        est.wt[i, j] <- 0
      }
      else {
        est.wt[i, j] <- -suppressWarnings(optimize(thetaF,
          lower = min(z * wt[i, j]), upper = max(z * wt[i, j]),
          tol = .Machine$double.eps^0.5, maximum = TRUE, n = n,
          z = z, wt = wt[i, j])$maximum)
      }
    }
  }
  # Return the grid coordinates and the local estimates in long format
  ldcor <- data.frame(expand.grid(x = xy.mat[, 1], y = xy.mat[, 2]),
                      z = as.vector(est.wt))
  return(ldcor)
}

• The visualization of local Gaussian correlation and local distance correlation

library(localgauss)
library(ggplot2)

# x and y are the vectors for one dependence structure (e.g., x[, i] and y[, i] above);
# data1 is a data frame of the observed (x, y) points, and low_color / high_color
# are the user's chosen fill colors.
lg.out <- localgauss(x, y, b1 = bw.nrd(x), b2 = bw.nrd(y))
g <- data.frame(lg.out$par.est)
lg.out1 <- data.frame(cbind(x = lg.out$xy.mat[, 1], y = lg.out$xy.mat[, 2],
                            rho = g$rho))

plot_xy1 <- ggplot() +
  layer(data = lg.out1, mapping = aes(x = x, y = y, fill = rho),
        geom = "tile", stat = "identity", position = "identity") +
  scale_fill_gradient2(midpoint = 0, low = low_color, high = high_color,
                       space = "Lab", limits = c(-1, 1),
                       breaks = seq(-1, 1, by = .2)) +
  geom_point(data = data1, mapping = aes(x, y)) +
  ggtitle("Local Gaussian correlation")

# Local distance correlation on the same grid;
# low and high are the limits chosen for the local distance correlation scale.
ldcor <- local.dcor(x, y, h1 = bw.nrd(x), h2 = bw.nrd(y))
dcor <- data.frame(cbind(x = ldcor[, 1], y = ldcor[, 2],
                         dcor1 = round(ldcor[, 3], 2)))

plot_xy2 <- ggplot() +
  layer(data = dcor, mapping = aes(x = x, y = y, fill = dcor1),
        geom = "tile", stat = "identity", position = "identity") +
  scale_fill_gradient2(space = "Lab", limits = c(low, high),
                       breaks = seq(low, high, by = .1)) +
  geom_point(data = data1, mapping = aes(x, y)) +
  ggtitle("Local distance correlation")