High Dimensional Data Analysis Via the SIR/PHD Approach
April 6, 2000

Preface

Dimensionality is an issue that can arise in every scientific field. Generally speaking, the difficulty lies in how to visualize a high dimensional function or data set. This area has become increasingly important due to the advent of computer and graphics technology. People often ask: "How do they look?", "What structures are there?", "What model should be used?" Aside from the differences that underlie the various scientific contexts, such questions have a common root in Statistics. This should be the driving force for the study of high dimensional data analysis.

Sliced inverse regression (SIR) and principal Hessian directions (PHD) are two basic dimension reduction methods. They are useful for extracting the geometric information underlying noisy data of several dimensions - a crucial step in empirical model building which has been overlooked in the literature. In these Lecture Notes, I will review the theory of SIR/PHD and describe some ongoing research in various application areas. There are two parts. The first part is based on material that has already appeared in the literature. The second part is a collection of manuscripts which are not yet published; they are included here for completeness.

Needless to say, there are many other high dimensional data analysis techniques (and we may encounter some of them later on) that deserve more detailed treatment. In this complex field, it would not be wise to anticipate the existence of any single tool that can outperform all others in every practical situation. Real world problems generally require a number of passes through the same data. Different approaches often lead to different structural findings at various stages. Serious readers should try out as many methods as possible on their own.

I started writing a preliminary version of these Lecture Notes in 1991-1992, when I had the chance to teach a seminar course at UCLA on High Dimensional Data Analysis. At that time, SIR/PHD had just begun to appear in official journals, which made the writing of a book very difficult because most of the work was yet to be published. Even though this material has since been used in similar courses and workshops, I hardly had the mood to rewrite it. The real opportunity finally came last year, when colleagues at the Institute of Statistical Science, Academia Sinica, initiated the idea of this Lecture Series. I figured that I would have more to write now because there have been many new and exciting developments along this line. Most noteworthy are the books of Cook (1998) and Cook and Weisberg (1994). I admire the way they present the ideas, which are not far from what I really want to say. It is a remarkable achievement that they found a lucid language for the difficult subject of how to think about graphics in rigorous statistical terms. As expected, with the new language they have generated many new ideas and useful techniques that go beyond SIR/PHD.

For this book, I am still using the words as I originally thought about the subject of dimension reduction. The basic material is narrowly focused on the developments in which I am directly involved, so there is no serious attempt to comprehensively survey the whole literature on SIR/PHD. For many researchers, SIR/PHD is still a novel technique, and new results are still waiting to be published.
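As a concrete preview of the dimension reduction idea before the formal development in Chapter 2, the following is a minimal sketch of the basic SIR algorithm in Python/NumPy: slice the response, average the standardized regressors within each slice, and take the leading eigenvectors of the covariance matrix of the slice means. The function name, the default of ten slices, and the toy model at the end are illustrative choices only, not material from the text.

```python
# Minimal sketch of sliced inverse regression (SIR); the number of
# slices and the toy example below are illustrative choices.
import numpy as np

def sir_directions(x, y, n_slices=10, n_directions=2):
    """Estimate e.d.r. directions by sliced inverse regression."""
    n, p = x.shape
    # Standardize x: z = Sigma^{-1/2} (x - mean)
    x_centered = x - x.mean(axis=0)
    sigma = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = x_centered @ sigma_inv_sqrt
    # Slice the data by the order statistics of y
    slices = np.array_split(np.argsort(y), n_slices)
    # Weighted covariance of the slice means of z
    v = np.zeros((p, p))
    for idx in slices:
        m = z[idx].mean(axis=0)
        v += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of v, mapped back to the x scale
    w, b = np.linalg.eigh(v)
    top = b[:, np.argsort(w)[::-1][:n_directions]]
    beta = sigma_inv_sqrt @ top
    return beta / np.linalg.norm(beta, axis=0)  # columns = directions

# Toy example: y depends on x only through one direction
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 5))
y = np.sin(x[:, 0] + x[:, 1]) + 0.1 * rng.standard_normal(500)
print(sir_directions(x, y, n_directions=1))
```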
I would like to thank a lot of people who in one way or another have helped me in the development of SIR/PHD, including all my co-authors, colleagues, students, friends, and many anonymous referees. I would also like to thank Dr. Chen-Hsing Chen, who is in charge of the Lecture Series. My writing of these Lecture Notes would have been further delayed without his persistent requests; I really appreciate his patience. Finally, many of the computer and graphical outputs were put together by Dr. Chun-Houh Chen, who has been working with me over the years. Without his devotion to critical programming work and his many good ideas in implementing SIR/PHD, progress would have been much slower.

Acknowledgment: Over the years, the research of Li has been supported in part by NSF grants.

Contents

I  SIR/PHD - THEORY AND PRACTICE

1  A Model for Dimension Reduction in Regression
   1.1  Static and dynamic graphics
        1.1.1  Graphical tools
        1.1.2  Boston housing data
   1.2  A regression paradigm
   1.3  Principal component analysis
   1.4  Effective dimension reduction in regression
        1.4.1  The model
        1.4.2  Special cases
        1.4.3  The e.d.r. directions
        1.4.4  The rationale
        1.4.5  An equivalent version
        1.4.6  Discrepancy measure

2  Sliced Inverse Regression: Basics
   2.1  Forward and inverse regression
   2.2  An algorithm of SIR
   2.3  SIR and principal component analysis
   2.4  Some simulation examples
   2.5  Contour plotting and SIR
   2.6  Fisher consistency for SIR
   2.7  Proof of Theorem 2.1

3  Sampling Properties of SIR
   3.1  Consistency of SIR
        3.1.1  The root n rate
        3.1.2  The discrepancy measure
        3.1.3  Simulation
   3.2  Eigenvalues
        3.2.1  Chi-squared test
        3.2.2  Eigenvalues and the assessment of K

4  Applying Sliced Inverse Regression
   4.1  Worsted yarn
   4.2  Variable selection
   4.3  Boston housing data
        4.3.1  Crime rate
        4.3.2  The low crime rate group
        4.3.3  Interpretation
   4.4  Structure removal
   4.5  OTL push-pull circuit

5  Generalization of SIR: Second Moment Based Methods
   5.1  A simple symmetric response curve
   5.2  Slice covariances
   5.3  Basic properties of slice covariances
   5.4  An iterative procedure
   5.5  SIR II algorithm

6  Transformation and SIR
   6.1  Dependent variable transformation
   6.2  Some remarks
   6.3  Examples
        6.3.1  Curves and clusters
        6.3.2  Heteroscedasticity
        6.3.3  Horseshoe and helix
   6.4  Simple estimates for the standard deviations of the SIR directions

7  Principal Hessian Directions
   7.1  Principal Hessian directions
   7.2  Dimension reduction
   7.3  Stein's lemma and estimates of the PHDs
        7.3.1  Stein's lemma
        7.3.2  Estimates for principal Hessian directions
   7.4  Sampling properties for normal carriers
   7.5  Linear conditional expectation for x
   7.6  Extension
   7.7  Examples

8  Linear Design Condition
   8.1  Forcing elliptic symmetry
        8.1.1  A simple case: a 2-D square
        8.1.2  Brillinger's normal resampling
        8.1.3  Minimum volume ellipsoid and Voronoi tessellation
   8.2  Higher dimension
        8.2.1  Effectiveness of MVE
        8.2.2  Difference between conditional linearity and the elliptically contoured distribution
        8.2.3  A simulation study
        8.2.4  Most low dimension projections are nearly linear
   8.3  Implication and some guidance
        8.3.1  Blind application
        8.3.2  Diagnostic checking
        8.3.3  The most dangerous directions
        8.3.4  Bias bound

9  Incorporating Discrete Input Variables
   9.1  Stratification
   9.2  Pooling estimates from different strata
   9.3  Estimation of treatment effects

10 Quasi-Helices in High Dimensional Regression
   10.1  Quasi-helical confounding
   10.2  The κ measure of nonlinearity
   10.3  Searching for quasi-helices
   10.4  Sensitivity of geometric shape change
   10.5  Over-linearization in linear approximation
   10.6  Over-fit in nonlinear approximation
   10.7  Model uncertainty and information loss
        10.7.1  Least favorable submodel
        10.7.2  Information loss for nearly linear regression
   10.8  Hypothesis testing for nearly linear regression