Multivariate Statistics: Old School
Multivariate Statistics: Old School

Mathematical and methodological introduction to multivariate statistical analytics, including linear models, principal components, covariance structures, classification, and clustering, providing background for machine learning and big data study, with R.

John I. Marden
Department of Statistics
University of Illinois at Urbana-Champaign

© 2015 by John I. Marden
Email: [email protected]
URL: http://stat.istics.net/Multivariate

Typeset using the memoir package [Madsen and Wilson, 2015] with LaTeX [LaTeX Project Team, 2015]. The faces in the cover image were created using the faces routine in the R package aplpack [Wolf, 2014].

Preface

This text was developed over many years while teaching the graduate course in multivariate analysis in the Department of Statistics, University of Illinois at Urbana-Champaign. Its goal is to teach the basic mathematical grounding that Ph.D. students need for future research, as well as to cover the important multivariate techniques useful to statisticians in general.

There is heavy emphasis on multivariate normal modeling and inference, both theory and implementation. Several chapters are devoted to developing linear models, including multivariate regression and analysis of variance, and especially the "both-sides models" (i.e., generalized multivariate analysis of variance models), which allow modeling relationships among variables as well as among individuals; a sketch of the model's generic form appears at the end of this overview. Growth curve and repeated measures models are special cases.

Inference on covariance matrices covers testing equality of several covariance matrices, testing independence and conditional independence of (blocks of) variables, factor analysis, and some symmetry models. Principal components is a useful graphical/exploratory technique, but it also lends itself to some modeling.

Classification and clustering are related areas: both attempt to categorize individuals. Classification tries to classify individuals based upon a previous sample of observed individuals and their categories. In clustering, there is no observed categorization, nor often even knowledge of how many categories there are; these must be estimated from the data. Other useful multivariate techniques include biplots, multidimensional scaling, and canonical correlations.

The bulk of the results here are mathematically justified, but I have tried to arrange the material so that the reader can learn the basic concepts and techniques while plunging as much or as little as desired into the details of the proofs. Topic- and level-wise, this book is somewhere in the convex hull of the classic book by Anderson [2003] and the texts by Mardia, Kent, and Bibby [1979] and Johnson and Wichern [2007], probably closest in spirit to Mardia, Kent, and Bibby.

The material assumes the reader has had mathematics up through calculus and linear algebra, and statistics up through mathematical statistics, e.g., Hogg, McKean, and Craig [2012], and linear regression and analysis of variance, e.g., Weisberg [2013].
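For orientation, here is a sketch of the both-sides model mentioned above, written in generic GMANOVA notation; the dimensions follow a common convention and may differ in detail from the book's own development:

    Y = x β z' + R,

where Y (n × q) is the data matrix, x (n × p) is the design matrix describing the individuals, z (q × l) is the design matrix describing the variables, β (p × l) contains the parameters, and the rows of R are mean-zero errors. Taking z to be the q × q identity recovers ordinary multivariate regression, which models only the individuals; a nontrivial z models structure across the variables as well, as in growth curve models.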
In a typical semester, I would cover Chapter 1 (introduction, some graphics, and principal components); go through Chapter 2 fairly quickly, as it is a review of mathematical statistics the students should know, but being sure to emphasize Section 2.3.1 on means and covariance matrices for vectors and matrices, and Section 2.5 on conditional probabilities; go carefully through Chapter 3 on the multivariate normal, and Chapter 4 on setting up linear models, including the both-sides model; cover most of Chapter 5 on projections and least squares, though usually skipping 5.7.1 on the proofs of the QR and Cholesky decompositions; cover Chapters 6 and 7 on estimation and testing in the both-sides model; skip most of Chapter 8, which has many technical proofs whose results are often referred to later; cover most of Chapter 9, but usually skip the exact likelihood ratio test in a special case (Section 9.4.1), and Sections 9.5.2 and 9.5.3 with details about the Akaike information criterion; cover Chapters 10 (covariance models), 11 (classification), and 12 (clustering) fairly thoroughly; and make selections from Chapter 13, which presents more on principal components, and introduces singular value decompositions, multidimensional scaling, and canonical correlations.

A path through the book that emphasizes methodology over mathematical theory would concentrate on Chapters 1 (skip Section 1.8), 4, 6, 7 (skip Sections 7.2.5 and 7.5.2), 9 (skip Sections 9.3.4, 9.5.1, 9.5.2, and 9.5.3), 10 (skip Section 10.4), 11, 12 (skip Section 12.4), and 13 (skip Sections 13.1.5 and 13.1.6). The more data-oriented exercises come at the end of each chapter's set of exercises.

One feature of the text is a fairly rigorous presentation of the basics of linear algebra that are useful in statistics. Sections 1.4, 1.5, 1.6, and 1.8 and Exercises 1.9.1 through 1.9.13 cover idempotent matrices, orthogonal matrices, and the spectral decomposition theorem for symmetric matrices, including eigenvectors and eigenvalues. Sections 3.1 and 3.3 and Exercises 3.7.6, 3.7.12, 3.7.16 through 3.7.20, and 3.7.24 cover positive and nonnegative definiteness, Kronecker products, and the Moore-Penrose inverse for symmetric matrices. Chapter 5 covers linear subspaces, linear independence, spans, bases, projections, least squares, Gram-Schmidt orthogonalization, orthogonal polynomials, and the QR and Cholesky decompositions. Section 13.1.3 and Exercise 13.4.3 look further at eigenvalues and eigenspaces, and Section 13.3 and Exercise 13.4.12 develop the singular value decomposition.

Practically all the calculations and graphics in the examples are implemented using the statistical computing environment R [R Development Core Team, 2015]. Throughout the text we have scattered some of the actual R code we used. Many of the data sets and original R functions can be found in the R package msos [Marden and Balamuta, 2014], thanks to the much appreciated efforts of James Balamuta. For other material we refer to available R packages.

I thank Michael Perlman for introducing me to multivariate analysis, and for his friendship and mentorship throughout my career. Most of the ideas and approaches in this book got their start in the multivariate course I took from him forty years ago. I think they have aged well. Also, thanks to Steen Andersson, from whom I learned a lot, including the idea that one should define a model before trying to analyze it.

This book is dedicated to Ann.
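To get started with the computing, a minimal R session might look like the following. This is only a sketch: it assumes the msos and aplpack packages install from CRAN as described above, and it uses R's built-in longley data set purely for illustration (it is not a data set from the book).

  install.packages(c("msos", "aplpack"))  # companion package and the faces routine
  library(msos)     # data sets and functions accompanying the text
  library(aplpack)  # provides faces(), used for the cover image
  faces(as.matrix(longley[, 1:6]))  # Chernoff faces of a built-in R data set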
Contents

Preface

1 A First Look at Multivariate Data
  1.1 The data matrix
    1.1.1 Example: Planets data
  1.2 Glyphs
  1.3 Scatter plots
    1.3.1 Example: Fisher-Anderson iris data
  1.4 Sample means, variances, and covariances
  1.5 Marginals and linear combinations
    1.5.1 Rotations
  1.6 Principal components
    1.6.1 Biplots
    1.6.2 Example: Sports data
  1.7 Other projections to pursue
    1.7.1 Example: Iris data
  1.8 Proofs
  1.9 Exercises

2 Multivariate Distributions
  2.1 Probability distributions
    2.1.1 Distribution functions
    2.1.2 Densities
    2.1.3 Representations
    2.1.4 Conditional distributions
  2.2 Expected values
  2.3 Means, variances, and covariances
    2.3.1 Vectors and matrices
    2.3.2 Moment generating functions
  2.4 Independence
  2.5 Additional properties of conditional distributions
  2.6 Affine transformations
  2.7 Exercises

3 The Multivariate Normal Distribution
  3.1 Definition
  3.2 Some properties of the multivariate normal
  3.3 Multivariate normal data matrix
  3.4 Conditioning in the multivariate normal
  3.5 The sample covariance matrix: Wishart distribution
  3.6 Some properties of the Wishart
  3.7 Exercises

4 Linear Models on Both Sides
  4.1 Linear regression
  4.2 Multivariate regression and analysis of variance
    4.2.1 Examples of multivariate regression
  4.3 Linear models on both sides
    4.3.1 One individual
    4.3.2 IID observations
    4.3.3 The both-sides model
  4.4 Exercises

5 Linear Models: Least Squares and Projections
  5.1 Linear subspaces
  5.2 Projections
  5.3 Least squares
  5.4 Best linear unbiased estimators
  5.5 Least squares in the both-sides model
  5.6 What is a linear model?
  5.7 Gram-Schmidt orthogonalization
    5.7.1 The QR and Cholesky decompositions
    5.7.2 Orthogonal polynomials
  5.8 Exercises

6 Both-Sides Models: Estimation
  6.1 Distribution of β̂
  6.2 Estimating the covariance
    6.2.1 Multivariate regression
    6.2.2 Both-sides model
  6.3 Standard errors and t-statistics
  6.4 Examples