Understanding Statistical Methods Through Elliptical Geometry
Total Page:16
File Type:pdf, Size:1020Kb
Statistical Science 2013, Vol. 28, No. 1, 1–39 DOI: 10.1214/12-STS402 c Institute of Mathematical Statistics, 2013 Elliptical Insights: Understanding Statistical Methods through Elliptical Geometry Michael Friendly, Georges Monette and John Fox Abstract. Visual insights into a wide variety of statistical methods, for both didactic and data analytic purposes, can often be achieved through geomet- ric diagrams and geometrically based statistical graphs. This paper extols and illustrates the virtues of the ellipse and her higher-dimensional cousins for both these purposes in a variety of contexts, including linear models, mul- tivariate linear models and mixed-effect models. We emphasize the strong relationships among statistical methods, matrix-algebraic solutions and ge- ometry that can often be easily understood in terms of ellipses. Key words and phrases: Added-variable plots, Bayesian estimation, con- centration ellipse, data ellipse, discriminant analysis, Francis Galton, hy- pothesis-error plots, kissing ellipsoids, measurement error, mixed-effect mod- els, multivariate meta-analysis, regression paradoxes, ridge regression, sta- tistical geometry. 1. INTRODUCTION In the beginning, there was an ellipse. As mod- Whatever relates to extent and quantity ern statistical methods progressed from bivariate to may be represented by geometrical figures. multivariate, the ellipse escaped the plane to a 3D Statistical projections which speak to the ellipsoid, and then grew onward to higher dimen- senses without fatiguing the mind, possess sions. This paper extols and illustrates the virtues the advantage of fixing the attention on a of the ellipse and her higher-dimensional cousins for great number of important facts. both didactic and data analytic purposes. When Francis Galton (1886) first studied the re- Alexander von Humboldt [(1811), lationship between heritable traits of parents and page ciii] their offspring, he had a remarkable visual insight— arXiv:1302.4881v1 [stat.ME] 20 Feb 2013 contours of equal bivariate frequencies in the joint Michael Friendly is Professor, Psychology Department, distribution seemed to form concentric shapes whose York University, 4700 Keele St, Toronto, Ontario, M3J outlines were, to Galton, tolerably close to concen- 1P3, Canada e-mail: [email protected]. Georges tric ellipses differing only in scale. Monette is Associate Professor, Mathematics and Galton’s goal was to to predict (or explain) how Statistics Department, York University, 4700 Keele St, Toronto, Ontario, M3J 1P3, Canada e-mail: a characteristic, Y , (e.g., height) of children was re- [email protected]. John Fox is Senator William lated to that of their parents, X. To this end, he calculated summaries, Ave(Y X), and, for symme- McMaster Professor of Social Statistics, Department of | Sociology, McMaster University, 1280 Main Street try, Ave(X Y ), and plotted these as lines of means | West, Hamilton, Ontario, L8S 4M4, Canada e-mail: on his diagram. Lo and behold, he had a second vi- [email protected]. sual insight: the lines of means of (Y X) and (X Y ) | | This is an electronic reprint of the original article corresponded approximately to the locus of horizon- published by the Institute of Mathematical Statistics in tal and vertical tangents to the concentric ellipses. Statistical Science, 2013, Vol. 28, No. 1, 1–39. This To complete the picture, he added lines showing the reprint differs from the original in pagination and major and minor axes of the family of ellipses, with typographic detail. the result shown in Figure 1. 1 2 M. FRIENDLY, G. MONETTE AND J. FOX lines to the curve) with remarkable clarity nearly 2000 years before the development of analytic ge- ometry by Descartes. Over time, the ellipse would be called to duty to provide simple explanations of phenomena once thought complex. Most notable is Kepler’s insight that the Copernican theory of the orbits of plan- ets as concentric circles (which required notions of epicycles to account for observations) could be brought into alignment with the detailed observational data from Tycho Brahe and others by an exquisitely sim- ple law: “The orbit of every planet is an ellipse with the sun at a focus.” One century later, Isaac New- ton was able to connect this elliptical geometry with astrophysics by deriving all three of Kepler’s laws as simpler consequences of general laws of motion and Fig. 1. Galton’s 1886 diagram, showing the relationship of universal gravitation. height of children to the average of their parents’ height. The This paper takes up the cause of the ellipse as a diagram is essentially an overlay of a geometrical interpreta- tion on a bivariate grouped frequency distribution, shown as geometric form that can provide similar service to numbers. statistical understanding and data analysis. Indeed, it has been doing that since the time of Galton, but It is not stretching the point too far to say that these graphic and geometric contributions have of- a large part of modern statistical methods descends ten been incidental and scattered in the literature from these visual insights:1 correlation and regres- [e.g., Bryant (1984), Campbell and Atchley (1981), sion [Pearson (1896)], the bivariate normal distri- Saville and Wood (1991), Wickens (1995)]. We fo- bution, and principal components [Pearson (1901), cus here on visual insights through ellipses in the Hotelling (1933)] all trace their ancestry to Galton’s areas of linear models, multivariate linear models geometrical diagram.2 and mixed-effect models. Our goal is to provide as Basic geometry goes back at least to Euclid, but comprehensive a treatment of this topic as possible the properties of the ellipse and other conic sections in a single article together with online supplements. may be traced to Apollonius of Perga (ca. 262 BC– The plan of this paper is as follows: Section 2 ca. 190 BC), a Greek geometer and astronomer who provides the minimal notation and properties of el- gave the ellipse, parabola and hyperbola their mod- lipsoids3 necessary for the remainder of the paper. ern names. In a work popularly called the Conics Due to length restrictions, other useful and impor- [Boyer (1991)], he described the fundamental prop- tant properties of geometric and statistical ellipsoids erties of ellipses (eccentricity, axes, principles of tan- have been relegated to the Appendix. Section 3 de- gency, normals as minimum and maximum straight scribes the use of the data ellipsoid as a visual sum- mary for multivariate data. In Section 4 we apply 1Pearson [(1920), page 37] later stated, “that Galton should data ellipsoids and confidence ellipsoids for parame- have evolved all this from his observations is to my mind one ters in linear models to explain a wide range of phe- of the most noteworthy scientific discoveries arising from pure nomena, paradoxes and fallacies that are clarified analysis of observations.” by this geometric approach. This view is extended 2Well, not entirely. Auguste Bravais [1811–1863] (1846), an astronomer and physicist first introduced the mathematical to multivariate linear models in Section 5, primar- theory of the bivariate normal distribution as a model for the ily through the use of ellipsoids to portray hypoth- joint frequency of errors in the geometric position of a point. esis (H) and error (E) covariation in what we call Bravais derived the formula for level slices as concentric el- HE plots. Finally, in Section 6 we discuss a diverse lipses and had a rudimentary notion of correlation but did not appreciate this as a representation of data. Nonetheless, Pearson (1920) acknowledged Bravais’s contribution, and the 3As in this paragraph, we generally use the term “ellipsoid” correlation coefficient is often called the Bravais-Pearson co- as to refer to “ellipse or ellipsoid” where dimensionality does efficient in France [Denis (2001)]. not matter or context is clear. ELLIPTICAL INSIGHTS 3 Table 1 Statistical and geometrical measures of “size” of an ellipsoid Size Conceptual formula Geometry Function (a) Generalized variance: det(Σ)= Qi λi area, (hyper)volume geometric mean (b) Average variance: tr(Σ)= Pi λi linear sum arithmetic mean −1 (c) Average precision: 1/ tr(Σ ) = 1/ Pi(1/λi) harmonic mean (d) Maximal variance: λ1 maximum dimension supremum collection of current statistical problems whose so- 2.2 Statistical Ellipsoids lutions can all be described and visualized in terms In statistical applications, C will often be the in- of “kissing ellipsoids.” verse of a covariance matrix (or a sum of squares 2. NOTATION AND BASIC RESULTS and cross-products matrix) and the ellipsoid will be centered at the means of variables or at estimates of There are various representations of an ellipse (or parameters under some model. Hence, we will also ellipsoid in three or more dimensions), both geomet- use the following notation: ric and statistical. Some basic notation and proper- For a positive definite matrix Σ we use (µ, Σ) ties are described below. to denote the ellipsoid E 2.1 Geometrical Ellipsoids T 1 (4) := x : (x µ) Σ− (x µ) = 1 . We refer to the common notion of a bounded ellip- E { − − } When Σ is the covariance matrix of a multivariate soid (with nonempty interior) in the p-dimensional vector x with eigenvalues λ λ , the follow- space Rp as a proper ellipsoid. An origin-centered 1 2 ing properties represent the “size”≥ of≥··· the ellipsoid in proper ellipsoid may be defined by the quadratic Rp form (see Table 1). T For testing hypotheses for parameters of multi- (1) := x : x Cx 1 , E { ≤ } variate linear models, these different senses of “size” where equality in equation (1) gives the boundary, correspond (with suitable transformations) to (a) T x = (x1,x2,...,xp) is a vector referring to the co- Wilks’s Λ, (b) the Hotelling–Lawley trace, (c) the ordinate axes and C is a symmetric positive defi- Pillai trace, and (d) Roy’s maximum root tests, as nite p p matrix.