Multivariate Data Analysis Practical and Theoretical Aspects of Analysing Multivariate Data with R
Total Page:16
File Type:pdf, Size:1020Kb
Multivariate Data Analysis Practical and theoretical aspects of analysing multivariate data with R Nick Fieller Sheffield © NRJF 1982 1 (Left blank for notes) © NRJF 1982 2 Multivariate Data Analysis: Contents Contents 0. Introduction ......................................................................... 1 0.0 Books..............................................................................................1 0.1 Objectives .......................................................................................3 0.2 Organization of course material......................................................3 0.3 A Note on S-PLUS and R.................................................................5 0.4 Data sets.........................................................................................6 0.4.1 R data sets ................................................................................................ 6 0.4.2 Data sets in other formats....................................................................... 6 0.5 Brian Everitt’s Data Sets and Functions .........................................7 0.6 R libraries required..........................................................................8 0.7 Subject Matter.................................................................................9 0.8 Subject Matter / Some Multivariate Problems...............................11 0.9 Basic Notation...............................................................................13 0.9.1 Notes ....................................................................................................... 14 0.9.2 Proof of results quoted in §0.9.1........................................................... 15 0.10 R Implementation........................................................................17 0.10.1 Illustration in R that var(XA)=Avar(X)A ........................................... 20 Tasks 1 ...............................................................................................23 1 Graphical Displays............................................................. 27 1.1 1 Dimension.................................................................................27 1.2 2 Dimensions ...............................................................................38 1.2.1 Examples: ............................................................................................... 38 1.3 3 Dimensions ...............................................................................42 1.4 3 Dimensions ............................................................................45 1.4.1 Sensible Methods................................................................................... 45 1.4.2 Andrews’ Plots ....................................................................................... 51 1.4.3 Whimsical Methods................................................................................ 58 1.4.4 Chernoff Faces ....................................................................................... 59 1.5 Further Reading............................................................................63 1.6 Conclusions ..................................................................................64 © NRJF 1982 i Multivariate Data Analysis: Contents 2 Reduction of Dimensionality............................................. 67 2.0 Preliminaries .................................................................................67 2.0.1 Warning:.................................................................................................. 67 2.0.2 A Procedure for Maximization............................................................... 67 2.1 Linear Techniques ........................................................................68 2.1.0 Introduction ............................................................................................ 68 2.1.1. Principal Components .......................................................................... 69 2.1.2 Computation ........................................................................................... 74 Tasks 2 ...............................................................................................76 2.1.3 Applications............................................................................................ 79 2.1.4 Example (from Morrison)....................................................................... 82 2.1.4.1 R calculations................................................................................................ 83 2.1.5 General Comments ................................................................................ 84 2.1.6 Further Example of Interpretation of PCA Coefficients ...................... 86 2.1.7 Computation in R ................................................................................... 91 2.1.7.1 R function screeplot() ............................................................................. 93 2.1.8 Example: Iris data. (Analysis in R)........................................................ 94 2.1.9 Biplots..................................................................................................... 96 2.1.10 Cars Example Again, Using Correlation Matrix ................................ 97 2.1.11 Notes ................................................................................................... 101 2.1.12 Miscellaneous comments.................................................................. 103 2.1.13 PCA and Outliers................................................................................ 104 2.1.14 Summary of Principal Component Analysis.................................... 106 Tasks 3 .............................................................................................107 2.2 Factor Analysis .........................................................................110 2.3 Non-Linear Techniques.............................................................111 2.3.1 Generalized Principal Components.................................................... 111 2.4 Summary.....................................................................................113 Tasks 4 .............................................................................................114 Exercises 1 .......................................................................................117 © NRJF 1982 ii Multivariate Data Analysis: Contents 3 Multidimensional Scaling Techniques ........................... 122 3.0 Introduction .................................................................................122 3.1 Illustrations..................................................................................126 3.1.1 Reconstructing France ........................................................................ 126 3.1.2 British Towns By Road ........................................................................ 128 3.1.3 Time Sequence Of Graves................................................................... 129 3.2 Non-metric methods....................................................................132 3.3 Metric Methods: The Classical Solution or...............................136 Principal Coordinate Analysis ...........................................................136 3.4 Comments on practicalities.........................................................139 3.5 Computer implementation...........................................................140 3.6 Examples ....................................................................................141 3.6.1 Road distances between European cities.......................................... 141 3.6.2 Confusion between Morse code signals............................................ 143 3.6.3 Confusion between symbols for digits: R analysis........................... 144 Tasks 5 .............................................................................................147 3.6.5 Notes (1): similarity or dissimilarity measures?................................ 150 3.6.6 Notes (2): caution on interpretation of close points ......................... 150 3.7 Minimum Spanning Trees ...........................................................151 3.7.1 Computation in R ................................................................................. 151 3.7.2 Example: Morse confusions................................................................ 152 3.8 Duality with PCA .........................................................................154 3.9 Further Examples........................................................................155 3.8.1 Iris Data................................................................................................. 156 3.8.2 Swiss Demographic Data .................................................................... 158 3.8.3 Forensic Glass Data............................................................................. 161 3.9 Summary and Conclusions .........................................................164 Tasks 6 .............................................................................................166 © NRJF 1982 iii Multivariate Data Analysis: Contents 4 Discriminant Analysis...................................................... 167 4.0 Summary.....................................................................................167 4.1 Introduction .................................................................................168 4.2 Outline of theory of LDA..............................................................169 4.3 Preliminaries