University of Wollongong Research Online

University of Wollongong Thesis Collections

1998

Correspondence analysis using orthogonal polynomials

Eric John Beh
University of Wollongong

Recommended Citation: Beh, Eric John, Correspondence analysis using orthogonal polynomials, Doctor of Philosophy thesis, School of Mathematics and Applied Statistics, University of Wollongong, 1998. http://ro.uow.edu.au/theses/2044

Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: [email protected]

Correspondence Analysis using

Orthogonal Polynomials

Eric John Beh, B. Math (Hons),

School of Mathematics and Applied Statistics

University of Wollongong, Wollongong, NSW, Australia

A thesis submitted for the degree of
DOCTOR OF PHILOSOPHY
in the University of Wollongong, 1998.

Declaration

In accordance with the regulations of the University of Wollongong, I hereby state that the work described herein is my own original work, except

where due references are made, and has not been submitted for a degree at

any other university or institution.


Eric John Beh

Abstract

Simple correspondence analysis is a multivariate statistical technique for visualising the categories of a two-way contingency table, while multiple correspondence analysis is a method of visualising the categories of a multi-way contingency table. However, the technique completely ignores any ordinality that may exist within a set of categories.

Likewise, the classical Pearson chi-squared test ignores this ordinal structure.

This thesis presents a decomposition of the Pearson chi-squared statistic for multi-way contingency tables by generalising the decomposition of the two-way Pearson chi-squared statistic developed in the past. The advantage of these new statistics is that they enable a detailed investigation of the nature of the association between two or more categorical variables, of which one or more has an ordinal structure.

A new method of graphically displaying the categories of a two-way contingency table is developed and is conducted by using the decomposition of its Pearson chi-squared statistic. This method takes into consideration the ordinal nature of any underlying variable and enables a more informative and easier interpretation than the classical approach. This new method of correspondence analysis is then extended so that the categorical variables of a three-way and more generally any multi-way contingency table, with one or more of these variables being ordinal, can be analysed. The new Pearson chi-squared decompositions are employed for this analysis.

The new technique of correspondence analysis is shown to be applicable in a broader context, by analysing data sets that are not in the form of a contingency table. For example, this thesis focuses on the application of the new technique to rank-type data, which enables the researcher to visualise the association between the products tested and the rankings they received.

A new approach to parameter estimation for log-linear analysis using orthogonal polynomials is also presented, and is a much simpler method of model-fitting than the widely used techniques. The advantage of this new approach is that parameter estimates of order higher than the first can be easily calculated, thereby providing the researcher with a better model-fitting method than the techniques used in the past.


Table of Contents

Declaration
Acknowledgements
Abstract
Table of Contents

Chapter 1  Introduction
  1.1 The Contingency Table
  1.2 Development of Correspondence Analysis
  1.3 Overview

Section 1: Classical Correspondence Analysis

Chapter 2  Simple Correspondence Analysis
  2.1 Introduction
  2.2 Notation and Definitions
    2.2.1 The Data and Marginals
    2.2.2 Total Inertia
    2.2.3 Profiles
  2.3 Singular Value Decomposition
  2.4 Profile Co-ordinates
    2.4.1 Standard Profile Co-ordinates
    2.4.2 Classical Profile Co-ordinates
    2.4.3 Goodman's Profile Co-ordinates
  2.5 Transition Formulae
  2.6 Distances
    2.6.1 Centring of Profile Co-ordinates
    2.6.2 Distance from the Origin
    2.6.3 Within Variable Distances
    2.6.4 Between Variable Distances
  2.7 Modelling in Correspondence Analysis
    2.7.1 RC Correlation Model
    2.7.2 Basic Correspondence Model
    2.7.3 Reconstitution Model


    2.7.4 Other Models
  2.8 Adequacy of the Correspondence Plot
  2.9 Further Developments of Correspondence Analysis
  2.10 Examples
    2.10.1 Socio-Economic and Mental Health Data
    2.10.2 Crime Data
    2.10.3 Hospital Data

Chapter 3  Multiple Correspondence Analysis
  3.1 Introduction
  3.2 MCA via the Indicator Matrix
    3.2.1 Correspondence Analysis of the Z Matrix (Two-way)
    3.2.2 Correspondence Analysis of the Z Matrix (m-way)
  3.3 MCA via the Burt Matrix
  3.4 The Gifi System
  3.5 Stacking and Concatenation
  3.6 Joint Correspondence Analysis
  3.7 The Modelling Approaches to Correspondence Analysis
    3.7.1 Introduction
    3.7.2 The Tucker3 Model
    3.7.3 The CANDECOMP/PARAFAC Models
  3.8 Examples
    3.8.1 A Two-way Contingency Table Example
    3.8.2 A Three-way Contingency Table Example

Section 2: Ordinal Correspondence Analysis

Chapter 4  Partitioning Pearson's Chi-squared Statistic
  4.1 Introduction
  4.2 Orthogonal Polynomials
    4.2.1 The General Recurrence Relation
    4.2.2 Simplification of the General Recurrence Relation
    4.2.3 Generalisation of the General Recurrence Relation

  TWO-WAY CONTINGENCY TABLES
  4.3 A Chi-squared Partition for Singly Ordered Two-way Tables - VERSION 1


  4.4 A Chi-squared Partition for Singly Ordered Two-way Tables - VERSION 2
  4.5 A Chi-squared Partition for Doubly Ordered Two-way Tables

  THREE-WAY CONTINGENCY TABLES
  4.6 A Chi-squared Partition for Singly Ordered Three-way Tables
  4.7 A Chi-squared Partition for Doubly Ordered Three-way Tables
  4.8 A Chi-squared Partition for Completely Ordered Three-way Tables
  4.9 Other Partitions for Three-way Tables
    4.9.1 Singly Ordered Three-way Tables
    4.9.2 Doubly Ordered Three-way Tables
  4.10 A Chi-squared Partition for Multi-way Tables
  4.11 Examples
    4.11.1 A Completely Ordered Three-way Example
    4.11.2 A Doubly Ordered Three-way Example

Chapter 5  Simple Ordinal Correspondence Analysis
  5.1 Introduction
  5.2 Doubly Ordered Two-way Contingency Tables
    5.2.1 Introduction
    5.2.2 The Method
    5.2.3 Standard Co-ordinates
    5.2.4 Profile Co-ordinates
    5.2.5 Modelling Ordinal Correspondence Analysis
    5.2.6 Transition Formulae
    5.2.7 Centring of the Profile Co-ordinates
    5.2.8 Distance from the Origin
    5.2.9 Within Variable Distances
    5.2.10 Additional Information
      (i) Non-Zero Off-Diagonal Associations
      (ii) (Approximately) Zero Off-Diagonal Associations
  5.3 Singly Ordered Two-way Contingency Tables - VERSION 1
    5.3.1 Introduction
    5.3.2 Profile Co-ordinates
    5.3.3 Modelling Singly Ordered Tables
    5.3.4 Distances
      (i) Centring


      (ii) Distance from the Origin
      (iii) Within Variable Distances
    5.3.5 Additional Information
  5.4 Singly Ordered Two-way Contingency Tables - VERSION 2
    5.4.1 The Method
    5.4.2 Profile Co-ordinates
    5.4.3 Modelling VERSION 2
    5.4.4 Transition Formula
    5.4.5 Distances
    5.4.6 Additional Information
  5.5 Examples
    5.5.1 Socio-Economic and Mental Health Data
    5.5.2 Drug Data
    5.5.3 Hospital Data
    5.5.4 Dream Data

Chapter 6  Comparative Study of Different Scoring Schemes
  6.1 Introduction
  6.2 Equal Scores
  6.3 Approximately Equal Scores
  6.4 Scoring Methods
    6.4.1 Natural Scores
    6.4.2 Midrank Scores
    6.4.3 Nishisato Scores
    6.4.4 Singular Vectors
  6.5 Non-Ordered Categorical Data
  6.6 Distances
  6.7 Correlation
  6.8 Rotation and Reflection
  6.9 Least Squares Rotation
  6.10 Examples
    6.10.1 Ordering of a Pair of Scoring Schemes
    6.10.2 Socio-Economic and Mental Health Status Data
    6.10.3 Least Squares Rotation of Socio-Economic and Mental Health Status Data


Chapter 7  Multiple Ordinal Correspondence Analysis
  7.1 Introduction
  7.2 Completely Ordered Three-way Contingency Tables
    7.2.1 Decomposing Pearson's Ratios
    7.2.2 Standard Co-ordinates
    7.2.3 Trivariate Profile Co-ordinates
    7.2.4 Bivariate Profile Co-ordinates
    7.2.5 Bi-profile Co-ordinates
    7.2.6 Transition Formulae
    7.2.7 Modelling Completely Ordered Three-way Contingency Tables
    7.2.8 Centring of the Profile Co-ordinates
  7.3 Singly Ordered Three-way Contingency Tables
    7.3.1 Decomposing Pearson's Ratios
    7.3.2 Standard Profile Co-ordinates
    7.3.3 Bi-Conditional Profile Co-ordinates
    7.3.4 Uni-Conditional Row Profile Co-ordinates
    7.3.5 Unconditional Row Profile Co-ordinates
    7.3.6 Modelling Singly Ordered Three-way Contingency Tables
  7.4 Doubly Ordered Three-way Contingency Tables
    7.4.1 Decomposing Pearson's Ratios
    7.4.2 Profile Co-ordinates
    7.4.3 Modelling Doubly Ordered Three-way Contingency Tables
    7.4.4 Centring of the Profile Co-ordinates
    7.4.5 Distances from the Origin
    7.4.6 Within Variable Distances
  7.5 Analysis of Other Contingency Tables
  7.6 Example

Chapter 8  Further Applications
  8.1 Introduction
  8.2 Ranked Type Data
    8.2.1 Introduction


    8.2.2 Simple Correspondence Analysis of Ranked Data
    8.2.3 Ordinal Correspondence Analysis of Ranked Data
    8.2.4 RC Correlation Models
    8.2.5 Bean Variety Example
    8.2.6 Australian Tomato Example
  8.3 Ordinal Log-linear Analysis of Two-way Contingency Tables
    8.3.1 Introduction
    8.3.2 Singly Ordered Two-way Tables (VERSION 1)
    8.3.3 Singly Ordered Two-way Tables (VERSION 2)
    8.3.4 Doubly Ordered Two-way Tables
  8.4 Ordinal Log-linear Analysis of Three-way Contingency Tables
    8.4.1 Singly Ordered Three-way Tables
    8.4.2 Doubly Ordered Three-way Tables
    8.4.3 Completely Ordered Three-way Tables
  8.5 Ordinal Log-linear Analysis Examples
    8.5.1 A Singly Ordered Two-way Contingency Table
    8.5.2 A Doubly Ordered Two-way Contingency Table
    8.5.3 A Doubly Ordered Three-way Contingency Table
    8.5.4 A Completely Ordered Three-way Contingency Table

Appendix
  A1 Matrix Approximation
    A1.1 Spectral Decomposition
    A1.2 Singular Value Decomposition
  A2 Computational Aspects
  A3 S-PLUS Programs
    A3.1 Simple Correspondence Analysis - corr
    A3.2 Indicator Matrix - indicator
    A3.3 Burt Matrix - burt
    A3.4 Orthogonal Polynomials - orthopoly
    A3.5 Table of Component Values - compstable
    A3.6 Ordinal Correspondence Analysis - docorr
    A3.7 Correspondence Plot - corrplot
    A3.8 Chi-squared Partition - partition3
    A3.9 P-values

References Related to this Thesis


References


Chapter 1

Introduction

1.1 The Contingency Table

The analysis of the contingency table, also referred to as categorical data analysis, is a very important component of statistics, with many different types of analysis dedicated solely to this type of data set. Fienberg (1982) points out that the term contingency seems to have originated with Karl Pearson (1904), who used it to describe the measure of the deviation from complete independence between the rows and columns of such a data structure. More recently, the term has come to refer to the counts in the contingency table themselves. As a result, a contingency table contains information which is of a discrete, or categorical, nature.

The simplest contingency table is one where observations are classified according to two variables, and is referred to as a two-way contingency table. Table 1.1 below gives a simple example which has been cited many times and is a popular data set due to its source, Fisher (1940), who was one of the pioneers of the analysis of contingency tables. It classifies 5387 children from Caithness, Scotland, according to two variables: their hair and eye colour. The table consists of four row categories, representing the four levels of eye colour (Blue, Light, Medium and Dark), and five column categories, representing the five levels of hair colour (Fair, Red, Medium, Dark and Black).


The development of techniques to handle problems involving contingency tables is due most importantly, as Goodman (1996) sees it, to Karl Pearson, G. Udny Yule and R. A. Fisher. Pearson (1900) developed the groundwork for the chi-squared test, which is used to compare the observed counts with what is expected under the hypothesis of independence between the rows and columns. It is this chi-squared test that is still the most popular way of analysing contingency tables.

                          Hair Colour
Eye Colour    Fair   Red   Medium   Dark   Black
Blue           326    38      241    110       3
Light          688   116      584    188       4
Medium         343    84      909    412      26
Dark            98    48      403    681      85

Table 1.1: Two-way Contingency Table Classifying 5387 Children in Caithness, Scotland, according to Hair Colour and Eye Colour

From this test is derived the now popular Pearson chi-squared statistic, given by:

X^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - n p_{i.} p_{.j})^2}{n p_{i.} p_{.j}}    (1.1)

which tests whether the rows and columns of the contingency table are independent. This test is made by comparing the value obtained from (1.1) with the theoretical chi-squared value at (I-1)(J-1) degrees of freedom, where I is the number of row categories and J is the number of column categories. For example, equation (1.1) will test whether there is any association between the eye colour of a person and their hair colour. In equation (1.1), n_{ij} is the number of observations classified into the i'th row and j'th column, p_{i.} is the probability that a particular observation is classified into the i'th row, while p_{.j} is the probability that a particular observation is classified into the j'th column. The value of n is the number of observations classified into the contingency table with I rows and J columns. The quantity n p_{i.} p_{.j} is the expected number of observations classified into the i'th row and j'th column if the rows and columns are independent.

For example, the Pearson chi-squared statistic value for Table 1.1 is 1240.039, which at 12 degrees of freedom is highly significant. Thus, there is an association between eye colour and hair colour. Unfortunately, equation (1.1) does not divulge what this association is.
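As a numerical check, the statistic in equation (1.1) and its degrees of freedom can be computed directly from the counts in Table 1.1. The following sketch uses NumPy only; the variable names are illustrative, not taken from this thesis.

```python
import numpy as np

# Counts from Table 1.1: rows are eye colours (Blue, Light, Medium, Dark),
# columns are hair colours (Fair, Red, Medium, Dark, Black).
N = np.array([[326,  38, 241, 110,  3],
              [688, 116, 584, 188,  4],
              [343,  84, 909, 412, 26],
              [ 98,  48, 403, 681, 85]])

n = N.sum()                     # total number of children classified (5387)
p_i = N.sum(axis=1) / n         # row marginal proportions p_{i.}
p_j = N.sum(axis=0) / n         # column marginal proportions p_{.j}
E = n * np.outer(p_i, p_j)      # expected counts under independence, n p_{i.} p_{.j}

X2 = ((N - E) ** 2 / E).sum()               # Pearson chi-squared statistic (1.1)
df = (N.shape[0] - 1) * (N.shape[1] - 1)    # (I-1)(J-1) = 3 x 4 = 12
```

Comparing X2 against the chi-squared distribution with 12 degrees of freedom reproduces the conclusion above: the statistic (reported in the text as 1240.039) lies far beyond any conventional critical value, so eye colour and hair colour are associated.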

Many of the techniques used to analyse contingency tables involve using the Pearson chi-squared statistic, and so it is a very important tool for statistical researchers.

More recently, contingency tables with ordered categorical responses have been analysed using techniques which obtain important characteristics other than the chi-squared statistic, for example the linear-by-linear association. Much of the recent work has been discussed and developed by Alan Agresti; for example, see Agresti (1982, 1983, 1984), while Agresti (1984) is now considered the standard text on the analysis of ordinal contingency tables. Other recent work on ordinal tables can be seen in Fienberg (1977), Rayner & Best (1998) and many others.

One method of analysing contingency tables is called correspondence analysis, with which this thesis is concerned. Correspondence analysis is a multivariate method that graphically represents the rows and columns of a contingency table in a joint low-dimensional space.

Observing Table 1.1, each row, or eye colour, can be thought of as a point in 5-dimensional space, while each column, or hair colour, can be thought of as a point in 4-dimensional space; each such set of points is called a cloud-of-points.


Correspondence analysis finds a space of dimension smaller than 4 and 5, so that the rows and columns may be simultaneously visualised, using the technique of data reduction by Pearson (1901). For example Figure 1.1 shows a two-dimensional space where associations between the rows and columns, and the importance of rows and columns to the contingency table can be visualised. Figure 1.1 shows that part of the association between hair and eye colour is that people with dark hair tend to have a dark eye colour, while people with fair hair tend to be those with fair or blue coloured eyes.

The advantage of such a graph is best described by the saying a picture tells a thousand words, or from a data analytic point of view a picture tells a thousand numbers.

[Figure 1.1 appears here: a two-dimensional correspondence plot with the hair colours (FAIR, RED, MEDIUM, DARK, BLACK) and eye colours (blue, light, medium, dark) displayed against Principal Axis 1 (horizontal, roughly -0.5 to 1.0) and a second principal axis (vertical).]

Figure 1.1: Graphical Display of the Association between the Hair Colour (#) and Eye Colour (•) of Table 1.1
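The dimension-reduction step behind a plot such as Figure 1.1 is usually carried out with a singular value decomposition. The sketch below shows one common formulation (an SVD of the matrix of standardised residuals), not necessarily the exact variant developed later in this thesis; the co-ordinate names F and G are illustrative.

```python
import numpy as np

# Table 1.1 again: eye colour (rows) by hair colour (columns).
N = np.array([[326,  38, 241, 110,  3],
              [688, 116, 584, 188,  4],
              [343,  84, 909, 412, 26],
              [ 98,  48, 403, 681, 85]], dtype=float)

P = N / N.sum()            # correspondence matrix (relative frequencies)
r = P.sum(axis=1)          # row masses
c = P.sum(axis=0)          # column masses

# Standardised residuals: the SVD of this matrix yields the principal axes.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, d, Vt = np.linalg.svd(S, full_matrices=False)

# Principal co-ordinates of the row and column profiles; the first two
# columns of F and G give a two-dimensional plot such as Figure 1.1.
F = (U / np.sqrt(r)[:, None]) * d
G = (Vt.T / np.sqrt(c)[:, None]) * d

# The total inertia X^2/n is the sum of the squared singular values.
inertia = (d ** 2).sum()
```

The weighted centroid of the row co-ordinates sits at the origin, and the squared singular values show how much of the association each axis displays; for Table 1.1 the first axis dominates.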


The philosophy behind correspondence analysis which makes it different from other multi-dimensional scaling techniques lies in le principe barycentrique, the centroid principle, which says that within-variable categories close together in a correspondence plot are similar to each other. Similarly, within-variable categories that differ from one another lie far apart. The etymology of the word correspondence in correspondence analysis stems from this centroid principle.

1.2 Development of Correspondence Analysis

Correspondence analysis dates back to the early 20th century, and its foundation is algebraic rather than geometric.

The foundation of the technique was nearly laid with the 1904 and 1906 papers of Karl Pearson, as argued by de Leeuw (1983), when he developed the correlation coefficient of a two-way contingency table using linear regression. As Pearson (1906) states (quoted from de Leeuw, 1983):

"The conception of linear regression line as giving this arrangement with the maximum degree of correlation appears of considerable philosophical interest. It amounts primarily to much the same thing as saying that if we have a fine classification, we shall get the maximum correlation by arranging the arrays so that the means of the arrays fall as closely as possible on a line"

de Leeuw (1983) then notes:

"this is exactly what correspondence analysis does. Pearson just . . . was not familiar with singular value decomposition, although this had been discovered much earlier by Beltrami, Sylvester and Jordan"

-5- Chapter 1 - Introduction

Gifi (1990) also credits Beltrami (1873), Sylvester (1889) and Jordan (1874) as some of the original pioneers of singular value decomposition.

However, the original development of correspondence analysis is often attributed to Hirschfeld (1935), who developed an algebraic formulation of the correlation between the rows and columns of a contingency table. In a sense, it was a similar problem to the one Pearson had considered.

Hirschfeld's paper is largely mathematical, although similar problems were discussed in the field of psychology by Richardson & Kuder (1933) and Horst (1935). Horst's work was read in early 1934 to the Psychology Section of the Ohio Academy of Science and contains no mathematical formulation of the problem, just a discussion. In fact, Horst was the first to coin the term "method of reciprocal averaging", which was later shown to be a mathematically similar approach to correspondence analysis.

The simplest derivation of correspondence analysis was by the biometrician R. A. Fisher in 1940, when he considered data relating to hair and eye colour in a sample of children from Caithness, Scotland, as seen in Table 1.1. The aim of Fisher's analysis was to quantify the correlation between hair and eye colour, and it formed the groundwork for discriminant analysis.

While the original development of the problem aimed at dealing with two-way contingency tables, a more complex approach dealing with multi-way contingency tables was not discussed until 1941, when the psychometrician Louis Guttman discussed his method, called dual (or optimal) scaling, which is now referred to as the foundation of multiple correspondence analysis. Later applications of multiple correspondence analysis were considered using the Burt matrix of Burt (1950). In fact, Guttman (1953) writes of Burt:


". . . it is gratifying to see how Professor Burt has independently arrived at much the same formulation. This convergence of thinking lends credence to the similarity of the approach"

Fisher and Guttman presented essentially the same theory in the biometric and psychometric literature. Thus, biometricians regard Fisher as the inventor of correspondence analysis, while psychometricians regard it as being Guttman.

In the 1940's and 1950's, further mathematical development of correspondence analysis took place, particularly in the field of psychometrics, by Guttman and his researchers. In Japan, a group of data analysts led by Chikio Hayashi also further developed Guttman's ideas, which they referred to as the quantification of qualitative data.

The 1960's saw the biggest leap in the development of correspondence analysis, when it was given a geometric form by the linguist Jean-Paul Benzecri and his team of researchers at the Mathematical Statistics Laboratory, Faculty of Science, in Paris, France. As Nishisato points out, "the work of Benzecri and his associates culminated in two gigantic volumes on data analysis (Benzecri and others 1973a, 1973b)". As a result, the method of l'analyse des correspondances, as coined by Benzecri, is very popular in France, not just among statisticians, but among researchers from most disciplines in the country. The popularity of correspondence analysis in France resulted in a journal dedicated to the development and application of the technique, Cahiers de l'Analyse des Donnees. Currently there are several other journals which discuss recent advances in correspondence analysis, for example Journal of Vegetation Science, Vegetatio, and Psychometrika.

In 1974, this new method reached English-speaking researchers with the now popular paper by M. O. Hill, who was the first to coin the method's name, correspondence analysis, which is the English translation of Benzecri's l'analyse des correspondances. Hill showed that the method is mathematically similar to already popular methods of data analysis such as principal components analysis, canonical correlation analysis and reciprocal averaging (which he had further developed the previous year).

Despite the English introduction to correspondence analysis of Hill (1974), its development has been fairly slow to reach English-speaking countries, with much of the earlier work, in the 1960's and early 1970's, written in French. There are three possible reasons for this slow development:

(i) The lag in the development outside France may be due to the problem that researchers who do not speak French have with the language.

(ii) Secondly, correspondence analysis is often introduced without any reference to other methods of the statistical treatment of categorical data. Thus, initially, the application and development of correspondence analysis was rare.

(iii) Due to the difference in philosophies of data analysis between European and English-speaking statisticians, correspondence analysis, in its early years, failed to mature outside of its birth country.

Consider point (i). There are only a few English-written articles concerned with correspondence analysis, from the graphical point of view, from the late 1960's and early 1970's. These include Benzecri (1969), Hill (1974), Maignan (1974) and Teil (1975). There are also very few English-written articles discussing the application of correspondence analysis. Teil & Cheminee (1975) applied correspondence analysis to rock samples collected from an old volcanic region in the Erta Chain in Eastern Ethiopia; the aim of this study was to determine important geological elements in the region. David, Campiglio & Darling (1974) used correspondence analysis to study the major factors influencing geological processes in the volcanic belt along the Superior Province of the Canadian Shield. Another early application of correspondence analysis can be seen in van Heel & Frank (1980), who applied the method to help identify images of biological macro-molecules using an electron microscope.

One of the earliest English-written applications of multiple correspondence analysis is that of Tomassone, Jouy-en-Josas, Lebart & Paris (1980), who analysed the survey data of Lebart, Morineau & Tabard (1977) concerning the wages of 2003 French families. Another is that of Francis & Lauro (1982), who used the method to evaluate and classify statistical software packages.

Ten years of further development brought about the English publication of two independent texts, Greenacre (1984) and Lebart, Morineau & Warwick (1984), which are now considered standard books for those requiring an introduction to, and a further knowledge of, correspondence analysis. The South African statistician Michael Greenacre was actually a student of Jean-Paul Benzecri for two years from 1973, and as a result, many of today's correspondence analysis articles use the same style as Benzecri's. The text of Ludovic Lebart, Alain Morineau and Kenneth M. Warwick is an English translation of an earlier French edition by Lebart, Morineau & Tabard (1977) and discusses other multivariate techniques such as principal component analysis, canonical correlation analysis and cluster analysis.

Since these books have become standard texts on correspondence analysis, the technique has experienced an explosion in application and development in most fields of research, especially in the early 1990's.

McEwan & Schlich (1992) applied multiple correspondence analysis to the sensory evaluation of strawberry jam, by analysing various levels of 18 different attributes of the jam. For example, they were interested in the sensory evaluation of four different levels of sweetness and three levels of acidity.

As well as correspondence analysis, and related techniques, being a major analytic tool of ecologists and psychometricians, researchers in the fields of marketing, hospital care and food surveys have recently used correspondence analysis. For example, Javalgi, Whipple, McManamon & Edick (1992) used correspondence analysis to better understand the image of 16 USA hospitals, by classifying responses according to the feature of medical care and the hospital in which the respondent feels the medical care features. In the field of economics, Lohtia, Brooks & Krapfel (1994) used correspondence analysis to determine the relationship between two types of assets called transaction-specific. The following year, Javalgi, Joseph & Gombleski (1995) employed correspondence analysis in the hospital industry to identify the perceptions of more than 1000 physicians of eight leading tertiary care hospitals. Kara, Kaynak & Kucukemiroglu (1996) used correspondence analysis to identify the relationship between how often people eat take-away food, and for what reason they do so, in the USA and Canada. They found the reasons for eating take-away food in the two countries to be quite different.

Other applications of correspondence analysis include those by Kaynak et al (1994), Kumar & Srinivasan (1995), Lebart (1982, 1983) and Loslever et al (1994), as well as many others.

Books dealing solely with the development of correspondence analysis include Israels (1987), van Rijckevorsel (1987), van Rijckevorsel & de Leeuw (1988), Koster (1989), Benzecri (1992), Greenacre (1993), Greenacre & Blasius (1994) and Nishisato (1994). Many multivariate textbooks include an overview of correspondence analysis as a chapter, for example Andersen (1994, 1997).


Now consider point (ii). Originally, correspondence analysis was seen as a method of analysing contingency tables which couldn't be linked with other more popular methods. It was a "stand alone" statistical approach, which really only used correlation and a visual display. However, the development of the method, especially since the mid 1980's and through into the 1990's, has shown that correspondence analysis can be linked with other popular techniques. For example, Goodman (1985b, 1986), van der Heijden & de Leeuw (1985), van der Heijden & Worsley (1988), and van der Heijden, de Falguerolles & de Leeuw (1989) showed the link between correspondence analysis and non-ordinal log-linear models. Beh (1997) showed the link between correspondence analysis of ordinal contingency tables and the partition of the chi-squared statistic into components used by Lancaster (1953), Best (1994a, 1994b, 1995), Best & Rayner (1994, 1996) and Rayner & Best (1995, 1996, 1997). Beh & Davy (1998a, 1999) show the relationship between ordinal correspondence analysis and ordinal log-linear models. The work of Beh (1997, 1998, 1999a, 1999b) and Beh & Davy (1998a, 1998b, 1999) will be discussed in later chapters.

Now consider point (iii). Gower (1989) states the difference in the philosophies of statisticians in different parts of the world, in a reply to van der Heijden, de Falguerolles & de Leeuw (1989), thus:

"Correspondence analysis (CA) has been enthusiastically developed in France and widely adopted in other continental countries but has had a more cautious reception in Britain. In part this has been a consequence of claims that CA is a descriptive method and not model based. Links between CA and log-linear analysis (LLA) have helped to gain more acceptance in Britain, and perhaps for LLA to gain more acceptance abroad"


Along with the development of correspondence analysis, there have been various attempts at refining the method so that identified problems were eliminated. The first such adjustment was made in the field of ecology by Hill & Gauch (1980), under the name detrended correspondence analysis. They proposed a method of removing the technique's characteristic "arch effect" by cutting the first axis into segments during the calculation of the row and column scores and then resetting the average of each segment to zero. Palmer (1993) pointed out that such a refinement produces "inelegancies" that have been criticised by Minchin (1987), Oksanen (1987, 1988) and Wartenberg, Ferson & Rohlf (1987). The criticisms of Wartenberg et al (1987) prompted Peet, Knox, Case & Allen (1988), although agreeing with some of their points, to "provide prospective users with a more balanced perspective on the advantages and disadvantages of DCA". Oksanen & Minchin (1997) investigated the instability of detrended correspondence analysis using several programs.

Another method of correspondence analysis, called canonical correspondence analysis, was developed by ter Braak (1986, 1987). It adds to correspondence analysis the step of selecting the linear combination of row variables that maximises the dispersion of the column scores. In fact, Birks, Peglar & Austin (1994) give a list of 378 references relating to the application and development of canonical correspondence analysis between 1986 and 1993.

So, correspondence analysis is no longer just a single multivariate analytic technique. Instead there is, as Palmer (1993) calls it, a correspondence analysis family, which incorporates correspondence analysis and its detrended and canonical forms. Also included in this family are detrended canonical correspondence analysis, which is a combination of the two methods, and the partial canonical correspondence analysis of ter Braak (1988).


Another form of correspondence analysis that can be included in this family is joint correspondence analysis, developed by Greenacre (1988, 1990, 1991) and improved upon by Boik (1996). This method will be briefly discussed in Chapter 3.

Further improvements to multiple correspondence analysis have been made over the last several years. Instead of using the indicator matrix or the Burt matrix of Burt (1950), model based approaches have been considered. The Tucker3 model of Tucker (1963, 1964, 1966), the PARAFAC model of Harshman (1970) and Harshman & Lundy (1984) and the mathematically similar CANDECOMP model of Carroll & Chang (1970) are approaches that can be used to obtain a correspondence analysis solution for multi-way contingency tables, as was done by Kroonenberg (1989) and Carlier & Kroonenberg (1996). These three models will be briefly discussed in Chapter 3.

The development of correspondence analysis is a long and interesting one, not confined to statisticians alone. Its development and application range across fields from biometry, psychometry and linguistics to health care and vegetation science. Correspondence analysis is therefore a very versatile method of data analysis. In a sense this is a reflection of all statistical techniques, and is nicely summed up by Kendall (1972, pg 194):

"It is hard to think of any subject which has not made some kind of contribution to statistical theory - agriculture, astronomy, biology, chemistry and so on through the alphabet. The remarkable thing, perhaps, is that these lines of development remained relatively independent for so long and only in the present century have been seen to have a common conceptual content"


1.3 Overview

This thesis discusses and extends the theory of correspondence analysis for two-way and multi-way contingency tables with ordered and/or non-ordered variables. The non-ordinal and ordinal techniques are discussed in Sections 1 and 2 respectively.

Section 1 describes the theory and application of what is termed here classical correspondence analysis, as proposed by Benzecri, Greenacre, Lebart and others. Chapter 2 discusses the approach popularly referred to as simple correspondence analysis. The term simple does not mean that the method is easy to understand and execute; rather, it refers to its development for the most basic, or simple, data set - a two-way contingency table. A detailed description of this analysis, related developments and examples describing the features of the technique are included. The two-way approach is extended by including a brief description of the many techniques of multiple correspondence analysis that are now available for the analysis of multi-way contingency tables - contingency tables where observations are classified according to more than two variables. For example, the transformation of a three-way contingency table into an indicator or a Burt matrix is discussed, so that the classical approach to multiple correspondence analysis involves analysing these matrices. The Gifi system, which has grown in popularity over the past several years, is introduced, while an approach using the Tucker3 and PARAFAC/CANDECOMP models is also discussed. Both simple and multiple correspondence analysis involve partitioning the chi-squared statistic of equation (1.1) into singular vectors using singular value decomposition. However, it will be shown that there are certain limitations to simple and multiple correspondence analysis; for example, the technique completely ignores any ordinality that may exist within a set of categories, and there is no clear interpretation of the axes of the plot.

Therefore a new technique referred to as ordinal correspondence analysis is discussed. It is shown that this new correspondence analysis technique gives a more informative and easier interpretation of a contingency table than the classical approach. This new technique is largely based on the work originally seen in Beh (1996a, 1996b) and described in some detail by Beh (1997, 1998a, 1998b) and Beh & Davy (1998a, 1998b, 1998c).

In order to discuss this approach in detail, the partitions of the chi-squared statistic for two-way and three-way contingency tables are derived; they consist of the orthogonal polynomials of the type described by Emerson (1968), generated for each ordered variable. For a two-way contingency table, these chi-squared statistics are exactly those seen in Best (1995), Best & Rayner (1987, 1994, 1996), Rayner & Best (1995, 1996, 1998) and Rayner, Best & Stephens (1997). This thesis also presents a new decomposition for three-way, and generally multi-way, contingency tables with at least one ordered variable, which is shown to be a simple extension of the two-way decompositions presented by the above authors.

The orthogonal polynomials generated make use of a scoring scheme which reflects the ordered structure of the categories. However, any type of scoring scheme can be employed. Chapter 6 investigates the effect of using natural scores, midrank scores, Nishisato scores and singular vectors on the orthogonal polynomials, and hence on the ordinal correspondence analysis. The effect of equal and approximately equal scores is also discussed.

The two-way approach, referred to as simple ordinal correspondence analysis, is extended so that any multi-way contingency table can be analysed using the technique. This extension is based on the work of Beh & Davy (1998b) and gives a series of plots so that bivariate, trivariate and higher order associations can be investigated in more detail.

The new technique is shown to be applicable in a broader context by analysing data sets that are not in the form of a contingency table. For example, Chapter 8 focuses on the application of the new technique to ranked data by partitioning the Anderson chi-squared statistic, rather than the Pearson chi-squared statistic, into bivariate moments, and it is shown to be a graphical approach to the analysis presented in Best (1993).

A new approach to parameter estimation using orthogonal polynomials for log-linear models is also presented and is a much simpler method of model-fitting than the widely used technique discussed by Agresti (1994) and Fienberg (1977). The advantage of this new approach is that parameter estimates of terms higher than the linear can be easily calculated, thereby providing the user with a much better fitting model.

An appendix is provided at the end of Chapter 8 and includes a discussion on singular value decomposition which is vitally important to the non-ordinal approach to correspondence analysis. The programming aspects of correspondence analysis are discussed in the appendix highlighting the publication of programs and the review of packages which carry out correspondence analysis. A list of nine S-PLUS programs is given which includes a simple and simple-ordinal correspondence analysis program for two-way contingency tables, and programs to convert a two-way or three-way table into its indicator or Burt matrix form. A program is also included which generates orthogonal polynomials given a scoring scheme and another which constructs a plot for the graphical interpretation of two sets of co-ordinates.

SECTION 1

Classical Correspondence Analysis

Chapter 2

Simple Correspondence Analysis

2.1 Introduction

As mentioned in Chapter 1, the standard texts concerned with correspondence analysis include Greenacre (1984) and Lebart et al (1984). Beyond these books there have been many other publications which deal primarily with the technique. For example, Greenacre (1981), Hoffman & Franke (1986), Weller & Romney (1990), Benzecri (1992), Andersen (1994, 1997) and Everitt (1997) all give very good reviews. In recent years correspondence analysis has been shown to be mathematically similar to other data analytic procedures. Goodman (1996) devised a single method which shows the equivalence of the methods of Pearson (1913), the work of G. U. Yule, and Fisher (1940). Goodman (1996) also generalised simple correspondence analysis with the RC association model of Goodman (1986). In this chapter, a thorough investigation of simple correspondence analysis is made.


2.2 Notation and Definitions

2.2.1 The Data and Marginals

Consider an I×J two-way contingency table, N, where the (i, j)'th cell entry is given by $n_{ij}$ for i=1, 2, ..., I and j=1, 2, ..., J. Let the grand total of N be n and the probability, or correspondence, matrix be P, so that the (i, j)'th cell entry is $p_{ij} = n_{ij}/n$ and

$$\sum_{i=1}^{I}\sum_{j=1}^{J} p_{ij} = 1$$

Define the i'th row marginal probability as $p_{i\bullet} = \sum_{j=1}^{J} p_{ij}$ and the j'th column marginal probability as $p_{\bullet j} = \sum_{i=1}^{I} p_{ij}$, so that $\sum_{i=1}^{I} p_{i\bullet} = \sum_{j=1}^{J} p_{\bullet j} = 1$. These marginal values are referred to as masses: the row probability marginals are called row masses and the column probability marginals are called column masses. Let $D_I$ be the diagonal matrix whose elements are the row masses, and let $D_J$ be the diagonal matrix whose elements are the column masses.
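These definitions can be sketched numerically. A minimal sketch in Python, assuming a small hypothetical 3×4 table (the counts below are invented purely for illustration):

```python
import numpy as np

# Hypothetical 3x4 contingency table N (invented counts, for illustration only).
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 20, 15]], dtype=float)

n = N.sum()                  # grand total
P = N / n                    # correspondence matrix; its entries sum to 1
row_mass = P.sum(axis=1)     # p_i., the row masses
col_mass = P.sum(axis=0)     # p_.j, the column masses
D_I = np.diag(row_mass)      # diagonal matrix of row masses
D_J = np.diag(col_mass)      # diagonal matrix of column masses
```

Both sets of masses sum to one, as required by the definitions above.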

2.2.2 Total Inertia

While $X^2$ measures the variation of the contingency table from complete independence, if the grand total is doubled then so too is $X^2$, yet the relative variation between the rows or between the columns is unchanged. For this reason, correspondence analysis uses $X^2/n$ rather than $X^2$ to quantify the total variation. This value, $X^2/n$, is called the total inertia of the contingency table, while each axis represents a portion of the total inertia called a principal inertia. Therefore, correspondence analysis is concerned with the analysis of the correspondence matrix P, rather than the actual set of data values in N.
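The invariance of $X^2/n$ under a change of sample size can be checked directly; a sketch using an invented table:

```python
import numpy as np

# Hypothetical table (invented counts): doubling every cell doubles X^2
# but leaves the total inertia X^2/n unchanged.
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 20, 15]], dtype=float)

def chi_squared(table):
    n = table.sum()
    P = table / n
    E = np.outer(P.sum(axis=1), P.sum(axis=0))   # p_i. p_.j under independence
    return n * ((P - E) ** 2 / E).sum()

X2 = chi_squared(N)
X2_doubled = chi_squared(2 * N)                  # twice the statistic...
inertia = X2 / N.sum()
inertia_doubled = X2_doubled / (2 * N.sum())     # ...but the same total inertia
```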

2.2.3 Profiles

Suppose the researcher wishes to compare two row categories, i and i'. If a particular cell has a relatively large count, its cell probability will also be relatively large. Therefore, we do not compare the set of cell probabilities $\{p_{ij}\}$ with $\{p_{i'j}\}$ for j=1, 2, ..., J. Instead, each element of a row category is divided by its marginal value, so that the analysis is made on the row profiles of the contingency table. The row profile for the i'th row category is $\{p_{ij}/p_{i\bullet}\}$ for j=1, 2, ..., J. Similarly, define the column profile for the j'th column category as $\{p_{ij}/p_{\bullet j}\}$ for i=1, 2, ..., I.

2.3 Singular Value Decomposition

The aim of correspondence analysis, as with many multivariate data analytic techniques, is to determine scores which describe how different two categories are. The scores of two categories should reflect the difference between those categories, and the strength of association between the row scores and column scores should also be measured.

Simple correspondence analysis is applied to contingency tables based on the model of complete independence between the rows and columns:

$$p_{ij} = p_{i\bullet}p_{\bullet j} \qquad (2.1)$$

for all i=1, 2, ..., I and j=1, 2, ..., J.

Of course, complete independence is not always met, and so a multiplicative measure of the departure from the model of complete independence is made by considering:

$$p_{ij} = \alpha_{ij}\, p_{i\bullet}p_{\bullet j} \qquad (2.2)$$

For the model of complete independence given by (2.1), $\alpha_{ij} = 1$ for all i=1, 2, ..., I and j=1, 2, ..., J. As complete independence is not always going to occur, one can determine where it fails by examining those $\alpha_{ij} \neq 1$.

Consider model (2.2). Then $\alpha_{ij}$ is expressed as:

$$\alpha_{ij} = \frac{p_{ij}}{p_{i\bullet}p_{\bullet j}} \qquad (2.3)$$

which Goodman (1996) refers to as the Pearson ratios. Darroch (1974) proposed the additive measure of the departure from independence:

$$\frac{p_{ij}}{p_{i\bullet}p_{\bullet j}} = 1 + \upsilon_i + \omega_j$$

for all i and j, and for some $\{\upsilon_i\}$ and $\{\omega_j\}$.

To determine the scoring of the rows and columns and the strength of the association between them, the Pearson ratios can be partitioned using the method of singular value decomposition. That is:

$$\alpha_{ij} = \sum_{m=0}^{M^*} a_{im}\lambda_m b_{jm} \qquad (2.4)$$

where

$$\sum_{i=1}^{I} p_{i\bullet}\, a_{im}a_{im'} = \begin{cases} 1 & m = m' \\ 0 & m \neq m' \end{cases} \qquad (2.5)$$

and

$$\sum_{j=1}^{J} p_{\bullet j}\, b_{jm}b_{jm'} = \begin{cases} 1 & m = m' \\ 0 & m \neq m' \end{cases} \qquad (2.6)$$

and $M^* = \min(I, J) - 1$. Consider the RHS of equation (2.4). The set of values $\{a_{im};\ i = 1, 2, \ldots, I\}$ is the m'th left generalised basic vector and is associated with the row categories. Similarly, the set of values $\{b_{jm};\ j = 1, 2, \ldots, J\}$ is the m'th right generalised basic vector and is associated with the column categories. The generalised basic vectors are also referred to as singular vectors. The elements $\lambda_0, \lambda_1, \ldots, \lambda_{M^*}$ are real and positive and are the first $M^*$ generalised basic values, or singular values, and are arranged in descending order so that

$$1 = \lambda_0 > \lambda_1 \geq \cdots \geq \lambda_{M^*} \geq 0 \qquad (2.7)$$

These generalised basic values can also be calculated by

$$\lambda_m = \sum_{i=1}^{I}\sum_{j=1}^{J} a_{im}b_{jm}p_{ij} \qquad (2.8)$$

The ordering of the singular values in (2.7) shows that while the first value is unity, or trivial, they have a minimum value of zero. The matrices of singular vectors, A and B, also contain a trivial solution to the problem, as the sets of values $\{a_{i0}\}$ and $\{b_{j0}\}$ are all equal to 1.

In matrix notation, the simple correspondence problem defined by (2.3) and (2.4) becomes

$$D_I^{-1} P D_J^{-1} = A D_\lambda B^T \qquad (2.9)$$

where

$$A^T D_I A = I \qquad (2.10)$$

$$B^T D_J B = I \qquad (2.11)$$

Here I denotes an identity matrix, while $D_I$ and $D_J$ are as defined in Subsection 2.2.1. The matrix A contains the first $M^*$ sets of row scores, referred to as left generalised basic vectors, and has dimension $I \times M^*$. Similarly, B contains the first $M^*$ sets of column scores, called right generalised basic vectors, and has dimension $J \times M^*$. The matrix of singular values, $D_\lambda$, is diagonal.
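The decomposition (2.9) and the conditions (2.10)-(2.11) can be sketched via an ordinary SVD of the standardised residuals; the table below is hypothetical, and rescaling the ordinary singular vectors by the inverse square-root masses recovers the generalised basic vectors:

```python
import numpy as np

# Hypothetical 3x4 table; the generalised SVD of (2.9) via an ordinary SVD.
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 20, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)

# Standardised residuals: the trivial (m = 0) dimension is removed, so the
# singular values of S are the non-trivial lambda_1, ..., lambda_M*.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(S, full_matrices=False)

A = U / np.sqrt(r)[:, None]      # left generalised basic vectors a_im
B = Vt.T / np.sqrt(c)[:, None]   # right generalised basic vectors b_jm
M = min(N.shape) - 1             # M* = min(I, J) - 1 non-trivial dimensions

chi2_n = (S ** 2).sum()          # total inertia X^2 / n
```

The orthonormality conditions $A^T D_I A = I$ and $B^T D_J B = I$ of (2.10) and (2.11) hold, and the sum of the squared singular values returns the total inertia, anticipating (2.14).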


An alternative decomposition technique is proposed in Section 2 and is concerned with the generation of orthogonal polynomials rather than the singular vectors $\{a_{im}\}$ and $\{b_{jm}\}$. Due to the importance of singular value decomposition in correspondence analysis, refer to the Appendix for additional comments.

To remove the trivial solution, consider again equation (2.4). It becomes

$$\alpha_{ij} = 1 + \sum_{m=1}^{M^*} a_{im}\lambda_m b_{jm} \qquad (2.12)$$

Goodman (1996) refers to $\alpha_{ij} - 1$ as the Pearson contingencies under model (2.1), for i=1, 2, ..., I and j=1, 2, ..., J.

Therefore, (2.3) becomes

$$\alpha_{ij} = \frac{p_{ij}}{p_{i\bullet}p_{\bullet j}} = 1 + \sum_{m=1}^{M^*} a_{im}\lambda_m b_{jm}$$

or

$$\frac{p_{ij} - p_{i\bullet}p_{\bullet j}}{p_{i\bullet}p_{\bullet j}} = \sum_{m=1}^{M^*} a_{im}\lambda_m b_{jm} \qquad (2.13)$$

2.4 Profile Co-ordinates

2.4.1 Standard Profile Co-ordinates

As described in Chapter 1, correspondence analysis is a graphical statistical procedure. In order to visualise the associations between row categories or between column categories, the sets of singular vectors $\{a_{im}\}$ and $\{b_{jm}\}$, for i=1, 2, ..., I and j=1, 2, ..., J, may be plotted on a correspondence plot. When the number of dimensions is $M^*$, the plot is called an optimal correspondence plot. In this situation the total inertia can be written in terms of the singular values such that


$$\frac{X^2}{n} = \sum_{i=1}^{I}\sum_{j=1}^{J}\frac{\left(p_{ij} - p_{i\bullet}p_{\bullet j}\right)^2}{p_{i\bullet}p_{\bullet j}} = \sum_{i=1}^{I}\sum_{j=1}^{J} p_{i\bullet}p_{\bullet j}\left(\sum_{m=1}^{M^*} a_{im}\lambda_m b_{jm}\right)^2 = \sum_{m=1}^{M^*}\lambda_m^2\left(\sum_{i=1}^{I} p_{i\bullet}a_{im}^2\right)\left(\sum_{j=1}^{J} p_{\bullet j}b_{jm}^2\right)$$

which simplifies, using (2.5) and (2.6), to

$$\frac{X^2}{n} = \sum_{m=1}^{M^*}\lambda_m^2 \qquad (2.14)$$

Hence, the total variation in the contingency table (or the Pearson chi-squared statistic) can be partitioned into $M^*$ components, which are just the principal inertia values. Also, each principal inertia can be partitioned further into sub-components to identify how a particular row or column category contributes to the principal axis.

An M-dimensional correspondence plot consists of M principal axes, where $M \leq M^*$. When the singular vectors themselves are plotted, the axes are equally weighted, as can be seen from conditions (2.5) and (2.6); these axes have associated with them unit inertias. Greenacre (1984, pg 93) refers to this system of co-ordinates as standard co-ordinates.

2.4.2 Classical Profile Co-ordinates

Instead of defining the row and column co-ordinates using just the singular vectors, let the row and column profile co-ordinates be defined by

$$f_{im} = a_{im}\lambda_m \qquad (2.15)$$

$$g_{jm} = b_{jm}\lambda_m \qquad (2.16)$$

respectively. Then (2.5) becomes

$$\sum_{i=1}^{I} p_{i\bullet}\left(\frac{f_{im}}{\lambda_m}\right)\left(\frac{f_{im'}}{\lambda_{m'}}\right) = \begin{cases} 1 & m = m' \\ 0 & m \neq m' \end{cases}$$

or

$$\sum_{i=1}^{I} p_{i\bullet} f_{im}^2 = \lambda_m^2 \qquad (2.17)$$

$$\sum_{i=1}^{I} p_{i\bullet} f_{im}f_{im'} = 0 \quad \text{if } m \neq m' \qquad (2.18)$$

Similarly, condition (2.6) becomes

$$\sum_{j=1}^{J} p_{\bullet j} g_{jm}^2 = \lambda_m^2 \qquad (2.19)$$

$$\sum_{j=1}^{J} p_{\bullet j} g_{jm}g_{jm'} = 0 \quad \text{if } m \neq m' \qquad (2.20)$$

The system of co-ordinates defined by (2.15) and (2.16) makes use of the singular values, and these are the co-ordinates used by simple correspondence analysis. There are $M^*$ principal axes constructed and $M^*$ principal inertia values.


Instead of each axis having unit inertia, the m'th principal axis has an inertia value of $\lambda_m^2$. This can be seen from (2.17) and (2.19). Therefore, the first principal axis, with an inertia value of $\lambda_1^2$, is the most important axis of the display, as $\lambda_1$ is the largest singular value. In general, the m'th principal axis is the m'th most important axis, and a correspondence plot containing the first two axes will be more descriptive than one containing any other pair of axes.

Consider the row profile co-ordinates $\{f_{im}\}$ defined by (2.15). Their relationship to the principal inertia is given by (2.17). Therefore, the contribution the i'th row profile makes to the m'th principal inertia is

$$\lambda_m^2(i) = p_{i\bullet} f_{im}^2 \qquad (2.21)$$

so that

$$\lambda_m^2 = \sum_{i=1}^{I}\lambda_m^2(i) \qquad (2.22)$$

Similarly, the relationship between the column profile co-ordinates and the m'th principal inertia is given by (2.19). Therefore, the contribution the j'th column profile makes to the m'th principal inertia is

$$\lambda_m^2(j) = p_{\bullet j} g_{jm}^2 \qquad (2.23)$$

so that

$$\lambda_m^2 = \sum_{j=1}^{J}\lambda_m^2(j) \qquad (2.24)$$

Note, from (2.14), (2.21) and (2.22), that

$$\frac{X^2}{n} = \sum_{i=1}^{I}\sum_{m=1}^{M^*} p_{i\bullet} f_{im}^2 \qquad (2.25)$$

Similarly,

$$\frac{X^2}{n} = \sum_{j=1}^{J}\sum_{m=1}^{M^*} p_{\bullet j} g_{jm}^2 \qquad (2.26)$$

Equations (2.25) and (2.26) show that profile co-ordinates close to the origin contribute little to the variation of the data, while profile co-ordinates far from the origin make a large contribution.
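The partitions (2.25) and (2.26) can be verified numerically; a sketch continuing the hypothetical table used earlier:

```python
import numpy as np

# Hypothetical table: profile co-ordinates (2.15)-(2.16) and the partition of
# the total inertia over rows (2.25) and columns (2.26).
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 20, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(S, full_matrices=False)
M = min(N.shape) - 1

F = (U / np.sqrt(r)[:, None])[:, :M] * lam[:M]     # f_im = a_im * lambda_m
G = (Vt.T / np.sqrt(c)[:, None])[:, :M] * lam[:M]  # g_jm = b_jm * lambda_m

inertia = (lam[:M] ** 2).sum()                     # X^2/n, equation (2.14)
row_part = (r[:, None] * F ** 2).sum()             # equation (2.25)
col_part = (c[:, None] * G ** 2).sum()             # equation (2.26)
axis_inertia = (r[:, None] * F ** 2).sum(axis=0)   # lambda_m^2, equation (2.17)
```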

An alternative expression for the row profile co-ordinates is obtained by multiplying (2.13) by $p_{\bullet j}b_{jm}$, summing over j, and using the property (2.6):

$$f_{im} = \sum_{j=1}^{J}\frac{p_{ij}}{p_{i\bullet}}b_{jm} \qquad (2.27)$$

Similarly, the column profile co-ordinates can alternatively be written as

$$g_{jm} = \sum_{i=1}^{I}\frac{p_{ij}}{p_{\bullet j}}a_{im} \qquad (2.28)$$

Equation (2.27) is a weighted sum of the i'th row profile, while (2.28) is a weighted sum of the j'th column profile. These equations also show the link between the profile co-ordinates and the standard co-ordinates.

2.4.3 Goodman's Profile Co-ordinates

Consider again the Pearson contingencies. Then

$$\alpha_{ij} - 1 = \frac{p_{ij}}{p_{i\bullet}p_{\bullet j}} - 1 = \sum_{m=1}^{M^*} a_{im}\lambda_m b_{jm} = \sum_{m=1}^{M^*} f_{im}b_{jm} = \sum_{m=1}^{M^*} a_{im}g_{jm} \qquad (2.29)$$

Goodman (1986) comments that (2.29) provides an adequate reason for including both row and column profile co-ordinates on the same display.

This led Goodman (1986) to suggest that, for the comparison of a row profile co-ordinate with a column profile co-ordinate, instead of using the profile co-ordinates as previously defined, there is some advantage in using

$$\tilde{f}_{im} = \lambda_m^{\gamma}a_{im} = \frac{f_{im}}{\lambda_m^{\delta}} \qquad (2.30)$$

$$\tilde{g}_{jm} = \lambda_m^{\delta}b_{jm} = \frac{g_{jm}}{\lambda_m^{\gamma}} \qquad (2.31)$$

with $\gamma + \delta = 1$.

When $\gamma = 1$ and $\delta = 0$, the row profile co-ordinates are just those defined by (2.15), while the columns are plotted using their standard co-ordinates. Similarly, when $\delta = 1$ and $\gamma = 0$, the column profile co-ordinates are just those defined by (2.16), while the rows are plotted using their standard co-ordinates.

Such co-ordinate systems were proposed by Ries (1974) and further discussed in Fox (1988). The plot using this system is referred to as a perceptual map or, as Grassi & Visentin (1994) call it, an asymmetric plot. As a result, the system of co-ordinates defined by (2.30) and (2.31) unifies several systems of co-ordinates presented in the past, such as those for perceptual maps and correspondence plots.

From (2.29), (2.30) and (2.31), we see that the Pearson ratios may be written as

$$\frac{p_{ij}}{p_{i\bullet}p_{\bullet j}} - 1 = \sum_{m=1}^{M^*}\tilde{f}_{im}\tilde{g}_{jm} \qquad (2.32)$$

Equation (2.32) shows that, with respect to the i'th row and j'th column, using (2.30) and (2.31) as profile co-ordinates provides a straightforward geometric interpretation for the comparison of a row point with a column point that is not directly available with the classical correspondence analysis approach.

Since the classical correspondence analysis approach focuses on partitioning the total inertia using the singular values, the profile co-ordinates of (2.30) and (2.31) obtain similar results. For example,

$$\sum_{i=1}^{I} p_{i\bullet}\tilde{f}_{im} = 0 \qquad (2.33)$$

$$\sum_{j=1}^{J} p_{\bullet j}\tilde{g}_{jm} = 0 \qquad (2.34)$$

$$\sum_{i=1}^{I} p_{i\bullet}\tilde{f}_{im}^2 = \sum_{i=1}^{I} p_{i\bullet}\frac{f_{im}^2}{\lambda_m^{2\delta}} = \lambda_m^{2-2\delta} = \lambda_m^{2\gamma} \qquad (2.35)$$

$$\sum_{j=1}^{J} p_{\bullet j}\tilde{g}_{jm}^2 = \sum_{j=1}^{J} p_{\bullet j}\frac{g_{jm}^2}{\lambda_m^{2\gamma}} = \lambda_m^{2-2\gamma} = \lambda_m^{2\delta} \qquad (2.36)$$

Therefore, if $\gamma$ and $\delta$ are chosen so that $\gamma = \delta = \frac{1}{2}$, then

$$\sum_{i=1}^{I} p_{i\bullet}\tilde{f}_{im}^2 = \sum_{j=1}^{J} p_{\bullet j}\tilde{g}_{jm}^2 = \lambda_m$$

while the relationship between the total inertia and the profile co-ordinates of (2.30) and (2.31) is:


$$\frac{X^2}{n} = \sum_{i=1}^{I}\sum_{j=1}^{J}\frac{\left(p_{ij} - p_{i\bullet}p_{\bullet j}\right)^2}{p_{i\bullet}p_{\bullet j}} = \sum_{i=1}^{I}\sum_{j=1}^{J} p_{i\bullet}p_{\bullet j}\left(\sum_{m=1}^{M^*}\tilde{f}_{im}\tilde{g}_{jm}\right)^2 = \sum_{m=1}^{M^*}\lambda_m^{2\gamma}\lambda_m^{2\delta} = \sum_{m=1}^{M^*}\lambda_m^2$$

The transformed row and column profile co-ordinates of (2.30) and (2.31) respectively can be plotted on a correspondence plot where the m'th principal axis has a principal inertia value of $\lambda_m^2$. Grassi & Visentin (1994) refer to such a plot as a symmetric plot.
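A sketch of the symmetric case $\gamma = \delta = \frac{1}{2}$, again with invented counts, confirming the reconstruction (2.32) and the axis relations of (2.35)-(2.36):

```python
import numpy as np

# Goodman's co-ordinates (2.30)-(2.31) with gamma = delta = 1/2 (hypothetical data).
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 20, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(S, full_matrices=False)
M = min(N.shape) - 1

A = (U / np.sqrt(r)[:, None])[:, :M]
B = (Vt.T / np.sqrt(c)[:, None])[:, :M]
Ft = A * np.sqrt(lam[:M])            # f~_im = lambda_m^(1/2) a_im
Gt = B * np.sqrt(lam[:M])            # g~_jm = lambda_m^(1/2) b_jm

pearson = P / np.outer(r, c) - 1.0   # Pearson contingencies alpha_ij - 1
recon = Ft @ Gt.T                    # sum_m f~_im g~_jm, equation (2.32)
```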

2.5 Transition Formulae

The method of reciprocal averaging developed by Hill (1973) is a technique for finding the row and column singular vectors, $\{a_{im}\}$ and $\{b_{jm}\}$ respectively, without resorting to singular value decomposition. Instead, reciprocal averaging finds these vectors using a direct iterative approach, as seen in ter Braak (1987). However, Hill (1974) showed that his earlier work was mathematically equivalent to simple correspondence analysis. In fact, as the 1974 paper of Hill was one of the original English treatments (translated from the French analyse des correspondances), Hill (1974) was the first to call the method correspondence analysis.


The transition formulae of Hill (1973), the equations for obtaining the profile co-ordinates of one variable from those of the other, allow one set of co-ordinates to be calculated from the co-ordinates of the remaining variable.

Suppose we consider the correspondence analysis problem of (2.9) with the constraints of (2.10) and (2.11) imposed upon the singular vectors. Then it was shown that the scores can be rescaled to (2.15) and (2.16) so that the correspondence plot is related to the Pearson chi-squared statistic.

Consider equation (2.15). Then, using (2.27) and (2.16),

$$f_{im} = \sum_{j=1}^{J}\frac{p_{ij}}{p_{i\bullet}}b_{jm} = \frac{1}{\lambda_m}\sum_{j=1}^{J}\frac{p_{ij}}{p_{i\bullet}}g_{jm}$$

Therefore, we can obtain the row profile co-ordinates when the column profile co-ordinates are known by:

$$f_{im} = \frac{1}{\lambda_m}\sum_{j=1}^{J}\frac{p_{ij}}{p_{i\bullet}}g_{jm} \qquad (2.37)$$

Similarly, we can obtain the column profile co-ordinates when the row profile co-ordinates are known by

$$g_{jm} = \frac{1}{\lambda_m}\sum_{i=1}^{I}\frac{p_{ij}}{p_{\bullet j}}f_{im} \qquad (2.38)$$

Equations (2.37) and (2.38) are the transition formulae and can be written in matrix form as

$$F D_\lambda = D_I^{-1}PG \qquad (2.39)$$

$$G D_\lambda = D_J^{-1}P^TF \qquad (2.40)$$


Therefore the profile co-ordinate $f_{im}$ is a scaled combination of the column profile co-ordinates $g_{jm}$ as j varies. Thus, if $p_{ij}$ is relatively large, $g_{jm}$ will be heavily weighted and so will strongly influence $f_{im}$. However, a direct comparison between a row and a column profile is not possible using the scaling approach of (2.15) and (2.16). If a direct comparison between the i'th row profile and the j'th column profile needs to be made then, from (2.27) and (2.28), Goodman (1986) shows that (2.37) and (2.38) become

$$\tilde{f}_{im} = \frac{1}{\lambda_m^{2\delta}}\sum_{j=1}^{J}\frac{p_{ij}}{p_{i\bullet}}\tilde{g}_{jm} \qquad (2.41)$$

$$\tilde{g}_{jm} = \frac{1}{\lambda_m^{2\gamma}}\sum_{i=1}^{I}\frac{p_{ij}}{p_{\bullet j}}\tilde{f}_{im} \qquad (2.42)$$

In the special case $\gamma = \delta = \frac{1}{2}$, (2.41) is just (2.37) and (2.42) is just (2.38); that is, (2.37) and (2.38) are special cases of (2.41) and (2.42) respectively. The transition formulae (2.41) and (2.42) also show the relationship between the positions of a row and a column profile in the correspondence plot and their shared cell entry.
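The transition formulae (2.37) and (2.38) can be checked numerically (invented counts again): each set of profile co-ordinates is recovered exactly from the other.

```python
import numpy as np

# Each set of profile co-ordinates is recovered from the other via the
# transition formulae (2.37)-(2.38); hypothetical 3x4 table.
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 20, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(S, full_matrices=False)
M = min(N.shape) - 1

F = (U / np.sqrt(r)[:, None])[:, :M] * lam[:M]
G = (Vt.T / np.sqrt(c)[:, None])[:, :M] * lam[:M]

row_profiles = P / r[:, None]               # p_ij / p_i.
col_profiles = P.T / c[:, None]             # p_ij / p_.j  (J x I)
F_from_G = (row_profiles @ G) / lam[:M]     # equation (2.37)
G_from_F = (col_profiles @ F) / lam[:M]     # equation (2.38)
```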

2.6 Distances

2.6.1 Centring of Profile Co-ordinates

The row and column profile co-ordinates are centred about the origin of the correspondence plot, called the centroid. As will be shown, the origin is where the expected cell values $\{p_{i\bullet}p_{\bullet j}\}$ lie.

It can be shown that the row profile co-ordinates are centred about the centroid of the correspondence plot. That is,

$$\sum_{i=1}^{I} p_{i\bullet} f_{im} = 0$$

for m=1, 2, ..., $M^*$. To show that this is true, recall the definition of the row profile co-ordinate in (2.15). Then

$$\sum_{i=1}^{I} p_{i\bullet} f_{im} = \sum_{i=1}^{I} p_{i\bullet} a_{im}\lambda_m = \lambda_m\sum_{i=1}^{I} p_{i\bullet} a_{im}a_{i0} = 0$$

using (2.5), since $a_{i0} = 1$ for all i. Similarly, it can be shown that the column profile co-ordinates are centred about the centroid. That is,

$$\sum_{j=1}^{J} p_{\bullet j} g_{jm} = 0$$

for all m=1, 2, ..., $M^*$.

2.6.2 Distance from the Origin

The squared distance of the i'th row profile from the origin is

$$d_I^2(i, 0) = \sum_{j=1}^{J}\frac{1}{p_{\bullet j}}\left(\frac{p_{ij}}{p_{i\bullet}} - p_{\bullet j}\right)^2 = \sum_{j=1}^{J} p_{\bullet j}\left(\frac{p_{ij}}{p_{i\bullet}p_{\bullet j}} - 1\right)^2 = \sum_{j=1}^{J} p_{\bullet j}\left(\sum_{m=1}^{M^*} a_{im}\lambda_m b_{jm}\right)^2 = \sum_{m=1}^{M^*} a_{im}^2\lambda_m^2$$

which simplifies to

$$d_I^2(i, 0) = \sum_{m=1}^{M^*} f_{im}^2 \qquad (2.43)$$

Therefore, equation (2.25) becomes

$$\frac{X^2}{n} = \sum_{i=1}^{I} p_{i\bullet}\, d_I^2(i, 0) \qquad (2.44)$$

Hence, the larger the distance of the i'th row profile from the origin of the $M^*$-dimensional correspondence plot, the larger the weighted discrepancy between the profile of category i and the average profile of the column categories. It follows that points far from the origin indicate a clear deviation from what we would expect under complete independence, while a point near the origin indicates that the frequencies in row i of the contingency table fit the independence hypothesis well.

The same conclusion can be made for the squared distance of the column profile co-ordinates to the origin.

2.6.3 Within Variable Distances

One of the advantages of using a correspondence plot is that the researcher is able to graphically identify similar and/or different profiles from the same variable. In this way we can compare one profile with another.

The squared distance between two row profiles i and i' in an optimal correspondence plot is given by

$$d_I^2(i, i') = \sum_{j=1}^{J}\frac{1}{p_{\bullet j}}\left(\frac{p_{ij}}{p_{i\bullet}} - \frac{p_{i'j}}{p_{i'\bullet}}\right)^2 \qquad (2.45)$$

and is the weighted Euclidean distance between these profiles.

Equation (2.45) can be written in terms of $f_{im}$ and $f_{i'm}$, the profile co-ordinates of rows i and i' along the m'th principal axis.


$$d_I^2(i, i') = \sum_{j=1}^{J} p_{\bullet j}\left(\frac{p_{ij}}{p_{i\bullet}p_{\bullet j}} - \frac{p_{i'j}}{p_{i'\bullet}p_{\bullet j}}\right)^2 = \sum_{j=1}^{J} p_{\bullet j}\left(\sum_{m=0}^{M^*} a_{im}\lambda_m b_{jm} - \sum_{m=0}^{M^*} a_{i'm}\lambda_m b_{jm}\right)^2 = \sum_{m=0}^{M^*}\lambda_m^2\left(a_{im} - a_{i'm}\right)^2$$

However, when m=0 the term in the sum is zero, since $a_{i0} = a_{i'0} = 1$, and using the row profile co-ordinate definition of (2.15) this distance becomes

$$d_I^2(i, i') = \sum_{m=1}^{M^*}\left(f_{im} - f_{i'm}\right)^2 \qquad (2.46)$$

Therefore, one may calculate the distance between the row profiles i and i' along the m'th principal axis by

$$d_m^2(i, i') = \left(f_{im} - f_{i'm}\right)^2$$

The distance described by (2.46) is just the Euclidean distance determined from an $M^*$-dimensional correspondence plot. Similarly, the squared distance between columns j and j' can be measured by

$$d_J^2(j, j') = \sum_{i=1}^{I}\frac{1}{p_{i\bullet}}\left(\frac{p_{ij}}{p_{\bullet j}} - \frac{p_{ij'}}{p_{\bullet j'}}\right)^2 = \sum_{m=1}^{M^*}\left(g_{jm} - g_{j'm}\right)^2 \qquad (2.47)$$


These results lead to the conclusion that when two row profiles, or two column profiles, are similar, they will be positioned close to one another in the correspondence plot. If two profiles are different, they will be positioned at a distance from one another. Therefore correspondence analysis can determine how profiles within a variable correspond to one another - hence the name of the technique.

The relationship between (2.45) and (2.46) also verifies the property of distributional equivalence as stated by Lebart et al (1984):

1) If two rows having identical profiles are aggregated, then the distances between the columns remain unchanged;

2) If two columns having identical profiles are aggregated, then the distances between the rows remain unchanged.
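The equality of the chi-squared distance (2.45) and the plotted Euclidean distance (2.46) is easy to confirm numerically (invented counts):

```python
import numpy as np

# Chi-squared distance between two row profiles (2.45) equals the Euclidean
# distance between their co-ordinates in the optimal plot (2.46).
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 20, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(S, full_matrices=False)
M = min(N.shape) - 1
F = (U / np.sqrt(r)[:, None])[:, :M] * lam[:M]

profiles = P / r[:, None]                                # row profiles
i, k = 0, 1                                              # compare rows 1 and 2
d2_chi = (((profiles[i] - profiles[k]) ** 2) / c).sum()  # equation (2.45)
d2_plot = ((F[i] - F[k]) ** 2).sum()                     # equation (2.46)
```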

2.6.4 Between Variable Distances

As the row and column co-ordinates can be simultaneously represented on the same correspondence plot, it might be inferred that one is able to measure the distance between a row and a column profile.

Carroll, Green & Schaffer (1986, 1987) proposed a way of measuring these distances by recoding the two-way contingency table in the form of an indicator matrix; refer to Subsection 3.2.1, which discusses techniques available for analysing multi-way contingency tables using multiple correspondence analysis (MCA). However, Greenacre (1989) refuted their claim by showing that it contained two major faults, although Greenacre's criticisms prompted Carroll, Green and Schaffer (1989) to attempt to justify their claims. The difference in opinion between Carroll, Green and Schaffer (CGS) and Greenacre, as discussed in Hoffman, de Leeuw and Arjunji (1995), is a philosophical one rather than a mathematical one:


"Greenacre was trained in the Trench' school, which appears to

correspond nicely with the fact that he takes simple correspondence

analysis to be a more fundamental and satisfactory technique than

MCA. It also means that he tends to emphasise the so called 'chi-

squared distance' interpretation of within-set distances. CGS have

their starting point in multidimensional scaling and unfolding

theory, which naturally leads them to emphasise between-set

distance relations."

For a description of unfolding theory for categorical data, refer to Heiser (1981).

The only comparison of a row and column profile co-ordinate lies in the conclusions reached by considering the transition formulae: a relatively large cell entry tends to mean that the row and column sharing that cell will be close to one another, while a relatively small cell value means that they will be at a distance from one another.

2.7 Modelling in Correspondence Analysis

2.7.1 RC Correlation Model

As seen in Section 2.3, the departure from the independence hypothesis can be assessed by testing (2.2) against (2.1).

Using (2.2) and (2.12), the original cell probabilities can be calculated by

$$p_{ij} = p_{i\bullet}p_{\bullet j}\left(1 + \sum_{m=1}^{M^*} a_{im}\lambda_m b_{jm}\right) \qquad (2.48)$$

Model (2.48) is termed Fisher's identity by Lebart et al (1984) and Lancaster (1969), from the work of Fisher (1940). It is also more commonly referred to as the saturated RC canonical correlation model and has been reviewed by many, such as Gilula (1986), Gilula & Haberman (1986, 1988),


Gilula & Ritov (1990), Goodman (1985a, 1985b, 1986, 1991, 1995, 1996), Escoufier (1988) and van der Heijden, de Falguerolles & de Leeuw (1989). Model (2.48) is saturated when $M^* = \min(I, J) - 1$. The work of de Leeuw & van der Heijden (1991) showed that model (2.48) is equivalent to the canonical correlation model when $M^*$ of the canonical correlations between the row and column variables are non-zero. The unsaturated model is

$$p_{ij} = p_{i\bullet}p_{\bullet j}\left(1 + \sum_{m=1}^{M} a_{im}\lambda_m b_{jm}\right) \qquad (2.49)$$

where $1 \leq M < M^*$. More generally, the marginal probabilities may be replaced by weights:

$$p_{ij} = q_i q_j\left(1 + \sum_{m=1}^{M} a_{im}\lambda_m b_{jm}\right) \qquad (2.50)$$

where the $p_{ij}$ are the expected probabilities derived under any model, while $q_i$ and $q_j$ are any weights with the restrictions $q_i > 0$, $q_j > 0$ and $\sum_{i=1}^{I} q_i = \sum_{j=1}^{J} q_j = 1$.
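The reconstitution implied by (2.48), and its truncation to the unsaturated model (2.49), can be sketched as follows (invented counts; the saturated model recovers P exactly):

```python
import numpy as np

# Saturated reconstitution (2.48) versus a truncated, unsaturated fit (2.49).
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 20, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(S, full_matrices=False)
Mstar = min(N.shape) - 1
A = (U / np.sqrt(r)[:, None])[:, :Mstar]
B = (Vt.T / np.sqrt(c)[:, None])[:, :Mstar]

def reconstitute(M):
    """p_ij = p_i. p_.j (1 + sum_{m=1}^{M} a_im lambda_m b_jm)."""
    return np.outer(r, c) * (1.0 + (A[:, :M] * lam[:M]) @ B[:, :M].T)

P_saturated = reconstitute(Mstar)   # equals P exactly
P_rank1 = reconstitute(1)           # rank-1 unsaturated approximation
```

Note that even the truncated reconstitution still sums to one, since the singular vectors are centred with respect to the masses.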

2.7.2 Basic Correspondence Model

When N is the data set from an I×2, or 2×J, contingency table, then $M^* = 1$. For such a table, model (2.48) becomes

$$p_{ij} = p_{i\bullet}p_{\bullet j}\left(1 + a_{i1}\lambda_1 b_{j1}\right) \qquad (2.51)$$


Model (2.51) has been studied by Goodman (1985a), Gilula, Krieger & Ritov (1988), Rom & Sarkar (1992), Ritov & Gilula (1993), Williams (1952) and Gilula & Haberman (1984), and is called the rank-2 canonical correlation model, or basic correspondence model. It can be seen from this model that the measure of correlation, $\lambda_1$, between the row and column scores, $\{a_{i1}\}$ and $\{b_{j1}\}$ respectively, can be calculated by

$$\lambda_1 = \sum_{i=1}^{I}\sum_{j=1}^{J} a_{i1} b_{j1} p_{ij} \qquad (2.52)$$

Equation (2.52) is also just (2.11) when m = 1. When $\lambda_1 = 0$, the correlation between the sets of row and column scores, and hence categories, is zero, and model (2.51) becomes the model of independence (2.1). When (2.52) is not zero, we can see from (2.51) that

$$\frac{p_{ij} - p_{i.}p_{.j}}{p_{i.}p_{.j}} = \lambda_1 a_{i1} b_{j1} \qquad (2.53)$$

which is just (2.13) when M* = 1. Squaring both sides of (2.53) and multiplying by $p_{i.}p_{.j}$ yields

$$\frac{\left(p_{ij} - p_{i.}p_{.j}\right)^2}{p_{i.}p_{.j}} = \lambda_1^2 a_{i1}^2 b_{j1}^2 p_{i.}p_{.j} \qquad (2.54)$$

Summing over the I rows and J columns of (2.54) and using (2.5) and (2.6) gives

$$\frac{X^2}{n} = \lambda_1^2 \qquad (2.55)$$

which is just equation (2.14) when M* = 1.


Therefore, the total inertia of a contingency table when M* = 1 is just the square of the correlation between the rows and columns. This situation arises when an I×2, or 2×J, contingency table is analysed; for example, tables where one variable has two responses, such as "Yes" and "No", "Pass" and "Fail", or "Male" and "Female". However, if the rank-2 model is applied to some other two-way contingency table, then equation (2.55) will not always hold.
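The identity (2.55) is easy to confirm numerically. The following sketch uses a small hypothetical 2×3 table (the counts are made up for illustration): the residual matrix has a single non-trivial singular value, and the total inertia equals its square.

```python
# Sketch: for a 2 x J table the total inertia X^2/n equals lambda_1^2 (2.55).
import numpy as np

N = np.array([[30, 20, 10],      # hypothetical 2 x 3 table
              [15, 25, 35]])
n = N.sum()
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)

Z = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
lam = np.linalg.svd(Z, compute_uv=False)   # only lam[0] is non-trivial here

X2 = n * (Z ** 2).sum()                    # Pearson chi-squared statistic
assert np.isclose(X2 / n, lam[0] ** 2)     # total inertia = squared correlation
```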

Rom & Sarkar (1992) proposed the rank-2 model

$$p_{ij} = p_{i.}p_{.j}\left(1 + \alpha_i + \beta_j + \lambda_1 a_{i1} b_{j1}\right)^{\phi} \qquad (2.56)$$

where $\{\alpha_i\}$ and $\{\beta_j\}$ depend on the parameter $\phi$.

If $\phi = 1$ in (2.56) then

$$p_{ij} = p_{i.}p_{.j}\left(1 + \alpha_i + \beta_j + \lambda_1 a_{i1} b_{j1}\right) \qquad (2.57)$$

Multiplying this by $a_{i1}b_{j1}$ and summing over the I rows and J columns yields

$$\lambda_1 = \sum_{i=1}^{I}\sum_{j=1}^{J} a_{i1}b_{j1}p_{ij} = \lambda_1 + \sum_{i=1}^{I}\sum_{j=1}^{J}\left(a_{i1}p_{i.}\right)\left(b_{j1}p_{.j}\right)\left(1 + \alpha_i + \beta_j\right)$$

where $\lambda_1$ is defined by (2.52). Applying the constraints (2.5) and (2.6), this leads to $\alpha_i = \beta_j = 0$.

Therefore, when $\phi = 1$, (2.56) reduces to the rank-2 canonical correlation model (2.51).


Also, if the correlation between the row and column scores is zero, the model of Rom & Sarkar (1992) reduces to the independence model (2.1).

2.7.3 Reconstitution Model

In the correspondence analysis context, the scores used to identify similarities and differences between the categories of a particular variable are not the standard co-ordinates; instead they are the profile co-ordinates defined by (2.15) and (2.16). Using this transformation, the RC canonical correlation model (2.48) becomes

$$p_{ij} = p_{i.}p_{.j}\left(1 + \sum_{m=1}^{M^*} \frac{f_{im} g_{jm}}{\lambda_m}\right) \qquad (2.58)$$

Model (2.58) is called the correspondence model by Goodman (1986), or the reconstitution model by Greenacre (1984) and many others. The reconstitution model (2.58) can be used to determine what effect reducing from M* dimensions to M dimensions has on the contingency table: the better the approximation given by the model, the better the M dimensions represent the row and column profiles of the contingency table. A test of the significance of the correspondence plot will be discussed in Section 2.8. The reconstitution formula verifies that if the row and column profile co-ordinates are close to zero then the hypothesis of complete independence is met. The reconstitution formula for the generalised correspondence analysis of Escoufier (1983, 1984) is

$$p_{ij} = q_i q_j\left(1 + \sum_{m=1}^{M^*} \frac{f_{im} g_{jm}}{\lambda_m}\right) \qquad (2.59)$$
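A sketch (with illustrative variable names) of the reconstitution model (2.58): the table is rebuilt from the first M profile co-ordinates, and is recovered exactly only when all M* = min(I, J) - 1 dimensions are retained.

```python
# Sketch: reconstituting Table 2.1 from M profile co-ordinates, as in (2.58).
import numpy as np

N = np.array([[64, 57, 57, 72, 36, 21],
              [94, 94, 105, 141, 97, 71],
              [58, 54, 65, 77, 54, 54],
              [46, 40, 60, 94, 78, 71]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
Z = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(Z, full_matrices=False)

F = (U / np.sqrt(r)[:, None]) * lam        # row profile co-ordinates f_im
G = (Vt.T / np.sqrt(c)[:, None]) * lam     # column profile co-ordinates g_jm

def reconstitute(M):
    # p_ij(M) = p_i. p_.j (1 + sum_{m=1}^{M} f_im g_jm / lambda_m)
    return np.outer(r, c) * (1 + (F[:, :M] / lam[:M]) @ G[:, :M].T)

assert np.allclose(reconstitute(3), P)      # all M* = 3 dimensions: exact
assert not np.allclose(reconstitute(2), P)  # 2 dimensions: approximate
```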


    2.7.4 Other Models

There are many models that can be used as alternatives to (2.48) and (2.58). Here two more will be given, although a detailed examination of them will not be included.

One of the most popular alternatives to the RC correlation model of (2.48) is the Goodman RC model, as seen in Goodman (1979). This model is

$$p_{ij} = \alpha_i \beta_j \exp\left(\sum_{m=1}^{M^*} \varphi_m a_{im} b_{jm}\right) \qquad (2.60)$$

where $a_{im}$ and $b_{jm}$ are as defined earlier. The parameter $\varphi_m$ is termed the coefficient of intrinsic association, while $\alpha_i$ and $\beta_j$ are positive parameters, or nuisance parameters as Gilula, Krieger & Ritov (1988) call them.

Escoufier & Junca (1986) showed that these parameters can be calculated by

$$\alpha_i = \exp\left(\sum_{j=1}^{J} p_{.j}\log p_{ij}\right) \qquad (2.61)$$

$$\beta_j = \exp\left(\sum_{i=1}^{I} p_{i.}\log p_{ij} - \sum_{i=1}^{I}\sum_{j=1}^{J} p_{i.}p_{.j}\log p_{ij}\right) \qquad (2.62)$$

Model (2.60) has been extensively reviewed by many authors, such as Goodman (1979, 1981, 1985a, 1985b, 1986, 1996), Escoufier (1988), Haberman (1981) and Becker & Clogg (1989). In fact, Goodman (1981) applied model (2.60) to two-way contingency tables with ordered variables.

Consider the rank-2 Goodman correlation model when M* = 1,

$$p_{ij} = \alpha_i \beta_j \exp\left(\varphi_1 a_{i1} b_{j1}\right) \qquad (2.63)$$

which is seen in Gilula, Krieger & Ritov (1988), Ritov & Gilula (1991, 1993), Gilula (1984) and Rom & Sarkar (1992).


Gilula, Krieger & Ritov (1988) show that the parameters $\lambda_1$ from equation (2.52) and $\varphi_1$ in (2.63) are different measurements. They point out that while $\lambda_1$ is a correlation value, as can be seen from (2.52), and hence has the range $[-1, 1]$, $\varphi_1$ has the range $[0, \infty)$. Goodman (1991) showed that the intrinsic association may be calculated by

$$\varphi_1 = \sum_{i=1}^{I}\sum_{j=1}^{J} p_{i.}p_{.j} a_{i1} b_{j1}\log p_{ij} \qquad (2.64)$$

while the m'th order association can be calculated by

$$\varphi_m = \sum_{i=1}^{I}\sum_{j=1}^{J} p_{i.}p_{.j} a_{im} b_{jm}\log p_{ij} \qquad (2.65)$$
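As an illustration only, the plug-in computation below evaluates (2.65) for Table 2.1 using correspondence analysis scores; the scores of a fitted RC association model would generally differ, and the weighted form of the sum should be treated as an assumed reconstruction.

```python
# Hedged sketch: evaluating the m'th order intrinsic association (2.65)
# with correspondence-analysis scores as plug-in values (illustration only).
import numpy as np

N = np.array([[64, 57, 57, 72, 36, 21],
              [94, 94, 105, 141, 97, 71],
              [58, 54, 65, 77, 54, 54],
              [46, 40, 60, 94, 78, 71]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
Z = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(Z, full_matrices=False)
A = U / np.sqrt(r)[:, None]                # row scores a_im
B = Vt.T / np.sqrt(c)[:, None]             # column scores b_jm

def phi(m):
    # Assumed form: phi_m = sum_ij p_i. p_.j a_im b_jm log p_ij;
    # the sign of the result depends on the arbitrary sign of the SVD scores.
    return float(np.sum(np.outer(r * A[:, m], c * B[:, m]) * np.log(P)))

phi_1 = phi(0)                             # first-order association
```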

For correspondence analysis, reparameterise the row and column scores so that

$$f_{im} = a_{im}\varphi_m \qquad (2.66)$$

$$g_{jm} = b_{jm}\varphi_m \qquad (2.67)$$

so that the Goodman reconstitution formula is

$$p_{ij} = \alpha_i \beta_j \exp\left(\sum_{m=1}^{M^*} \frac{f_{im} g_{jm}}{\varphi_m}\right) \qquad (2.68)$$

Therefore, the constraints (2.5) and (2.6) become

$$\sum_{i=1}^{I} p_{i.} f_{im} f_{im'} = \begin{cases} \varphi_m^2, & m = m' \\ 0, & m \neq m' \end{cases} \qquad (2.69)$$

$$\sum_{j=1}^{J} p_{.j} g_{jm} g_{jm'} = \begin{cases} \varphi_m^2, & m = m' \\ 0, & m \neq m' \end{cases} \qquad (2.70)$$


Van der Heijden, Mooijaart & Takane (1994) compared the RC correlation model with Goodman's RC model and found that each has its own strengths and weaknesses.

An alternative modelling procedure for correspondence analysis is that of Ihm & Groenwoud (1975, 1984), which is

$$p_{ij} = \alpha_i \beta_j \exp\left(-\sum_{m=1}^{M^*} \frac{\left(x_{im} - y_{jm}\right)^2}{2c_m^2}\right) \qquad (2.71)$$

For more details refer to these authors, or Escoufier (1988).

2.8 Adequacy of the Correspondence Plot

Often, when an M-dimensional correspondence plot is constructed, its adequacy is based on a subjective decision by the user. One may confidently say that a plot which explains 97% of the total inertia is very good, although the adequacy of 81% may be questionable. Benzecri (1992, p398) believes that the decision should be based on the researcher's personal judgement rather than on any mathematical procedure. However, this risks missing potentially important or valuable information at higher dimensions, because those dimensions are never investigated. Thus, to eliminate any uncertainty about the adequacy of the correspondence plot, the reconstitution formula may be used. Jobson (1992) proposed the following decision-making procedure.

To decide whether an M-dimensional plot adequately represents the row and column profile co-ordinates, consider the following hypothesis test:

$$H_0 : \lambda_m = 0, \qquad M < m \leq M^*$$


By choosing an M-dimensional correspondence plot for testing, all singular values lying between M (the chosen dimension for the plot) and M* must be zero for the null hypothesis, $H_0$, not to be rejected.

This hypothesis test is valid provided M is small, say 0, 1, 2 or 3. If M = 0, then $H_0$ is the model stating that the rows and columns of the original contingency table are independent.

In order to determine the adequacy of the M-dimensional correspondence plot, calculate the chi-squared statistic

$$X_M^2 = n\sum_{i=1}^{I}\sum_{j=1}^{J} \frac{\left(p_{ij} - p_{ij(M)}\right)^2}{p_{ij(M)}} \qquad (2.72)$$

where $p_{ij(M)}$ is the expected cell entry if the data set is reduced from its optimal number of dimensions to M dimensions. These expected cell entries can be found from the reconstitution formula (2.49).

The test statistic given by equation (2.72) has an approximate chi-squared distribution with (I - M - 1)(J - M - 1) degrees of freedom, if the conditions of the hypothesis are satisfied.
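A sketch (assumed implementation, numpy only) of this procedure: compute $X_M^2$ of (2.72) from the M-dimensional reconstitution of Table 2.1 and compare it with the chi-squared distribution on (I - M - 1)(J - M - 1) degrees of freedom. With M = 0 the statistic reduces to the ordinary Pearson statistic on (I - 1)(J - 1) degrees of freedom.

```python
# Sketch: the adequacy statistic (2.72) for an M-dimensional plot of Table 2.1.
import numpy as np

N = np.array([[64, 57, 57, 72, 36, 21],
              [94, 94, 105, 141, 97, 71],
              [58, 54, 65, 77, 54, 54],
              [46, 40, 60, 94, 78, 71]])
n = N.sum()
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)
Z = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, lam, Vt = np.linalg.svd(Z, full_matrices=False)
A = U / np.sqrt(r)[:, None]
B = Vt.T / np.sqrt(c)[:, None]

def adequacy(M):
    # Expected probabilities under an M-dimensional reconstitution
    P_M = np.outer(r, c) * (1 + (A[:, :M] * lam[:M]) @ B[:, :M].T)
    X2_M = n * np.sum((P - P_M) ** 2 / P_M)          # statistic (2.72)
    df = (N.shape[0] - M - 1) * (N.shape[1] - M - 1)
    return X2_M, df

X2_0, df_0 = adequacy(0)   # ordinary Pearson statistic, df = 15
X2_2, df_2 = adequacy(2)   # residual fit of the 2-dimensional plot, df = 3
```

Each $X_M^2$ would then be referred to the upper tail of the chi-squared distribution with the returned degrees of freedom.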

An alternative approach to testing the adequacy of a correspondence plot, as Lebart (1976) and Lebart et al (1984) consider, is to test the significance of the singular values. The Appendix shows that the squares of these singular values are just the eigenvalues of the transformed table.

While there have been several publications concerning the distribution of eigenvalues, most of which Lebart et al (1984) discuss as erroneous, the probability density of the eigenvalues extracted from a Wishart matrix was given by Fisher (1939), Girshick (1939), Hsu (1939), Roy (1939) and Mood (1951). Anderson (1958) later proved the result. This led to the Fisher-Hsu law of Lebart et al (1984):


"If $\lambda_\alpha$ is the $\alpha$'th eigenvalue produced by the correspondence analysis of table N of order I×J, with total sum n, then the distribution of $n\lambda_\alpha$ is approximately that of the $\alpha$'th eigenvalue of a Wishart matrix with parameters I - 1 and J - 1."

    2.9 Further Developments of Correspondence Analysis

Only the general aspects of simple correspondence analysis have been discussed so far. However, with the growing popularity of the technique outside Europe, many generalisations and specialisations have been made. Here we shall discuss, without going into any detail, further developments that have been made to correspondence analysis.

Correspondence analysis has been shown to be primarily a graphical technique, with Greenacre & Hastie (1987) discussing its geometrical interpretation. However, the correspondence plot will not necessarily represent all of the variation in the contingency table, and so unless the correspondence plot is optimal, it will always be inadequate to some degree. If a visualisation of ALL the variation is required, then other graphing techniques can be used. One of the easiest is the Andrews curve; Andrews (1972, 1982). Consider the i'th row profile. If it is optimally represented in an M-dimensional correspondence plot, its Andrews curve can be plotted on the Andrews plot by considering the function

$$f_i(t) = \frac{f_{i1}}{\sqrt{2}} + f_{i2}\sin t + f_{i3}\cos t + f_{i4}\sin 2t + f_{i5}\cos 2t + \cdots$$

where $-\pi \leq t \leq \pi$.
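A minimal sketch of the Andrews curve function above; the profile co-ordinates used here are made-up numbers for illustration.

```python
# Sketch: evaluating the Andrews curve f_i(t) for one profile.
import numpy as np

def andrews_curve(coords, t):
    # f_i(t) = f_i1/sqrt(2) + f_i2 sin t + f_i3 cos t + f_i4 sin 2t + ...
    out = coords[0] / np.sqrt(2) * np.ones_like(t)
    for k, f in enumerate(coords[1:], start=1):
        harmonic = (k + 1) // 2                      # 1, 1, 2, 2, 3, ...
        term = np.sin(harmonic * t) if k % 2 == 1 else np.cos(harmonic * t)
        out += f * term
    return out

t = np.linspace(-np.pi, np.pi, 201)                   # -pi <= t <= pi
curve = andrews_curve(np.array([0.3, -0.1, 0.2]), t)  # hypothetical profile
```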


Goodchild & Vijayan (1974) proposed a way of conducting tests of significance from an Andrews plot.

Another popular multi-dimensional graphing technique, because of its novelty, is the Chernoff face. Chernoff (1973) proposed a method of graphing multi-dimensional data in two dimensions using faces, so that each component of a point is represented by a particular feature of the face. Each profile point can be represented by a face, so that if two faces are different, then those two profiles are different; if two faces look identical, then those profiles are the same. Flury & Riedwyl (1981) later developed a technique to gain more information from Chernoff faces, while Apaiwongse (1995) applied them to environmental policy.

Other techniques for visualising multi-dimensional data can be seen in Tukey & Tukey (1981a, b) and Kleiner & Hartigan (1981), while Snee (1974) offers an alternative graphical procedure for the analysis of contingency tables. Tukey & Tukey (1981a) discussed various techniques, including the draughtsman's view of data. Such a view consists of a series of plots. For a three-dimensional correspondence plot, consisting of the x, y and z-axes, there will be three plots: the x-y, x-z and y-z plots. For a four-dimensional correspondence plot, there will be a draughtsman's view consisting of six two-dimensional plots. In general, for an M-dimensional correspondence plot, the draughtsman's view will consist of $\binom{M}{2} = M(M-1)/2$ plots. Tukey & Tukey (1981a) also look at dodecahedron views of multi-dimensional data.

Tukey & Tukey (1981b) discussed different ways in which a two-dimensional plot can help visualise higher dimensions through the various symbols that can be plotted on the graph instead of points, such as polygons or stars, the glyphs of Friedman et al (1972), or the trees and castles of Anderson (1957, 1960).


Friendly (1994) generalised the work of Hartigan & Kleiner (1984) concerning mosaic displays. Such displays use "tiles" whose areas reflect the frequencies of particular cells. Generally, the wider the tiles of a particular category, the more important that category is in explaining the variation in the data; the narrower the tiles, the less important it is. Friendly (1994) was also able to fit log-linear models to multi-way contingency tables using mosaic displays. Such displays are graphical in the same sense as Andrews curves or Chernoff faces, but the colouring and size of each tile give a neat and simple display to interpret. Ferris (1982) describes an alternative graphical scoring system, called the Smiley Scale, for marketing researchers.

The study of influence in correspondence analysis has only been considered recently, with work conducted by Pack & Jolliffe (1992), Kim (1992, 1994) and Benasseni (1993).

Pack & Jolliffe (1992) discussed the influence on the co-ordinates and eigenvalues when a single observation is added to a cell, and when a row is added to or deleted from the table. Kim (1992) discussed the influence of a row or column on a two-dimensional correspondence plot when a two-way contingency table is analysed. Kim (1994) then looked at influence for the analysis of multi-way contingency tables. Benasseni (1993) discussed the influence on the eigenvalues and eigenvectors of combining rows and columns, and of adding and deleting observations from a cell.

Other work in correspondence analysis includes situations where a contingency table has missing values, or where the table is incomplete. Greenacre (1984) discussed such a case, as did de Leeuw & van der Heijden (1988) and Goldstein (1987), who gave ways of approximating missing or incomplete cell entries. Original work can be found in Mutombo (1973) and Nora (1975).


Other interesting work can be seen in Jambu (1987), who put forward rules for selecting clusters in a correspondence plot; Heiser (1987), who studied correspondence analysis by analysing residuals from a loss function; and Bove (1986), who proposed an alternative correspondence plot to overcome two problems which he identified. More recent developments include those of de Leeuw (1993) and Greenacre (1996). The work of de Leeuw (1993) involves some generalisations of the classical correspondence analysis technique, while Greenacre (1996) discusses correspondence analysis of square contingency tables by decomposing the table into two components: a symmetric table and a skew-symmetric table. Greenacre & Clavel (1998) extended this approach to transition matrices; for example, data collected from grandfathers and fathers, and from fathers and sons. Chapter 12 of Barnett & Lewis (1994) looked at different residual measures for identifying outliers in a contingency table. The residual considered in this chapter is the standardised residual

$$r_{ij} = \frac{p_{ij} - p_{i.}p_{.j}}{\sqrt{p_{i.}p_{.j}}}$$

Other residuals that can be used for a correspondence analysis are the adjusted residual of Haberman (1973),

$$\tilde{r}_{ij} = \frac{r_{ij}}{\sqrt{\left(1 - p_{i.}\right)\left(1 - p_{.j}\right)}}$$

or, to detect whether the (i, j)'th cell entry is an outlier, the deleted residual of Simonoff (1988),

$$\frac{n_{ij} - e_{ij}}{\sqrt{e_{ij}}}$$

where

$$e_{ij} = \frac{\left(n_{i.} - n_{ij}\right)\left(n_{.j} - n_{ij}\right)}{n - n_{i.} - n_{.j} + n_{ij}}$$
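The three residuals can be computed as follows; the matrix form is a sketch of my own, and the scaling of the deleted residual by $\sqrt{e_{ij}}$ is an assumption (only $e_{ij}$ itself is given above).

```python
# Sketch: standardised, adjusted and deleted residuals for Table 2.1.
import numpy as np

N = np.array([[64, 57, 57, 72, 36, 21],
              [94, 94, 105, 141, 97, 71],
              [58, 54, 65, 77, 54, 54],
              [46, 40, 60, 94, 78, 71]])
n = N.sum()
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)

# Standardised residual r_ij = (p_ij - p_i. p_.j) / sqrt(p_i. p_.j)
r_std = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Adjusted residual of Haberman (1973)
r_adj = r_std / np.sqrt(np.outer(1 - r, 1 - c))

# Deleted residual of Simonoff (1988): expected count e_ij computed with
# cell (i, j) "deleted"; the sqrt(e_ij) scaling is assumed in this sketch.
ni = N.sum(axis=1, keepdims=True)
nj = N.sum(axis=0, keepdims=True)
e = (ni - N) * (nj - N) / (n - ni - nj + N)
r_del = (N - e) / np.sqrt(e)
```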

    2.10 Examples

    2.10.1 Socio-Economic and Mental Health Data

Consider the following data set, first seen in Srole et al (1962), which is a cross-classification of 1660 patients from midtown Manhattan according to mental health status and parental socio-economic status.

Mental Health Status          A    B    C    D    E    F
Well                          64   57   57   72   36   21
Mild Symptom Formation        94   94   105  141  97   71
Moderate Symptom Formation    58   54   65   77   54   54
Impaired                      46   40   60   94   78   71

    Table 2.1 : Cross-classification of 1660 Patients from Midtown Manhattan

    According to Mental Health Status and Parental Socio-economic Status

Many authors, such as Goodman (1986), Best & Rayner (1996), Beh (1997, 1998) and Weller & Romney (1990), have since cited this example in their analyses. The parental socio-economic status is designated A through F, in sequence from highest to lowest position.

Table 2.2 gives the percentage profiles of the patients' mental health status (rows). It can be seen that, across each of the socio-economic status levels, Mild and Moderate appear to be fairly similar and less variable than Well or Impaired, suggesting that these two categories will lie close to the origin. A comparison of the distribution of these profiles across the column categories can be better viewed by observing the line graphs given in Figure 2.1.


          A      B      C      D      E      F      Total
Well      20.85  18.57  18.57  23.45  11.73  6.84   100
Mild      15.61  15.61  17.44  23.42  16.11  11.79  100
Moderate  16.02  14.92  17.96  21.27  14.92  14.92  100
Impaired  11.83  10.28  15.42  24.16  20.05  18.25  100

Table 2.2 : Percentage Profiles of the Patients' Mental Health Status

From Figure 2.1 we can see that there is very little variation in the categories Moderate and Mild, and so they contribute very little to the variation in the table. Thus, it is expected that their profiles will lie close to the origin of the correspondence plot. The profiles of Well and Impaired vary quite a lot and so will lie a long way from the origin. This latter pair are also quite different from one another and will therefore lie far apart.

Table 2.3 shows the percentage profiles of the parental socio-economic status categories. It can be seen from Table 2.3 that the status levels A and B are very similar, and so will be situated close to one another on a correspondence plot. Also, status levels C and D appear to be the two least variable of the six categories and so will be situated close to the origin of the display. These conclusions can be better seen in the line graphs of each status level across each health level in Figure 2.2. Observing Figure 2.2, the profiles of column categories A and B are very similar. However, as these profiles differ across each mental health status category, they will not lie close to the origin.



    Figure 2.1 : Line Graph of Row Profiles across each Column Category

The total inertia of Table 2.1 is 0.02770196; the associated Pearson chi-squared statistic is significant at the 5% level, showing that there is some association between the rows and columns of the data. That is, there is an association between a person's parental socio-economic status and their own mental health status. The application of correspondence analysis allows for a visual interpretation of the within- and between-variable comparisons.
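The inertia values quoted in this example can be reproduced directly from Table 2.1 (a numerical check, not part of the original analysis):

```python
# Check: total inertia of Table 2.1 and the principal inertias.
import numpy as np

N = np.array([[64, 57, 57, 72, 36, 21],
              [94, 94, 105, 141, 97, 71],
              [58, 54, 65, 77, 54, 54],
              [46, 40, 60, 94, 78, 71]])
n = N.sum()
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)
Z = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
lam = np.linalg.svd(Z, compute_uv=False)

total_inertia = (lam ** 2).sum()              # X^2 / n, approx 0.0277
axis_pct = 100 * lam ** 2 / total_inertia     # % of inertia per principal axis
```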


   Well   Mild   Moderate  Impaired  Total
A  24.43  35.88  22.14     17.56     100
B  23.27  38.37  22.04     16.33     100
C  19.86  36.59  22.65     20.91     100
D  18.75  36.72  20.05     24.48     100
E  13.58  36.60  20.38     29.43     100
F  9.68   32.72  24.88     32.72     100

    Table 2.3 : Percentage Profiles of the Parental Socio-economic Status

The two-dimensional classical correspondence plot is given in Figure 2.3. The first principal axis has an inertia value of 0.026026, representing 93.94% of the total inertia of Table 2.1, while the second principal axis has an inertia value of 0.001380, which accounts for 4.98% of the total variation in the data. Thus the classical plot of Figure 2.3 explains 98.92% of the total inertia, and is therefore a very good representation of the variation of the row and column profiles.

Figure 2.3 shows that socio-economic status categories A and B are very similar to one another across the different levels of mental health status, because of the close proximity of their profile co-ordinates, while Well and Impaired have very different profiles. These conclusions were also reached by looking at the profiles in Tables 2.2 and 2.3. Thus column profiles A and B can be combined to form a single profile; Weller & Romney (1990) did this for these categories, as well as for column categories C and D and row categories Mild and Moderate. To see what contribution each row and column category makes to each of the axes of the correspondence plot, refer to Tables 2.4 and 2.5 respectively.

The percentage contribution of each category to the respective principal inertias is also given.


Mental Health   Principal Axis 1        Principal Axis 2
Status          Contribution  % Contr.  Contribution  % Contr.
Well            0.012458      47.86     0.000027      1.96
Mild            0.000317      1.22      0.000204      14.78
Moderate        0.000044      0.17      0.001065      77.17
Impaired        0.013207      50.75     0.000084      6.09
Total           0.026026      100       0.001380      100

    Table 2.4 : Contribution of each Row Profile to the

    First Two Principal Inertias

Table 2.4 shows that mental health status levels Well and Impaired contribute 47.86% and 50.75%, respectively, of the row variation displayed along the first principal axis. That is, these two categories account for 98.61% of the variation in the rows. The dominance of Well and Impaired along this axis can be seen in Figure 2.3, where they lie on the far right and far left of the display respectively. Mild and Moderate account for the remaining 1.39% of the variation; very little, which is reflected by their close proximity to the origin along the first axis. However, Mild and Moderate account for 91.95% of the variation along the second axis. Moderate, which explains 77.17% of the row variation along this axis, is the most dominant row profile in that direction.



    Figure 2.2 : Line Graph of Column Profiles across each Row Category



Figure 2.3 : Two-Dimensional Classical Correspondence Plot of Table 2.1

For the column profiles, A and B contribute roughly equally to the first principal axis, while F is the most dominant column category, contributing 41.57% of the variation along this axis. These features can be seen in Figure 2.3. While A, B and C contribute very little to the second principal axis (a total of 11.89%), D, E and F contribute approximately equally to it. This is reflected by the position of these profiles in Figure 2.3: A, B and C have very small second co-ordinates, while D, E and F lie at a relatively larger distance from the first axis, indicating larger second co-ordinates.


Socio-Economic  Principal Axis 1        Principal Axis 2
Status          Contribution  % Contr.  Contribution  % Contr.
A               0.005167      19.82     0.000059      4.28
B               0.005051      19.41     0.000020      1.45
C               0.000603      2.31      0.000085      6.16
D               0.000018      0.07      0.000410      29.71
E               0.004367      16.78     0.000304      22.03
F               0.010820      41.57     0.000502      36.38
Total           0.026026      100       0.001380      100

    Table 2.5 : Contribution of each Column Profile to the First Two Principal Inertias

However, Figures 2.1 and 2.2 do not show the correspondence between the two variables.

Figure 2.3 shows that Well lies close to A and B. Therefore, patients whose mental health status is classified as Well tend to be those whose parents have a high socio-economic status, while patients classified as having an impaired mental health status tend to come from a low socio-economic background.

2.10.2 Crime Data

Consider the 2×6 contingency table of Table 2.6, which was seen in Snee (1974) and originally presented in Kendall (1943). It classifies 1426 criminals according to the crime that they committed and whether they are a drinker or non-drinker of alcohol.

As Table 2.6 is of size 2×6, the correspondence plot will consist of only one axis, and according to equation (2.55) the total inertia will be equivalent to the squared correlation between the two variables; that is, the square of the first non-trivial singular value. In fact, the Pearson chi-squared statistic is 49.73, which at 5 degrees of freedom is highly significant. Therefore, there is a


relationship between the crime a criminal commits and whether or not they drink alcohol. The total inertia of Table 2.6 is 0.03487447, while the first singular value is 0.186745, the square root of 0.03487447.
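These figures can be checked numerically from the counts of Table 2.6 (the check itself is not part of the thesis):

```python
# Check: Pearson statistic and singular value for the crime data (Table 2.6).
import numpy as np

N = np.array([[50, 88, 155, 379, 18, 63],     # drinkers
              [43, 62, 110, 300, 14, 144]])   # non-drinkers
n = N.sum()
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)
Z = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
lam = np.linalg.svd(Z, compute_uv=False)

X2 = n * (Z ** 2).sum()                       # approx 49.73
assert np.isclose(lam[0], np.sqrt(X2 / n))    # lambda_1 = sqrt(total inertia)
```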

              Arson  Rape  Violence  Theft  Coining  Fraud  Total
Drinkers      50     88    155       379    18       63     753
Non-Drinkers  43     62    110       300    14       144    673
Total         93     150   265       679    32       207    1426

    Table 2.6 : Cross-classification of 1426 criminals according to crime

    committed and whether they drink alcohol or not

Observing the correspondence plot of Figure 2.4, it can be seen that the profiles of criminals who commit arson, rape, violence, theft and coining are similar, and very different from the profile of those who commit fraud. The profiles of the drinkers and non-drinkers are also very different. Those who drink alcohol tend to be those who commit arson, rape, violence, theft and coining, while those who commit fraud tend to be those who do not drink alcohol.

    Table 2.7 gives the contribution of the row categories to the first axis, while Table 2.8 gives the contribution of the crime categories to the axis.

Figure 2.4 shows that the drinking and non-drinking categories are evenly spread across the first principal axis, while Table 2.7 shows that they contribute roughly equal amounts to the principal inertia, and hence to the total inertia.

Figure 2.4 also shows that the Fraud category lies on the right-hand side of the display and dominates the axis for the crime categories. Table 2.8 shows that Fraud, which lies at quite a distance from the origin, accounts for 83.59% of the variation in the crime categories, while Arson, which lies very close to the origin, contributes only 0.07% of the variation. The remaining crimes also lie


    fairly close to the origin as well, and as can be seen in Table 2.8, contribute very little to the variation in the data.


    Figure 2.4 : Correspondence Plot of Table 2.6

              Contribution  %
Drinkers      0.016450      47.18
Non-Drinkers  0.018415      52.82
Total         0.034865      100

    Table 2.7 : Contribution of the Drinking/Non-Drinking

    Categories to the Total Inertia

          Contribution  %
Arson     0.0000241     0.07
Rape      0.0014498     4.16
Violence  0.0024109     6.91
Theft     0.0017131     4.97
Coining   0.0001068     0.30
Fraud     0.0291488     83.59
Total     0.0348700     100

    Table 2.8 : Contribution of Crime Categories to Total Inertia


    2.10.3 Hospital Data

Consider the data given in Table 2.9, which classifies 132 patients according to their length of stay in hospital and the frequency of their visitors. This data was originally seen in Wing (1962), while Haberman (1974) also analysed the contingency table.

As Table 2.9 is a 3×3 contingency table, the optimal correspondence plot is two-dimensional. The Pearson chi-squared statistic of the data is 35.171, which at 4 degrees of freedom is highly significant. Therefore, there is an association between the frequency of visitors a patient receives and their length of stay in hospital.
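The chi-squared value quoted above can be reproduced from the counts of Table 2.9 (a numerical check; the row labels in the comments are abbreviated):

```python
# Check: Pearson chi-squared statistic and total inertia for Table 2.9.
import numpy as np

N = np.array([[43, 16, 3],     # goes home, or visited regularly
              [6, 11, 10],     # visited less than once a month
              [9, 18, 16]])    # never visited
n = N.sum()
E = np.outer(N.sum(axis=1), N.sum(axis=0)) / n   # expected counts
X2 = ((N - E) ** 2 / E).sum()                    # approx 35.171, df = 4
total_inertia = X2 / n                           # approx 0.2664
```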

                                     Length of Stay in Hospital (Years)
Frequency of Visit                   2-10   10-20   At Least 20
Goes Home, or Visited Regularly      43     16      3
Doesn't Go Home, Visited Less than   6      11      10
Once a Month
Never Goes Home and Never Visited    9      18      16

    Table 2.9 : Cross-classification of 132 Patients according to the

    Frequency of Visitors and their Length of Stay in Hospital

When conducting a correspondence analysis of Table 2.9, the total inertia is 0.266447, with a first principal inertia of 0.26641 and a second principal inertia of 0.00004. Thus the first axis accounts for 99.98% of the variation in the table.

The optimal two-dimensional correspondence plot of Table 2.9 is given in Figure 2.5.

Figure 2.5 shows that each of the categories of the "Length of Stay" variable is quite different from the others, while those who are visited less than once a month and don't go home are similar to those who are never visited and never go home.

    Figure 2.5 also shows that those who stay in hospital between two and

    ten years get visited more regularly than those who stay in hospital for

    more than ten years.


    Figure 2.5: Optimal Correspondence Plot of Table 2.9

Table 2.10 shows the contribution of each frequency-of-visit category to the first and second principal inertias, while Table 2.11 shows the contributions of the length-of-stay categories to these axes. The percentage contribution of each category to each principal inertia is also given.


                                   Principal Axis 1        Principal Axis 2
                                   Contribution  % Contr.  Contribution  % Contr.
Goes Home, or Visited Regularly    0.1412579     53.02     0.00004       100
Doesn't Go Home, Visited Less      0.0457202     17.16     0             0
than Once a Month
Never Visited                      0.0794439     29.82     0             0
Total                              0.266422      100       0.00004       100

    Table 2.10 : Contribution of each Visiting Category to each Principal Inertia

                   Principal Axis 1        Principal Axis 2
                   Contribution  % Contr.  Contribution  % Contr.
2-10 Years         0.1302716     48.90     0.0000029     7.25
10-20 Years        0.0178542     6.70      0.0000236     59.00
At Least 20 Years  0.1183018     44.40     0.0000135     33.75
Total              0.266422      100       0.0000400     100

    Table 2.11 : Contribution of each Length of Stay Category to

    each Principal Inertia

Figure 2.5 shows that the row category Visited Regularly lies on the far left of the plot along the first principal axis, while Table 2.10 shows that this category contributes more than half of the row variation for the associated inertia value. Similarly, the patients who stay in hospital for more than two years but less than ten years have the profile position which dominates the first principal axis; Table 2.11 shows that this category contributes nearly half of the variation of the column categories for the associated inertia value. The profile relating to those patients who stay in hospital for at least 20 years contributes 44.4% of the total column variation along the first principal axis.


    Chapter 3

    Multiple Correspondence Analysis

    3.1 Introduction

Chapter 2 discussed the application of correspondence analysis to two-way contingency tables. However, not all data sets are structured in such a way. In many fields, researchers wish to make observations by taking into consideration more than two measurable variables. In this case, a contingency table which classifies observations according to more than two variables is called a multi-way contingency table, and the method of correspondence analysis applied to these tables is called multi-way correspondence analysis or, by its more popular name, multiple correspondence analysis.

    The first article concerned with the development and application of correspondence analysis was Guttman (1941). Since then the favoured approach to multiple correspondence analysis has been to transform the multiway contingency table into an indicator matrix or a Burt matrix and then apply simple correspondence analysis to the matrix.

    The past ten years have seen a growth in new ways of tackling multiple correspondence analysis. Greenacre (1988, 1990, 1991) identified some


    limitations and problems with multiple correspondence analysis, and rectified these problems by considering an adjusted Burt matrix. His approach is called joint correspondence analysis. An alternative approach to multiple correspondence analysis has been to use the Gifi system. This is named after Albert Gifi, whose 1990 text showed that many classical multivariate techniques can be generalised into one by analysing a loss function. This loss function will be briefly described in Section 3.4 of this chapter.

    A lesser known approach to multiple correspondence analysis, discussed in Weller & Romney (1990), is called stacking where, for a three-way contingency table, two of the three variables are combined to form a two-way contingency table.

    Yet another approach that has slowly gained momentum as a technique for conducting multiple correspondence analysis is to decompose the multi-way contingency table using a generalisation of the singular value decomposition. There are many models of decomposition that can be employed, such as the PARAFAC model of Harshman (1970), the similar CANDECOMP model of Carroll & Chang (1970) and the Tucker3 model of Tucker (1963, 1964, 1966). Other discussions on the decomposition of three-way data can be seen in Sands & Young (1980), Kruskal (1977, 1989),

    Denis & Dhorne (1988), Leurgans, Ross & Abel (1993) and Anderson (1996).

    In this chapter the application of multiple correspondence analysis using these six techniques will be introduced. Refer to the relevant articles for further information on each approach.

    Consider a three-way contingency table, N, consisting of I rows, J columns and, using the terminology of Kroonenberg (1989), K tubes. Kendall & Stuart (1979) use the term layer. The total number of individuals or observations classified according to these three variables is n. The (i, j, k)'th cell entry of N is nijk for i=1, 2, ..., I, j=1, 2, ..., J and k=1, 2, ..., K. Define


    the (i, j, k)'th cell probability as pijk = nijk/n, so that Σ_{i=1}^{I} Σ_{j=1}^{J} Σ_{k=1}^{K} pijk = 1. Let pi.. be the i'th row marginal probability, so that Σ_{i=1}^{I} pi.. = 1. Similarly, let p.j. and p..k be the j'th column and k'th tube marginal probabilities respectively, so that Σ_{j=1}^{J} p.j. = Σ_{k=1}^{K} p..k = 1. This notation can easily be generalised for any m-way contingency table, where m > 3.

    Multiple correspondence analysis can be applied to any multi-way contingency table. However, for the sake of simplicity, this thesis will only consider at most a 3-way contingency table.

    3.2 MCA via the Indicator Matrix

    3.2.1 Correspondence Analysis of the Z matrix (Two-way)

    Any contingency table can be represented in the form of an indicator matrix, that is, a matrix where the elements are either 0 or 1. For this indicator matrix, the rows represent each individual who has been categorised into the original contingency table, and the columns represent all the category responses that each individual was classified into.

    Consider the simple 2x3 contingency table of Table 3.1 below :

              Column 1   Column 2   Column 3   Total
    Row 1         1          2          2        5
    Row 2         2          3          1        6
    Total         3          5          3       11

    Table 3.1 : Artificial two-way contingency table

    There are a total of 11 individuals classified into Table 3.1 and so it has an equivalent indicator matrix, Z, which has 11 rows (one for each individual) and 5 columns (one for each row and column category).


    The indicator matrix, Z, of Table 3.1 is written as Table 3.2.

    Individual   row 1   row 2   column 1   column 2   column 3
         1         1       0        1          0          0
         2         1       0        0          1          0
         3         1       0        0          1          0
         4         1       0        0          0          1
         5         1       0        0          0          1
         6         0       1        1          0          0
         7         0       1        1          0          0
         8         0       1        0          1          0
         9         0       1        0          1          0
        10         0       1        0          1          0
        11         0       1        0          0          1

    Table 3.2 : Indicator matrix of Table 3.1

    From Table 3.2, the first individual, that is row 1, has been classified into row 1 and column 1 of our original table. Thus, it has a 1 value placed in the columns "row 1" and "column 1". Similarly, the last individual, who was classified into row 2 and column 3 in the original contingency table, has a 1 value in the columns "row 2" and "column 3" of the indicator matrix.

    Notice that for each row the remaining columns have 0's in them.

    On closer examination of the indicator matrix, each row has two 1's in it and three 0's. This is because we are considering a two-way contingency table. In general, the indicator matrix of a multi-way, or m-way, contingency table will consist of m 1's in each row, while the remaining entries in that row will be 0's.
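As an illustrative sketch (not code from the thesis), the expansion of a two-way contingency table into its indicator matrix can be written in a few lines of Python; the function name indicator_matrix is my own.

```python
import numpy as np

# Table 3.1, the artificial 2x3 contingency table.
N = np.array([[1, 2, 2],
              [2, 3, 1]])

def indicator_matrix(N):
    """Expand an I x J contingency table into its n x (I + J) indicator
    matrix Z: one row per individual, with a single 1 in the block of
    row categories and a single 1 in the block of column categories."""
    I, J = N.shape
    pairs = []
    for i in range(I):
        for j in range(J):
            pairs.extend([(i, j)] * N[i, j])   # one entry per individual
    Z = np.zeros((len(pairs), I + J), dtype=int)
    for ind, (i, j) in enumerate(pairs):
        Z[ind, i] = 1        # row category of this individual
        Z[ind, I + j] = 1    # column category of this individual
    return Z

Z = indicator_matrix(N)
print(Z.shape)  # (11, 5): 11 individuals, Q = 2 + 3 = 5 categories
```

Each row of the result sums to m = 2, one 1 per variable, matching Table 3.2.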

    Suppose we wish to carry out a correspondence analysis of a two-way contingency table, N, but using its indicator matrix Z.


    For the first variable there are I response categories, while for the second variable there are J. So let the total number of response categories be Q = I + J. Therefore, the indicator matrix, Z, of the two-way contingency table has dimension n x Q.

    The indicator matrix Z is the combination of two sub-matrices, Z1 and Z2, such that

    Z = [Z1 Z2]

    For Table 3.2, the Z matrix is :

    (1 0 1 0 °] 1 0 0 1 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 1 1 0 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 lo 1 0 0 h where the sub-matrices are

    (1 0^ n o o^j 1 0 0 10 1 0 0 10 1 0 0 0 1 1 0 0 0 1

    0 1 Z2 = 10 0 0 1 10 0 0 1 and 0 10 0 1 0 10 0 1 0 1 0 lo 1, ^o o lJ


    Submatrix Z1 (with n=11 rows and I=2 columns) is such that the i'th row contains I-1=1 zero value and one value of 1. Similarly, submatrix Z2 (with n rows and J=3 columns) is such that the i'th row contains J-1=2 zero values and one value of 1.

    The original two-way contingency table can be recalculated by :

    N = Z1^T Z2
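This recovery of N from the two indicator submatrices is easy to check numerically. The sketch below hard-codes Z from Table 3.2 and multiplies the blocks.

```python
import numpy as np

# The indicator matrix Z of Table 3.2 (rows: individuals 1-11;
# columns: row 1, row 2, column 1, column 2, column 3).
Z = np.array([
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1],
])
Z1, Z2 = Z[:, :2], Z[:, 2:]

# N = Z1^T Z2 recovers the original two-way table of Table 3.1.
N = Z1.T @ Z2
print(N)
```

The (i, j)'th entry counts the individuals with a 1 in both row category i and column category j, which is exactly the original cell count.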

    The classical method of multiple correspondence analysis involves treating Z as a contingency table and executing a simple correspondence analysis with it as the data set. Carroll & Green (1988) refer to such a data set as a pseudo-contingency table, as it provides information regarding how many times each individual/observation is classified into each category.

    However, when Z is treated as a contingency table, problems may arise concerning the chi-squared statistic, as described by several authors, including Haberman (1988).

    The row profile co-ordinates from the correspondence analysis of Z will show the difference in the categorisation of the individuals. However, as it is the correspondence of the categories that is of interest, the column profile co-ordinates of the analysis can be plotted onto a correspondence plot.

    As a two-way contingency table can be analysed using either simple correspondence analysis or multiple correspondence analysis via its indicator matrix, the principal inertias from each analysis are closely related. Suppose λm is the m'th principal inertia from a simple correspondence analysis of N, while μm is the m'th principal inertia from the correspondence analysis of its indicator matrix. Greenacre (1984) showed that

    λm = (2μm - 1)²     (3.1)

    or

    μm = (1/2)(1 ± √λm)     (3.2)

    while the total inertia of the table using its indicator matrix is

    X²_Z/n = Q/m - 1     (3.3)

    where m is the number of variables (here m = 2).
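Relations (3.1)-(3.3) are easy to verify numerically on the artificial data of Table 3.1. The helper principal_inertias below is an illustrative sketch of my own, not code from the thesis; it computes the squared singular values of the standardised residuals, as in a simple correspondence analysis.

```python
import numpy as np

def principal_inertias(table):
    """Principal inertias (squared singular values) of the standardised
    residuals from a correspondence analysis of `table`."""
    P = table / table.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.sort(np.linalg.svd(S, compute_uv=False) ** 2)[::-1]

# Table 3.1 and its indicator matrix Z (Table 3.2).
N = np.array([[1.0, 2, 2],
              [2, 3, 1]])
Z = np.array([
    [1, 0, 1, 0, 0], [1, 0, 0, 1, 0], [1, 0, 0, 1, 0], [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1], [0, 1, 1, 0, 0], [0, 1, 1, 0, 0], [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0], [0, 1, 0, 1, 0], [0, 1, 0, 0, 1],
], dtype=float)

lam = principal_inertias(N)      # simple CA of N
lam_t = principal_inertias(Z)    # CA of the indicator matrix

# (3.3): total inertia of the indicator analysis is Q/m - 1 = 5/2 - 1 = 1.5
# (3.2): the largest indicator inertia is (1 + sqrt(lam_1))/2, so that
# (3.1): (2*mu_1 - 1)^2 recovers the leading simple-CA inertia lam_1
print(lam_t.sum(), lam[0], (2 * lam_t[0] - 1) ** 2)
```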

    3.2.2 Correspondence Analysis of the Z matrix (m-way)

    The situation previously considered was the application of multiple correspondence analysis to two-way contingency tables. Now consider an m-way contingency table N. It can alternatively be represented in the form of an indicator matrix, Z, such that

    Z = [Z1 Z2 ... Zm]

    where Zq, for q=1, 2, ..., m, is the indicator submatrix for the q'th set of categories.

    Analogous to the approach for two-way tables, Z1, Z2, ..., Zm have the same properties as Z1 and Z2 for Z = [Z1 Z2]. Each Zq has n rows, where n is the grand total of N, and each row contains one 1, with the remainder of the elements in the row being zeros.

    3.3 MCA via the Burt Matrix

    We have seen from the indicator matrix Z that the rows consist of the individuals that were categorised, while the columns were each of the categories under which each individual was placed.

    Burt (1950) suggested that one can analyse the actual results obtained rather than a series of l's and 0's in a table.


    The technique requires analysing the Burt matrix, which is just the product-sum matrix :

    Bt = Z^T Z

    Burt notes that, except for the fact that the values in Z are not standardised, this product-sum matrix corresponds with the more familiar matrix of covariances (or rather codeviances) between the categories. We can obtain an alternative product-sum matrix by post-multiplying Z by its transpose, namely :

    Bp = Z Z^T

    The result will correspond to the matrix of covariances or codeviances for the individuals. This second type of matrix, however, will not be considered.

    For the analysis of a two-way contingency table, the Burt matrix can be represented as :

    Bt = Z^T Z = ( Z1^T Z1   Z1^T Z2 )
                 ( Z2^T Z1   Z2^T Z2 )

    where Z is partitioned so that Z = [Z1 Z2] as before.

    Another way in which the Burt matrix may be written when considering a two-way contingency table is :

    Bt = ( nDI    N   )     (3.4)
         ( N^T   nDJ )

    where DI and DJ are the diagonal matrices of the row and column marginal proportions respectively.

    Each of the "off-diagonal" submatrices Zq^T Zq' (for q ≠ q') is a two-way contingency table which condenses the association between variables q and q' across the n individuals. For example, the Burt matrix of Table 3.1 is :


               Row 1   Row 2   Column 1   Column 2   Column 3
    Row 1        5       0        1          2          2
    Row 2        0       6        2          3          1
    Column 1     1       2        3          0          0
    Column 2     2       3        0          5          0
    Column 3     2       1        0          0          3

    Table 3.3 : Burt Matrix of the Artificial Data of Table 3.1
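The Burt matrix of Table 3.3 can be reproduced directly from the indicator matrix of Table 3.2; the sketch below is illustrative only.

```python
import numpy as np

# Indicator matrix Z of Table 3.2.
Z = np.array([
    [1, 0, 1, 0, 0], [1, 0, 0, 1, 0], [1, 0, 0, 1, 0], [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1], [0, 1, 1, 0, 0], [0, 1, 1, 0, 0], [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0], [0, 1, 0, 1, 0], [0, 1, 0, 0, 1],
])

# Burt matrix Bt = Z^T Z, a symmetric Q x Q matrix (Table 3.3).
B = Z.T @ Z

# The diagonal blocks hold the univariate marginals (5, 6 and 3, 5, 3),
# and the off-diagonal block B[:2, 2:] is the original table N of Table 3.1.
print(B)
```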

    Greenacre (1984) showed that the relationship between the m'th principal inertia obtained using the indicator matrix, μm, and that obtained using the Burt matrix, μ_Bm, is

    μ_Bm = μm²     (3.5)

    For the "diagonal" submatrices only, Greenacre (1988) showed that the total inertia is

    X²_diag(B)/n = (Q - m)/m²     (3.6)

    We can apply the technique of correspondence analysis to the Burt matrix, just as we applied it to a contingency table's indicator matrix.
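Relation (3.5) can also be verified numerically on the artificial data: the principal inertias of the Burt matrix are the squares of those of the indicator matrix. The helper below is the same illustrative correspondence-analysis sketch used earlier, not code from the thesis.

```python
import numpy as np

def principal_inertias(table):
    """Principal inertias from a correspondence analysis of `table`."""
    P = table / table.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.sort(np.linalg.svd(S, compute_uv=False) ** 2)[::-1]

# Indicator matrix Z of Table 3.2 and its Burt matrix (Table 3.3).
Z = np.array([
    [1, 0, 1, 0, 0], [1, 0, 0, 1, 0], [1, 0, 0, 1, 0], [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1], [0, 1, 1, 0, 0], [0, 1, 1, 0, 0], [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0], [0, 1, 0, 1, 0], [0, 1, 0, 0, 1],
], dtype=float)
B = Z.T @ Z

lam_t = principal_inertias(Z)    # inertias via the indicator matrix
lam_b = principal_inertias(B)    # inertias via the Burt matrix

# (3.5): each Burt-matrix principal inertia is the square of the
# corresponding indicator-matrix one.
print(np.allclose(lam_b, lam_t ** 2))  # True
```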

    3.4 The Gifi System

    One of the most common approaches to multiple correspondence analysis in the last decade has been to use the Gifi system, which is generally referred to as homogeneity analysis. This technique was developed by Albert Gifi and is extensively discussed in Gifi (1990); many have extended his work, such as Michailidis & de Leeuw (1996a, 1996b, 1996c).

    Suppose we have the indicator submatrices Zq associated with each set of categories. Then the Gifi loss function is commonly written in the form

    -71- Chapter 3 - Multiple Correspondence Analysis

    σ(X; Y1, ..., YQ) = (1/Q) Σ_{q=1}^{Q} SSQ(X - Zq Yq)     (3.7)

    where Q is the total number of categories, as defined in Section 3.2.1, and SSQ denotes the sum of squares of the elements of a matrix. The elements of the X matrix are called object scores and correspond to the scores associated with each object, or individual, classified into the contingency table. The set of matrices, Yq, are called the category quantifications and correspond to the scores associated with each of the categories.

    The aim of the Gifi system is to calculate the score values such that the loss function of equation (3.7) is minimised. Obviously, a zero loss function will arise when

    X = Z1Y1 = Z2Y2 = . . . = ZQYQ     (3.8)

    In this case the object scores are perfectly discriminating and the category quantifications are perfectly homogeneous. Thus (3.8) is just a trivial solution to the problem. In the non-trivial case, the loss function can be minimised by using the alternating least squares algorithm of Michailidis & de Leeuw (1996a, 1996b, 1996c).
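A rough sketch of the alternating least squares iteration for the loss (3.7) is given below, for the two variables of Table 3.1. This is my own minimal illustration (the helper names and the choice of normalisation are assumptions, not code from the thesis): the Y-step sets each category quantification to the average object score in that category, and the X-step averages the Zq Yq and re-imposes the centring and scaling constraint on X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Indicator submatrices for the two variables of Table 3.1 (Table 3.2).
Z = np.array([
    [1, 0, 1, 0, 0], [1, 0, 0, 1, 0], [1, 0, 0, 1, 0], [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1], [0, 1, 1, 0, 0], [0, 1, 1, 0, 0], [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0], [0, 1, 0, 1, 0], [0, 1, 0, 0, 1],
], dtype=float)
Zs = [Z[:, :2], Z[:, 2:]]
n, p = Z.shape[0], 1          # n objects, p dimensions of object scores

def normalise(X):
    """Centre the object scores and rescale so that X'X = n I."""
    X = X - X.mean(axis=0)
    Q, _ = np.linalg.qr(X)
    return np.sqrt(n) * Q

X = normalise(rng.standard_normal((n, p)))
losses = []
for _ in range(30):
    # Y-step: category quantifications are weighted averages of the scores.
    Ys = [(Zq.T @ X) / Zq.sum(axis=0)[:, None] for Zq in Zs]
    # Loss (3.7): average squared distance between X and each Zq Yq.
    losses.append(np.mean([((X - Zq @ Yq) ** 2).sum()
                           for Zq, Yq in zip(Zs, Ys)]))
    # X-step: average the Zq Yq and re-impose the normalisation.
    X = normalise(np.mean([Zq @ Yq for Zq, Yq in zip(Zs, Ys)], axis=0))

print(losses[0] >= losses[-1])  # the recorded loss never increases
```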

    3.5 Stacking and Concatenation

    The method of multiple correspondence analysis discussed in Weller & Romney (1990), called stacking, involves forming a two-way contingency table from the three-way data by "stacking", or placing, the two-way table for each tube, for example, on top of one another.

    For example, consider the 4x5x3 contingency table of Clogg (1982), which is originally cited in Davis (1977). This is given as Table 3.4 below.


    We can perform a multiple correspondence analysis by stacking in a

    number of different ways. If the researcher is interested in the relationship

    between the rows and columns of Table 3.4, then a simple correspondence

    analysis is performed on Table 3.5.

                               Number of Siblings
    Years of          0-1     2-3     4-5     6-7     8+
    Schooling
    Not too Happy
      <12              15      34      36      22     61
      12               31      60      46      25     26
      13-16            35      45      30      13      8
      17+              18      14       3       3      4
    Pretty Happy
      <12              17      53      70      67     79
      12               60      96      45      40     31
      13-16            63      74      39      24      7
      17+              15      15       9       2      1
    Very Happy
      <12               7      20      23      16     36
      12                5      12      11      12      7
      13-16             5      10       4       4      3
      17+               2       1       2       0      1

    Table 3.4 : Cross-classification of 1517 People According to Happiness,
    Schooling and Number of Siblings

    The correspondence plot of Table 3.5 consists of three points for each

    of the row categories (one for each tube level) and one point each for the

    column categories of Table 3.4. Therefore such an analysis also permits an

    investigation of how the "Years of Schooling" differ at each level of


    Happiness. Other stacking methods are possible, depending on the type of relationships the researcher wishes to study.

                   Number of Siblings
    School       0-1   2-3   4-5   6-7   8+
    <12 (N)       15    34    36    22   61
    12 (N)        31    60    46    25   26
    13-16 (N)     35    45    30    13    8
    17+ (N)       18    14     3     3    4
    <12 (P)       17    53    70    67   79
    12 (P)        60    96    45    40   31
    13-16 (P)     63    74    39    24    7
    17+ (P)       15    15     9     2    1
    <12 (V)        7    20    23    16   36
    12 (V)         5    12    11    12    7
    13-16 (V)      5    10     4     4    3
    17+ (V)        2     1     2     0    1

    Table 3.5 : Stacked Contingency Table of Table 3.4 for the Researcher

    Interested in the Relationship between the Rows and Columns

    (N=Not too Happy; P=Pretty Happy; V=Very Happy)
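The stacking operation that produces Table 3.5 from Table 3.4 amounts to laying the schooling-by-siblings table for each happiness level on top of one another; a minimal illustrative sketch:

```python
import numpy as np

# Table 3.4 as a 3 x 4 x 5 array: happiness (N, P, V) x schooling x siblings.
N = np.array([
    [[15, 34, 36, 22, 61], [31, 60, 46, 25, 26],
     [35, 45, 30, 13,  8], [18, 14,  3,  3,  4]],   # Not too Happy
    [[17, 53, 70, 67, 79], [60, 96, 45, 40, 31],
     [63, 74, 39, 24,  7], [15, 15,  9,  2,  1]],   # Pretty Happy
    [[ 7, 20, 23, 16, 36], [ 5, 12, 11, 12,  7],
     [ 5, 10,  4,  4,  3], [ 2,  1,  2,  0,  1]],   # Very Happy
])

# Stacking: the 4 x 5 table for each happiness level is placed on top of
# the next, giving the 12 x 5 two-way table of Table 3.5.
stacked = N.reshape(-1, N.shape[2])
print(stacked.shape, stacked.sum())  # (12, 5) 1517
```

A simple correspondence analysis of `stacked` is then the analysis described in the text.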

    Another type of stacking, as discussed in Greenacre (1994), which is related to the Burt matrix is concatenation.

    Concatenation is the stacking of the bivariate marginals for two particular variables. For example, a concatenation of Table 3.4 is given by

    Table 3.6.

    Table 3.6 shows that there were 168 people who had less than twelve years of schooling and were not too happy, while 99 people had either no siblings or only one sibling and were not too happy.


             Not too Happy   Pretty Happy   Very Happy
    <12           168             286           102
    12            188             272            47
    13-16         131             207            26
    17+            42              42             6
    0-1            99             155            19
    2-3           153             238            43
    4-5           115             163            40
    6-7            63             133            32
    8+             99             118            47

    Table 3.6 : Concatenation of Table 3.4 of (Schooling-Siblings) by Happiness

    Unfortunately, the concatenation of Table 3.6 does not take into account whether there is any association between the schooling and siblings variables, so other concatenations could be considered: stacking the bivariate marginals of schooling-siblings and happiness-siblings, or of siblings-schooling and happiness-schooling. Essentially, the Burt matrix considers all three concatenations simultaneously, with the univariate marginals included. For more information on the relationship between concatenation and the Burt matrix see Greenacre (1994).
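Concatenation itself is just the stacking of bivariate marginals; Table 3.6 can be reproduced from the three-way data of Table 3.4 as follows (an illustrative sketch):

```python
import numpy as np

# Table 3.4 as a 3 x 4 x 5 array: happiness x schooling x siblings.
N = np.array([
    [[15, 34, 36, 22, 61], [31, 60, 46, 25, 26],
     [35, 45, 30, 13,  8], [18, 14,  3,  3,  4]],   # Not too Happy
    [[17, 53, 70, 67, 79], [60, 96, 45, 40, 31],
     [63, 74, 39, 24,  7], [15, 15,  9,  2,  1]],   # Pretty Happy
    [[ 7, 20, 23, 16, 36], [ 5, 12, 11, 12,  7],
     [ 5, 10,  4,  4,  3], [ 2,  1,  2,  0,  1]],   # Very Happy
])

# Bivariate marginals: schooling-by-happiness and siblings-by-happiness.
school_by_happy = N.sum(axis=2).T    # 4 x 3, summed over siblings
sibs_by_happy = N.sum(axis=1).T      # 5 x 3, summed over schooling

# Concatenation: stack the two marginal tables on top of one another,
# giving the 9 x 3 table of Table 3.6.
concat = np.vstack([school_by_happy, sibs_by_happy])
print(concat[0])  # the <12 row: 168 not too happy, 286 pretty, 102 very
```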

    3.6 Joint Correspondence Analysis

    A modification of multiple correspondence analysis using the Burt matrix was developed, and coined joint correspondence analysis, by Greenacre (1988, 1990, 1991).

    Consider the partition of the chi-squared statistic for a Burt matrix

    X²_B = (1/m²) Σ_{q=1}^{m} Σ_{q'=1}^{m} X²_qq'     (3.9)


    where X²_qq' is the chi-squared statistic for the (q, q')'th subtable. However, from (3.6), the combined chi-squared statistic for the diagonal subtables of the Burt matrix is n(Q - m)/m². Greenacre (1994) concluded that X²_B can be written as

    X²_B = (1/m²) Σ_{q≠q'} X²_qq' + n(Q - m)/m²     (3.10)

    The inclusion of the term n(Q - m)/m² artificially inflates the value of the chi-squared statistic. For this reason, Greenacre (1988) points out that there is no justification in including the diagonal (or marginal probability) subtables when conducting a multiple correspondence analysis using the Burt matrix. In fact, if n(Q - m)/m² is large, then the percentage contribution of even the most important principal axis will be relatively small. Greenacre (1990) also identified other problems concerning the correspondence analysis of the Burt matrix. Therefore, the alternative, or modified, version of multiple correspondence analysis using the Burt matrix is to conduct the analysis omitting the diagonal subtables. The advantage of joint correspondence analysis over the original Burt matrix approach, as Greenacre (1991) points out, is that when a two-way contingency table is analysed the correspondence plot is identical to the plot from a simple correspondence analysis of the original data. Greenacre (1988) presented an alternating least-squares algorithm which can be adapted into existing software to conduct a joint correspondence analysis rather than the classical Burt matrix approach. However, Boik (1996) recently presented a more efficient algorithm based on the Gauss-Seidel method.


    3.7 The Modelling Approaches to Correspondence Analysis

    3.7.1 Introduction

    In Chapter 2 it was shown through equation (2.12) that correspondence analysis of two-way contingency tables consists of applying a singular value decomposition to the Pearson ratios.

    Even though correspondence analysis of multiple categorical data has been conducted via the various approaches in this chapter, here we present yet another approach by using a generalised form of singular value decomposition.

    Consider the model of complete independence between the rows, columns and tubes of a three-way contingency table :

    pijk = pi.. p.j. p..k     (3.11)

    Of course, complete independence is not always going to occur, and so a measure of the departure from complete independence is made by considering

    pijk = αijk pi.. p.j. p..k     (3.12)

    For complete independence between the rows, columns and tubes of a three-way contingency table, αijk = 1 for all i=1, 2, ..., I, j=1, 2, ..., J and k=1, 2, ..., K.

    From (3.12), this measure of departure is quantified by

    αijk = pijk / (pi.. p.j. p..k)     (3.13)

    Equation (3.13) is just an extension of equation (2.3) for three-way contingency tables, and so αijk is called Pearson's three-way ratio.
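The ratio (3.13) is simple to compute; the sketch below is an illustrative helper of my own, and checks the defining property that under complete independence (3.11) the ratio is identically 1.

```python
import numpy as np

def three_way_ratio(N):
    """Pearson's three-way ratio, equation (3.13):
    alpha_ijk = p_ijk / (p_i.. p_.j. p_..k)."""
    P = N / N.sum()
    pi = P.sum(axis=(1, 2))   # row marginal probabilities
    pj = P.sum(axis=(0, 2))   # column marginal probabilities
    pk = P.sum(axis=(0, 1))   # tube marginal probabilities
    E = np.einsum('i,j,k->ijk', pi, pj, pk)  # complete independence (3.11)
    return P / E

# A table built from the product of its marginals has ratio 1 everywhere.
ind = 1000 * np.einsum('i,j,k->ijk', [.2, .8], [.5, .5], [.3, .7])
print(np.allclose(three_way_ratio(ind), 1.0))  # True
```

For a real table such as Table 3.4, the departures of the ratio from 1 quantify the association between the three variables.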


    Darroch (1974) looked at the "no multiplicative" three-variable interaction model :

    pijk / (pi.. p.j. p..k) = αij + βik + γjk     (3.14)

    for some {αij}, {βik} and {γjk}.

    For simple correspondence analysis, Goodman (1996) considered partitioning the Pearson ratios by applying a singular value decomposition to them. However, a singular value decomposition of three-way and multi-way contingency tables is not possible (unless a recoding of the data is conducted, as for the indicator and Burt matrices above). In this section we shall discuss some generalisations of the two-way singular value decomposition.

    3.7.2 The Tucker3 Model

    The first attempt at the decomposition of three-way data was made by psychometrician Ledyard R. Tucker in 1963. Even though many examples of categorical data, and many analytic techniques are based on two-way data,

    Tucker (1963, pg 122) was aware that

    "there are increasing numbers of situations of interest to research workers in psychology and education which involve data classification in more than two-ways"

    This realisation motivated Tucker to develop what is now termed the Tucker3, or Tucker 3-mode, model.

    For the analysis of categorical data, the Tucker3 decomposition of (3.13) is

    αijk = pijk / (pi.. p.j. p..k) = Σ_{u=0}^{I-1} Σ_{v=0}^{J-1} Σ_{w=0}^{K-1} aiu bjv ckw guvw     (3.15)


    or, in matrix notation,

    A = A G (C^T ⊗ B^T)     (3.16)

    as noted in Kroonenberg (1989).

    The matrices A, B and C are termed intrinsic matrices and relate to the rows, columns and tubes in much the same way as A and B in Chapter 2 relate to the rows and columns. The matrix G is termed the core matrix; it is a three-way array which describes the relationship between the three variables.

    Let

    ηuvk = Σ_{w=0}^{K-1} ckw guvw     (3.17)

    Therefore,

    αijk = Σ_{u=0}^{I-1} Σ_{v=0}^{J-1} aiu bjv ηuvk     (3.18)

    So, if αijk is considered as a set of K two-way matrices, Ak, each of dimension IxJ, then model (3.18) may be written, in matrix form, as

    Ak = A Nk B^T     (3.19)

    where Nk is the matrix with (u, v)'th element ηuvk, for u = 0, 1, ..., I-1 and v = 0, 1, ..., J-1. These values show the relationship between the rows and columns of N at each tube category.

    Equation (3.15) can be similarly partitioned to show the relationship between the columns and tubes at each row level by letting

    ξivw = Σ_{u=0}^{I-1} aiu guvw     (3.20)


    so that

    αijk = Σ_{v=0}^{J-1} Σ_{w=0}^{K-1} bjv ckw ξivw     (3.21)

    Note that

    Ai = B Si C^T     (3.22)

    where Ai is the JxK matrix of Pearson three-way ratios at each row level, while Si is the core matrix, with (v, w)'th element ξivw, which shows the relationship between the columns and tubes at each row category.

    Similarly, we can decompose (3.15) to highlight the relationship between the rows and tubes at each column level. Kroonenberg (1994) refers to models (3.18) and (3.21) as Tucker2 models, and to (3.17) and (3.20) as extended core matrices. Kroonenberg (1994) also offers an algorithm for determining A, B and C. There are many other results that can be obtained by using the Tucker3 model; refer to Tucker (1963, 1964, 1966).

    As Kroonenberg (1989) noted, the matrices A, B and C in (3.16) are subject to the constraints

    A^T DI A = I     (3.23)

    B^T DJ B = I     (3.24)

    C^T DK C = I     (3.25)

    Consider again the decomposition (3.15). Then it can be re-written so that

    αijk = 1 + Σ_{u=1}^{I-1} Σ_{v=1}^{J-1} aiu bjv guv0 + Σ_{u=1}^{I-1} Σ_{w=1}^{K-1} aiu ckw gu0w
             + Σ_{v=1}^{J-1} Σ_{w=1}^{K-1} bjv ckw g0vw + Σ_{u=1}^{I-1} Σ_{v=1}^{J-1} Σ_{w=1}^{K-1} aiu bjv ckw guvw     (3.26)


    Now, (3.26) is just an extension of the two-way case of (2.12). However, (3.15) cannot be termed a generalised singular value decomposition for two reasons :

    i) the Tucker3 model of (3.15) allows for all between-variable combinations. For example, a singular value decomposition involves the terms ai1 λ1 bj1, ai2 λ2 bj2 and so on; however, it does not involve ai1 λ1 bj2, ai2 λ2 bj1, etc. The Tucker3 model includes ai1 bj1 ck1 g111 as well as ai1 bj2 ck3 g123;

    ii) the values guvw are not singular values as the λm are. All λm must be positive and arranged in descending order; the core values can be negative and need not be arranged in any particular order.

    A recent application of the Tucker3 model to correspondence analysis is that of Carlier & Kroonenberg (1996). The relationship between the total inertia and the core elements is :

    X²/n = Σ_{u=1}^{I-1} Σ_{v=1}^{J-1} Σ_{w=1}^{K-1} g²uvw + Σ_{u=1}^{I-1} Σ_{v=1}^{J-1} g²uv0 + Σ_{u=1}^{I-1} Σ_{w=1}^{K-1} g²u0w + Σ_{v=1}^{J-1} Σ_{w=1}^{K-1} g²0vw     (3.27)

    and can be proven by squaring (3.26) and summing over the row, column and tube categories.

    3.7.3 The CANDECOMP/PARAFAC Models

    An alternative modelling procedure involves using the PARAFAC (PARallel FACtor analysis) model of Harshman (1970) or the mathematically equivalent CANDECOMP (CANonical DECOMPosition) model of Carroll & Chang (1970).


    Consider again the Pearson three-way ratio of (3.13). For the analysis of categorical data, decompose this ratio by :

    αijk = pijk / (pi.. p.j. p..k) = Σ_{m=0}^{M*} aim bjm ckm gmmm     (3.28)

    where {aim}, {bjm} and {ckm} are subject to the orthogonality constraints of (3.23), (3.24) and (3.25) respectively, and M* = min(I, J, K) - 1. It can be seen that model (3.28) is an extension of the singular value decomposition presented as equation (2.4), and thus has been referred to as a generalised three-way singular value decomposition by Carlier & Kroonenberg (1996).

    Eliminating the trivial solution yields

    pijk / (pi.. p.j. p..k) = 1 + Σ_{m=1}^{M*} aim bjm ckm gmmm

    or

    (pijk - pi.. p.j. p..k) / (pi.. p.j. p..k) = Σ_{m=1}^{M*} aim bjm ckm gmmm     (3.29)

    Using model (3.28), the relationship between the total inertia and the core values is

    X²/n = Σ_{m=1}^{M*} g²mmm     (3.30)

    Further information on this model can be found in the articles mentioned above. Also consult Harshman & Lundy (1984) for related versions of the PARAFAC model, for example PARAFAC2 and PARAFAC3. The article of ten Berge & Kiers (1996) presents some unique results of the PARAFAC2 model. Harshman & Lundy (1994) present an extended view of the PARAFAC model, while Mooijaart (1992) uses the generalised PARAFAC model to obtain the three-factor interaction and estimates parameters from a log-linear model of the three-way data.

    3.8 Examples

    3.8.1 A Two-way Contingency Table Example

    Consider the two-way contingency table of Table 2.1. As shown in Chapter 2 it can be analysed using simple correspondence analysis. However, similar comparisons of profile positions can be made by applying multiple correspondence analysis to it using the indicator and Burt matrices.

    Figure 3.1 is the correspondence plot of Table 2.1 using the indicator matrix. Comparing it with Figure 2.1, the relative positions of the row and column profile co-ordinates are similar. For example, socio-economic status levels A and B have a similar position, indicating that they have similar profiles. However, the co-ordinates along the second principal axis of Figure 3.1 are stretched out more when compared with the second principal axis of Figure 2.1, because of the larger second principal inertia obtained using the indicator matrix. The principal inertia value for each axis using multiple correspondence analysis via the indicator matrix is given in Table 3.7, as well as its percentage contribution to the total inertia, X²_Z/n = 4. This total inertia value can be calculated using equation (3.3), while each principal inertia value can be calculated using (3.2), taking into consideration the inertia values of the simple correspondence analysis.

    Table 3.7 shows that the first and second principal inertia values are larger than those obtained using simple correspondence analysis. This helps contribute to the greater spread of the co-ordinates along the axes when compared with the axes of Figure 2.1. Also, all the principal inertia values using the indicator matrix contribute roughly the same amount to the total


    variation in the table. For example, the first two principal axes of Figure 3.1 contribute to 14.52% and 12.96% of the total inertia respectively. Therefore,

    Figure 3.1 visually describes 27.48% of the total variation of the table, using its indicator matrix.

    [Plot not reproduced; first principal axis (horizontal) against second principal axis (vertical).]

    Figure 3.1 : Multiple Correspondence Plot of Table 2.1 using the Indicator Matrix

    If the Burt matrix is used instead of the indicator matrix, the poor display of Figure 3.1 is not greatly improved upon.

    The principal inertia values using the Burt matrix are given in Table 3.8, and their relationship with those using the indicator matrix conforms to (3.5).


    Axis     Inertia Value   % Contribution
    1          0.58066           14.52
    2          0.51857           12.96
    3          0.50863           12.72
    4          0.50000           12.50
    5          0.50000           12.50
    6          0.49137           12.28
    7          0.48143           12.04
    8          0.41934           10.48
    9          0                  0
    Total      4                100

    Table 3.7 : Principal Inertia Values and Percentage Contribution to Total

    Inertia of Table 2.1 using the Indicator Matrix.

    Axis     Inertia Value   % Contribution
    1          0.33717           16.74
    2          0.26891           13.35
    3          0.25870           12.85
    4          0.25000           12.41
    5          0.25000           12.41
    6          0.24144           11.99
    7          0.23178           11.51
    8          0.17585            8.73
    9          0                  0
    Total      2.01385          100

    Table 3.8 : Principal Inertia Values and Percentage Contribution to Total

    Inertia of Table 2.1 using the Burt Matrix.

    Table 3.8 shows that the principal inertia values using the indicator matrix and the Burt matrix contribute roughly the same proportion to their respective total inertias.


    The visual comparison of the row and column profile co-ordinates using the Burt matrix is given by Figure 3.2.

    The relative positions of the profile co-ordinates using the indicator and Burt matrices are identical, taking into consideration the rescaling of the singular values.

    Figure 3.2 shows only a little more of the variation in Table 2.1 than Figure 3.1 does. Using the Burt matrix, the correspondence plot accounts for only 30.09% of the total inertia; 2.61% more than using the indicator matrix.

    [Plot not reproduced; first principal axis (horizontal) against second principal axis (vertical).]

    Figure 3.2 : Multiple Correspondence Plot of Table 2.1 using the Burt Matrix


    Using equation (3.6), the total inertia of the diagonal submatrices of the Burt matrix of Table 2.1 is

    (Q - m)/m² = ((4 + 6) - 2)/2² = 2

    That is,

    X²_B/n = 0.01385 + 2 = 2.01385

    Therefore, the diagonal sub-matrices of the Burt matrix contribute more than 99% of the total inertia. This means that joint correspondence analysis is preferable to analysing the contingency table using the full Burt matrix.

    3.8.2 A Three-way Contingency Table Example

    Consider the three-way contingency table of Table 3.4. It can be analysed using multiple correspondence analysis so that a visual comparison of the Siblings, Schooling and Happiness categories can be made. For this data, multiple correspondence analysis is performed using its indicator and Burt matrices.

    Consider first the analysis made using the indicator matrix; the correspondence plot obtained is given by Figure 3.3. It shows that those who are very happy tend to be those with fewer than 12 years of schooling and a lot of siblings. Those who are not too happy tend to be those who have more than 17 years of schooling and only a few siblings. Those with between 2 and 5 (inclusive) siblings appear to be pretty happy and have only 12 years of schooling. The total inertia, using (3.3), is 3, while the principal inertia values, along with their percentage contributions to the total inertia, are given in Table 3.9.


    Using the indicator matrix, the correspondence plot of Figure 3.3 only accounts for 28.59% of the total variation in the contingency table. Using the

    Burt matrix instead increases this value by 8.68% to 37.27%.

    [Plot not reproduced; first principal axis (horizontal) against second principal axis (vertical).]

    Figure 3.3 : Multiple Correspondence Plot of Table 3.4 using the Indicator Matrix

    The correspondence plot of Table 3.4 using the Burt matrix is given by

Figure 3.4, while the principal inertia values are given in Table 3.10. Note that the relationship between the principal inertia values of the indicator and Burt matrices conforms to equation (3.5).


[Plot not reproduced: Happiness, Schooling and Siblings category points on the first and second principal axes.]

Figure 3.4 : Multiple Correspondence Plot of Table 3.4

    using the Burt Matrix

As expected, the relative positions of the profile co-ordinates in Figure 3.4 are the same as those in Figure 3.3, except that, using the Burt matrix, the co-ordinates are reflected about the first principal axis.

While the correspondence plot of Figure 3.4 accounts for very little of the variation in the table, a joint correspondence analysis could be applied. This is because the diagonal sub-matrices, which have a total inertia value of 1, account for over 99% of the total inertia of the table using the Burt matrix.


Axis     Inertia Value    % Contribution
 1         0.4862786          16.21
 2         0.3713704          12.38
 3         0.3489325          11.63
 4         0.3384103          11.28
 5         0.3286943          10.96
 6         0.3133906          10.45
 7         0.3090188          10.30
 8         0.3015142          10.05
 9         0.2023897           6.75
10         0                   0
11         0                   0
Total      3                 100

    Table 3.9 : Principal Inertia Values and Percentage Contribution to

    the Total Inertia of Table 3.4 using the Indicator Matrix

Axis     Inertia Value    % Contribution
 1         0.23647            23.54
 2         0.13792            13.73
 3         0.12175            12.12
 4         0.11452            11.40
 5         0.10804            10.76
 6         0.09821             9.78
 7         0.09549             9.51
 8         0.09091             9.05
 9         0.04096             4.08
10         0                   0
11         0                   0
Total      1.004428          100

    Table 3.10 : Principal Inertia Values and Percentage Contribution

    to the Total Inertia of Table 3.4 using the Burt matrix
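The squaring relationship between the two sets of principal inertias, stated by equation (3.5), can be checked directly against the tabulated values. A minimal check in Python, assuming only the inertias printed in Tables 3.9 and 3.10:

```python
# Principal inertias of Table 3.4: indicator matrix (Table 3.9)
# and Burt matrix (Table 3.10).  Equation (3.5) says each Burt
# inertia is the square of the corresponding indicator inertia.
lam_indicator = [0.4862786, 0.3713704, 0.3489325, 0.3384103, 0.3286943,
                 0.3133906, 0.3090188, 0.3015142, 0.2023897]
lam_burt = [0.23647, 0.13792, 0.12175, 0.11452, 0.10804,
            0.09821, 0.09549, 0.09091, 0.04096]

for lz, lb in zip(lam_indicator, lam_burt):
    # Burt inertias in Table 3.10 are rounded to five decimal places
    assert abs(lz ** 2 - lb) < 5e-6

# The indicator-matrix inertias sum to the total inertia of 3 (Table 3.9)
assert abs(sum(lam_indicator) - 3) < 1e-5
```

Every Burt-matrix inertia agrees with the square of the corresponding indicator-matrix inertia to the printed precision, confirming (3.5) for this example.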

SECTION 2

Ordinal Correspondence Analysis


    Chapter 4

    Partitioning Pearson's Chi-Squared Statistic

    4.1 Introduction

This chapter discusses the partition of the Pearson chi-squared statistic for two-way and three-way contingency tables using orthogonal polynomials. These polynomials are based on those of Emerson (1968), which are applicable to contingency tables. Note, however, that other types of chi-squared partition have been successfully developed in the past. For two-way contingency tables refer to Lancaster (1953), Nair (1986) and Hirotsu (1978, 1982, 1983, 1986), while for multi-way contingency tables refer to Goodman (1969) and Kendall & Stuart (1979, pg 607).

For two-way contingency tables, the Pearson chi-squared partition considered in this chapter was mathematically proven by Irwin (1949), and extensively developed by Lancaster (1949-1980). Claringbold (1961) considered such a partition using different types of orthogonal transformations. The partition of the Pearson chi-squared statistic for multi-


orthogonal with respect to the tube marginal probabilities, \{p_{..k}\}, subject to the constraint

\sum_{k=1}^{K} p_{..k}\,c_w(k)\,c_{w'}(k) = \begin{cases} 1, & w = w' \\ 0, & w \neq w' \end{cases}    (4.3)

for w = 1, 2, \dots, K-1.

A general recurrence relation, using equally spaced integer-valued scores, can be used to calculate these polynomials; it is defined and proven in Section 4.2.1. A generalisation of these polynomials is also given in Section 4.2.3 so that any scoring scheme may be applied.

If we consider all of the row and column orthogonal polynomials, then according to Theorem 2.5 of Broyden (1975)

\sum_{u=0}^{I-1} a_u(i)\,a_u(i') = \begin{cases} 1/p_{i.}, & i = i' \\ 0, & \text{otherwise} \end{cases}    (4.4)

\sum_{v=0}^{J-1} b_v(j)\,b_v(j') = \begin{cases} 1/p_{.j}, & j = j' \\ 0, & \text{otherwise} \end{cases}    (4.5)

    The advantage of using these orthogonal polynomials is that the

    Pearson chi-squared statistic can be partitioned via bivariate moments for two-way contingency tables, and bivariate and trivariate moments for three- way contingency tables. These moments are expressed in terms of the location (linear), dispersion (quadratic) and higher order components. It will be shown that the location component is a measure of the difference in the mean values of a particular variable, while the dispersion component is a measure of the spread of the categories within a variable.

Therefore, rather than confining an analysis to just comparing mean values, the use of orthogonal polynomials allows additional information concerning the structure of a variable to be obtained.


    4.2 Orthogonal Polynomials

    4.2.1 The General Recurrence Relation

    In this section, a complete description, with relevant proofs, of the orthogonal polynomials presented in Emerson (1968) for contingency tables is given.

For any three consecutive column orthogonal polynomials, the general recurrence relation of Emerson (1968) is:

b_v(j) = (A_v\,j + B_v)\,b_{v-1}(j) - C_v\,b_{v-2}(j)    (4.6)

where

B_v = -A_v \sum_{j=1}^{J} p_{.j}\,j\,b_{v-1}^2(j)    (4.7)

C_v = A_v \sum_{j=1}^{J} p_{.j}\,j\,b_{v-1}(j)\,b_{v-2}(j)    (4.8)

and

A_v = \left[ \sum_{j=1}^{J} p_{.j}\,j^2\,b_{v-1}^2(j) - \left( \sum_{j=1}^{J} p_{.j}\,j\,b_{v-1}^2(j) \right)^2 - \left( \sum_{j=1}^{J} p_{.j}\,j\,b_{v-1}(j)\,b_{v-2}(j) \right)^2 \right]^{-1/2}    (4.9)

To prove (4.6)-(4.9), note that the orthogonal polynomial b_v(j) is a polynomial in j of order v. That is, b_1(j) is of order 1 (a linear function of j), b_2(j) is of order 2 (a quadratic function), and so on.

Therefore we can define a polynomial of order v-1 as

b_v(j) - A_v\,j\,b_{v-1}(j)    (4.10)

This is a polynomial whose leading term, in j^v, has been eliminated. So we can write (4.10) as a linear combination of the lower-order polynomials:


b_v(j) - A_v\,j\,b_{v-1}(j) = \phi_0\,b_0(j) + \dots + \phi_{v-1}\,b_{v-1}(j)    (4.11)

Multiplying both sides of (4.11) by p_{.j}\,b_k(j), summing over j, and applying the orthogonality property (4.2) yields, for some k with 0 \le k \le v-1,

\sum_{j=1}^{J} p_{.j}\,b_k(j)\,b_v(j) - A_v \sum_{j=1}^{J} p_{.j}\,j\,b_k(j)\,b_{v-1}(j) = \phi_k    (4.12)

Therefore, when k \neq v, for all v = 1, 2, \dots, J-1,

-A_v \sum_{j=1}^{J} p_{.j}\,j\,b_k(j)\,b_{v-1}(j) = \phi_k    (4.13)

For any k < v-2, the product j\,b_k(j) is a polynomial of order at most v-2 and so is orthogonal to b_{v-1}(j). Therefore

A_v \sum_{j=1}^{J} p_{.j}\,j\,b_k(j)\,b_{v-1}(j) = 0

and \phi_k = 0 for k < v-2. Equation (4.11) then becomes

b_v(j) - A_v\,j\,b_{v-1}(j) = \phi_{v-2}\,b_{v-2}(j) + \phi_{v-1}\,b_{v-1}(j)

That is

b_v(j) = (A_v\,j + \phi_{v-1})\,b_{v-1}(j) + \phi_{v-2}\,b_{v-2}(j)    (4.14)

where from equation (4.13)

\phi_{v-1} = B_v = -A_v \sum_{j=1}^{J} p_{.j}\,j\,b_{v-1}^2(j)

and

\phi_{v-2} = -C_v = -A_v \sum_{j=1}^{J} p_{.j}\,j\,b_{v-1}(j)\,b_{v-2}(j)

To find A_v, substitute (4.14) into (4.2) when v = v'. Then

\sum_{j=1}^{J} p_{.j} \left\{ (A_v\,j + B_v)\,b_{v-1}(j) - C_v\,b_{v-2}(j) \right\}^2 = 1    (4.15)

Once expanded and simplified, equation (4.15) yields

A_v = \left[ \sum_{j=1}^{J} p_{.j}\,j^2\,b_{v-1}^2(j) - \left( \sum_{j=1}^{J} p_{.j}\,j\,b_{v-1}^2(j) \right)^2 - \left( \sum_{j=1}^{J} p_{.j}\,j\,b_{v-1}(j)\,b_{v-2}(j) \right)^2 \right]^{-1/2}

which is just (4.9).

The row orthogonal polynomials satisfy relations analogous to (4.6)-(4.9) and can be proven in the same manner.

    4.2.2 Simplification of the General Recurrence Relation

The general recurrence relation of Emerson (1968), given by (4.6)-(4.9), can be simplified. Note that (4.9) contains functions of (4.7) and (4.8). Therefore, factoring A_v out of the expressions yields:

b_v(j) = A_v \left[ (j - B_v)\,b_{v-1}(j) - C_v\,b_{v-2}(j) \right]    (4.16)

where

B_v = \sum_{j=1}^{J} p_{.j}\,j\,b_{v-1}^2(j)    (4.17)

C_v = \sum_{j=1}^{J} p_{.j}\,j\,b_{v-1}(j)\,b_{v-2}(j)    (4.18)


and

A_v = \left[ \sum_{j=1}^{J} p_{.j}\,j^2\,b_{v-1}^2(j) - B_v^2 - C_v^2 \right]^{-1/2}    (4.19)

Equations (4.16)-(4.19) are far more efficient for computational purposes, because B_v and C_v can be computed first and then used to calculate A_v. All three can then be used to calculate the column orthogonal polynomials.

    4.2.3 Generalisation of the General Recurrence Relation

The general recurrence relation of (4.6)-(4.9), and the simplified version of (4.16)-(4.19), can only be used when equally spaced integer-valued scores, called natural scores, describe the ordered structure of the column categories. This is because of the "j" term in the formulae. In general, the researcher may wish to use some other scoring scheme that better represents the ordered nature of the set of categories, whether chosen subjectively, or by more objective means.

Therefore, the recurrence relations defined above can be generalised for any set of scores. Suppose a set of predetermined column scores, \{s_J(j)\}, for j = 1, 2, \dots, J, is used to define the ordered structure of the column categories. Then the set of column orthogonal polynomials can be generalised so that:

b_v(j) = A_v \left[ (s_J(j) - B_v)\,b_{v-1}(j) - C_v\,b_{v-2}(j) \right]    (4.20)

where

B_v = \sum_{j=1}^{J} p_{.j}\,s_J(j)\,b_{v-1}^2(j)    (4.21)

C_v = \sum_{j=1}^{J} p_{.j}\,s_J(j)\,b_{v-1}(j)\,b_{v-2}(j)    (4.22)

and

A_v = \left[ \sum_{j=1}^{J} p_{.j}\,s_J^2(j)\,b_{v-1}^2(j) - B_v^2 - C_v^2 \right]^{-1/2}    (4.23)


    Equations (4.20)-(4.23) are more efficient than (4.6)-(4.9) and more general than (4.16)-(4.19), and so are ideal to use for an ordinal correspondence analysis.

However, the column orthogonal polynomials of (4.20) are not new, and have recently been used, for example, by Best (1994a, 1994b, 1995), Best & Rayner (1996) and Rayner & Best (1996).

A similar set of orthogonal polynomials, a_u(i), for u = 1, 2, \dots, I-1, can be derived when the row categories are assumed to be ordered, and are similar to (4.20)-(4.23). For a set of row scores \{s_I(i)\}, the row orthogonal polynomials are:

a_u(i) = A_u \left[ (s_I(i) - B_u)\,a_{u-1}(i) - C_u\,a_{u-2}(i) \right]    (4.24)

where

B_u = \sum_{i=1}^{I} p_{i.}\,s_I(i)\,a_{u-1}^2(i)    (4.25)

C_u = \sum_{i=1}^{I} p_{i.}\,s_I(i)\,a_{u-1}(i)\,a_{u-2}(i)    (4.26)

and

A_u = \left[ \sum_{i=1}^{I} p_{i.}\,s_I^2(i)\,a_{u-1}^2(i) - B_u^2 - C_u^2 \right]^{-1/2}    (4.27)

Note that a_{-1}(i) = b_{-1}(j) = 0 and a_0(i) = b_0(j) = 1 are assumed for all i and j. When u = v = 0 the orthogonal polynomials are trivial, while polynomials of order greater than zero are non-trivial. For programming purposes, these trivial polynomials must be taken into consideration. Consider b_1(j). Then

A_1 = \left[ \sum_{j=1}^{J} p_{.j}\,s_J^2(j)\,b_0^2(j) - \left( \sum_{j=1}^{J} p_{.j}\,s_J(j)\,b_0^2(j) \right)^2 - \left( \sum_{j=1}^{J} p_{.j}\,s_J(j)\,b_0(j)\,b_{-1}(j) \right)^2 \right]^{-1/2}

which simplifies to

A_1 = \left[ \sum_{j=1}^{J} p_{.j}\,s_J^2(j) - \left( \sum_{j=1}^{J} p_{.j}\,s_J(j) \right)^2 \right]^{-1/2}

Thus, for simplicity of programming, the column orthogonal polynomial of order 1 can be defined as

b_1(j) = \frac{ s_J(j) - \sum_{j=1}^{J} p_{.j}\,s_J(j) }{ \sqrt{ \sum_{j=1}^{J} p_{.j}\,s_J^2(j) - \left( \sum_{j=1}^{J} p_{.j}\,s_J(j) \right)^2 } }    (4.28)

which is just the standardisation of the set of column scores. Higher order polynomials can be calculated by the recurrence relation (4.20) and, for the row categories, (4.24). Similarly, a_1(i) is the standardisation of the row scores.
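The generalised recurrence (4.20)-(4.23), with (4.28) as its starting point, is straightforward to program. The following Python sketch (the function name and example marginals are mine, for illustration only) computes the full set of column orthogonal polynomials for given marginal probabilities and scores, and checks them against the orthonormality constraint (4.2):

```python
import numpy as np

def emerson_polynomials(p, scores=None):
    """Column orthogonal polynomials via the generalised recurrence
    (4.20)-(4.23).  Column v of the returned J x J matrix holds b_v(j);
    column 0 is the trivial polynomial b_0(j) = 1."""
    p = np.asarray(p, dtype=float)
    J = len(p)
    # Natural scores 1, 2, ..., J unless another scheme is supplied
    s = np.arange(1, J + 1, dtype=float) if scores is None else np.asarray(scores, dtype=float)
    B = np.zeros((J, J))
    B[:, 0] = 1.0                        # b_0(j) = 1 (trivial)
    b2 = np.zeros(J)                     # b_{-1}(j) = 0
    b1 = B[:, 0]
    for v in range(1, J):
        Bv = np.sum(p * s * b1 ** 2)     # equation (4.21)
        Cv = np.sum(p * s * b1 * b2)     # equation (4.22)
        t = (s - Bv) * b1 - Cv * b2
        Av = 1.0 / np.sqrt(np.sum(p * t ** 2))   # algebraically equal to (4.23)
        B[:, v] = Av * t                 # equation (4.20)
        b2, b1 = b1, B[:, v]
    return B

# Hypothetical marginals for J = 4 ordered categories
pj = np.array([0.2, 0.3, 0.1, 0.4])
P = emerson_polynomials(pj)

# Orthonormality (4.2): sum_j p_.j b_v(j) b_w(j) = delta_vw
assert np.allclose(P.T @ np.diag(pj) @ P, np.eye(4))

# b_1(j) is the standardisation (4.28) of the natural scores
s = np.arange(1.0, 5.0)
mu = np.sum(pj * s)
sig = np.sqrt(np.sum(pj * s ** 2) - mu ** 2)
assert np.allclose(P[:, 1], (s - mu) / sig)
```

Here A_v is computed as the normalising constant of the unnormalised polynomial, which is the same quantity as (4.23); provided each p_{.j} > 0 and the scores are distinct, the recurrence is well defined up to order J-1.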

    TWO-WAY CONTINGENCY TABLES

Partitioning the Pearson chi-squared statistic using orthogonal polynomials for univariate data was described in detail by Rayner & Best (1989), while various papers by these authors dealt with partitioning the chi-squared statistic for a contingency table with one or two ordered sets of categories. In this section we examine some of the results described by Rayner and Best, as well as other related results. Best & Rayner (1991) discussed the effect on the chi-squared statistic when univariate data are combined, assuming the data are uniformly distributed (or equiprobable).


    4.3 A Chi-squared Partition for Singly-Ordered Two-way Tables

- VERSION 1

Suppose we have a two-way contingency table with ordered columns and unordered rows. Rayner & Best (1997) stated that the chi-squared statistic can be partitioned so that

X^2 = \sum_{i=1}^{I} \sum_{v=1}^{J-1} Z_{iv}^2    (4.29)

where

Z_{iv} = \frac{\sqrt{n}}{\sqrt{p_{i.}}} \sum_{j=1}^{J} p_{ij}\,b_v(j)    (4.30)

which are asymptotically standard normally distributed random variables.

    In matrix notation, equation (4.30) can be written as

Z = \sqrt{n}\,D_I^{-1/2}\,P\,B_*    (4.31)

where B_* is the J \times (J-1) matrix containing the J-1 non-trivial column orthogonal polynomials.

To prove (4.29)-(4.30), consider the chi-squared statistic (1.1), which can also be written as

X^2 = n \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(p_{ij} - p_{i.}p_{.j})^2}{p_{i.}p_{.j}} = n \left( \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{p_{ij}^2}{p_{i.}p_{.j}} - 1 \right)    (4.32)

Substituting equation (4.5) into p_{ij}^2 / (p_{i.}p_{.j}) yields


\frac{p_{ij}^2}{p_{i.}p_{.j}} = \frac{p_{ij}}{p_{i.}} \sum_{j'=1}^{J} \sum_{v=0}^{J-1} b_v(j)\,b_v(j')\,p_{ij'}

So

n \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{p_{ij}^2}{p_{i.}p_{.j}} = n \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{j'=1}^{J} \sum_{v=0}^{J-1} b_v(j)\,b_v(j')\,\frac{p_{ij}\,p_{ij'}}{p_{i.}} = \sum_{i=1}^{I} \sum_{v=0}^{J-1} Z_{iv}^2

where Z_{iv} is just (4.30) above. Thus, (4.32) becomes

X^2 = \sum_{i=1}^{I} \sum_{v=0}^{J-1} Z_{iv}^2 - n    (4.33)

but

Z_{i0} = \frac{\sqrt{n}}{\sqrt{p_{i.}}} \sum_{j=1}^{J} p_{ij} = \frac{\sqrt{n}\,p_{i.}}{\sqrt{p_{i.}}} = \sqrt{n}\,\sqrt{p_{i.}}

for i = 1, 2, \dots, I. Therefore,

X^2 = \sum_{i=1}^{I} \sum_{v=1}^{J-1} Z_{iv}^2 + \sum_{i=1}^{I} Z_{i0}^2 - n = \sum_{i=1}^{I} \sum_{v=1}^{J-1} Z_{iv}^2 + n \sum_{i=1}^{I} p_{i.} - n = \sum_{i=1}^{I} \sum_{v=1}^{J-1} Z_{iv}^2

thereby proving (4.29).


Consider the case when v = 1. Then, when natural scores are used, b_1(j) = A_1(j - \mu_J), where \mu_J = \sum_{j=1}^{J} j\,p_{.j} is the overall column mean, so that

Z_{i1} = \frac{\sqrt{n}}{\sqrt{p_{i.}}} \sum_{j=1}^{J} p_{ij}\,b_1(j) = A_1\,\sqrt{n}\,\sqrt{p_{i.}} \left( \sum_{j=1}^{J} j\,\frac{p_{ij}}{p_{i.}} - \mu_J \right)

which is the standardised mean value of the i'th row profile. The value of A_1 is defined by (4.9) when v = 1. Therefore, when v = 1, \sum_{i=1}^{I} Z_{i1}^2 is just the sum of squares of all the mean differences. This quantity is called the column location component of the chi-squared statistic and describes the overall deviation of the column profile means. Higher order components can be calculated by considering higher orders of v. For example, when v = 2, \sum_{i=1}^{I} Z_{i2}^2 is the column dispersion component and describes the difference in the column profiles in terms of the difference in their spread. In general, the v'th column profile component can be calculated by X_v^2 = \sum_{i=1}^{I} Z_{iv}^2, so that X^2 = \sum_{v=1}^{J-1} X_v^2.

The value of Z_{iv} determines the contribution the i'th row profile makes to the v'th column moment. For example, Z_{12} is the contribution of the first row profile to the dispersion component of the columns, while Z_{31} is the contribution of the third row profile to the linear component of the columns.
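The partition (4.29)-(4.31) can be verified numerically. The Python sketch below (the 3 x 4 table and helper names are mine, invented for illustration) builds the non-trivial column polynomials, forms Z as in (4.31), and checks that the squared Z values sum to the Pearson chi-squared statistic:

```python
import numpy as np

def orth_poly(p, s):
    """Emerson orthogonal polynomials (columns b_0, ..., b_{J-1}) for
    marginal probabilities p and scores s, via (4.20)-(4.23)."""
    J = len(p)
    B = np.zeros((J, J))
    B[:, 0] = 1.0
    for v in range(1, J):
        b1 = B[:, v - 1]
        b2 = B[:, v - 2] if v >= 2 else np.zeros(J)
        t = (s - np.sum(p * s * b1 ** 2)) * b1 - np.sum(p * s * b1 * b2) * b2
        B[:, v] = t / np.sqrt(np.sum(p * t ** 2))
    return B

# Hypothetical 3 x 4 table: unordered rows, ordered columns
N = np.array([[10.0, 8, 5, 2],
              [ 6.0, 9, 9, 6],
              [ 2.0, 5, 8, 10]])
n = N.sum()
P = N / n
pi, pj = P.sum(axis=1), P.sum(axis=0)

Bstar = orth_poly(pj, np.arange(1.0, 5.0))[:, 1:]        # J-1 non-trivial polynomials
Z = np.sqrt(n) * np.diag(1 / np.sqrt(pi)) @ P @ Bstar    # equation (4.31)

X2 = n * np.sum((P - np.outer(pi, pj)) ** 2 / np.outer(pi, pj))
assert np.isclose(np.sum(Z ** 2), X2)                    # the partition (4.29)
```

Column v of Z holds the contributions Z_{iv}, so summing the squares of the first column over i gives the column location component discussed above.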

    4.4 A Chi-squared Partition for Singly-Ordered Two-way Tables

    - VERSION 2

An alternative approach to partitioning the Pearson chi-squared statistic for a two-way contingency table, described in Beh (1999b), is essentially to combine the approach of Section 4.3 with that of Section 2.3. Suppose our contingency table has non-ordered rows and ordered columns. Then we can partition the chi-squared statistic using orthogonal polynomials for the ordered columns and singular vectors for the unordered rows, so that

X^2 = \sum_{u=1}^{M^*} \sum_{v=1}^{J-1} Z_{(u)v}^2    (4.34)

where

Z_{(u)v} = \sqrt{n} \sum_{i=1}^{I} \sum_{j=1}^{J} a_{iu}\,p_{ij}\,b_v(j)    (4.35)

which are asymptotically standard normally distributed random variables. The parentheses around u indicate that (4.34)-(4.35) are concerned with a non-ordered set of row categories.

Equation (4.35) can alternatively be written in matrix form as

Z = \sqrt{n}\,A^T P B_*    (4.36)

where A is the I \times (I-1) matrix of non-trivial left singular vectors, while B_* is the J \times (J-1) matrix of the J-1 non-trivial column orthogonal polynomials.

To prove (4.34)-(4.35), recall (4.32). Using equation (4.5) and an adaptation of (4.4) for the singular vectors, namely

\sum_{u=0}^{M^*} a_{iu}\,a_{i'u} = \begin{cases} 1/p_{i.}, & i = i' \\ 0, & \text{otherwise} \end{cases}

with the trivial vector a_{i0} = 1, we have

\frac{p_{ij}^2}{p_{i.}p_{.j}} = \sum_{i'=1}^{I} \sum_{j'=1}^{J} \sum_{u=0}^{M^*} \sum_{v=0}^{J-1} a_{iu}\,a_{i'u}\,b_v(j)\,b_v(j')\,p_{i'j}\,p_{ij'}

So

n \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{p_{ij}^2}{p_{i.}p_{.j}} = n \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{i'=1}^{I} \sum_{j'=1}^{J} \sum_{u=0}^{M^*} \sum_{v=0}^{J-1} a_{iu}\,a_{i'u}\,b_v(j)\,b_v(j')\,p_{i'j}\,p_{ij'} = \sum_{u=0}^{M^*} \sum_{v=0}^{J-1} Z_{(u)v}^2

where Z_{(u)v} is just defined by (4.35). Therefore, the chi-squared statistic of equation (4.32) becomes

X^2 = \sum_{u=1}^{M^*} \sum_{v=1}^{J-1} Z_{(u)v}^2 + \sum_{u=1}^{M^*} Z_{(u)0}^2 + \sum_{v=1}^{J-1} Z_{(0)v}^2 + Z_{(0)0}^2 - n    (4.37)

Now

Z_{(u)0} = \sqrt{n} \sum_{i=1}^{I} \sum_{j=1}^{J} a_{iu}\,b_0(j)\,p_{ij} = \sqrt{n} \sum_{i=1}^{I} a_{iu}\,p_{i.} = 0

and it can similarly be shown that Z_{(0)v} = 0, while Z_{(0)0} = \sqrt{n}. Therefore, the last three terms of (4.37) cancel with the -n, leaving equation (4.34).

    The value of Z(u)v means that each principal axis from a simple correspondence analysis can be partitioned into column component values.

    In this way, the researcher can determine the dominant source of variation of the ordered columns along a particular axis using the simple correspondence analysis described in Chapter 2.


The measure

\frac{Z_{(1)1}}{\sqrt{n}} = \sum_{i=1}^{I} \sum_{j=1}^{J} a_{i1}\,b_1(j)\,p_{ij} = \frac{1}{\sigma_J} \sum_{i=1}^{I} \sum_{j=1}^{J} a_{i1}\,p_{ij}\,(j - \mu_J)

where \mu_J = \sum_{j=1}^{J} j\,p_{.j} and \sigma_J^2 = \sum_{j=1}^{J} j^2\,p_{.j} - \mu_J^2 for natural scores, is the correlation between the non-ordinal row categories and the ordered column categories. Refer to subsection 5.4.6 for more details.

    4.5 A Chi-squared Partition for Doubly Ordered Two-way Tables

For a doubly ordered two-way contingency table, the Pearson chi-squared statistic can be partitioned so that

X^2 = \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} Z_{uv}^2    (4.38)

where

Z_{uv} = \sqrt{n} \sum_{i=1}^{I} \sum_{j=1}^{J} a_u(i)\,b_v(j)\,p_{ij}    (4.39)

which are asymptotically standard normally distributed random variables. In matrix notation, (4.39) can be written as

Z = \sqrt{n}\,A_*^T P B_*    (4.40)

where A_* contains the I-1 non-trivial row orthogonal polynomials.

Equations (4.38) and (4.39) can be proven in a similar manner to that of (4.34) and (4.35).

    Equations (4.38) and (4.39) can be proven in a similar manner to that of (4.34) and (4.35).

The chi-squared partition of (4.38) consists of (I-1)(J-1) terms, each of which can be tested with one degree of freedom.


The Z values of (4.39) define the (u, v)'th bivariate moment between the rows and columns of the contingency table. For example, Z_{11} is the linear-by-linear association between the rows and columns. The (1, 2)'th and (2, 1)'th moments, that is Z_{12} and Z_{21} respectively, give the bivariate skewness moments for the data; for example, Z_{12} is the measure of the linear-by-quadratic association between the rows and columns. In general, Z_{uv} may be interpreted as assessing the deviation up to the (u, v)'th bivariate moment from what might be expected under the model of independence for the data.

There is a clear link between Z_{uv} and other measures of association presented in the past. Suppose that u = v = 1. Then equation (4.39) simplifies to

\frac{Z_{11}}{\sqrt{n}} = \sum_{i=1}^{I} \sum_{j=1}^{J} \left( \frac{s_I(i) - \mu_I}{\sigma_I} \right) \left( \frac{s_J(j) - \mu_J}{\sigma_J} \right) p_{ij}    (4.41)

where

\mu_I = \sum_{i=1}^{I} s_I(i)\,p_{i.}, \qquad \sigma_I^2 = \sum_{i=1}^{I} s_I^2(i)\,p_{i.} - \mu_I^2

\mu_J = \sum_{j=1}^{J} s_J(j)\,p_{.j}, \qquad \sigma_J^2 = \sum_{j=1}^{J} s_J^2(j)\,p_{.j} - \mu_J^2    (4.42)

Rayner & Best (1996) showed that when natural scores are used, (4.41) is just Pearson's product moment correlation. They also showed, along with Best & Rayner (1996), that when midrank scores are used, Z_{11}/\sqrt{n} is equivalent to Spearman's rank correlation.

Best & Rayner (1996) also compared Z_{11}/\sqrt{n} with the correlation model of Goodman (1985a, equation 2.19). Goodman (1985a) showed that the correlation between the rows and columns can be measured by the quantity \lambda given by (2.52). By comparing (4.39) when u = v = 1 with (2.52),

Best & Rayner (1996) stated that


"... it is clear that \lambda and U_{11} [our Z_{11}] are closely related".

These two values are related only in the mathematical sense. Their interpretation is very different, as they measure different characteristics of the data. The value \lambda is the maximum correlation between the rows and columns of a two-way contingency table, and is interpreted in the non-ordinal sense. It gives no indication as to how the rows and columns are related. The value Z_{11} has no maximisation constraint imposed upon it, and is interpreted in the ordinal sense, as it is the measure of the linear-by-linear association. The only way Z_{11} is maximised is when singular vectors are used as the scoring scheme (see Chapter 6 or Beh, 1998). In this situation the two values are mathematically identical.

When a small number of individuals have been classified into the contingency table, Rayner & Best (1996, 1997) note that the variance of the Z values may be quite different from one. In such a situation, the variance of Z_{uv} involves moments up to the (2u, 2v)'th moment. However, when a large number of individuals are classified, Rayner & Best (1998) note that moments beyond the (u, v)'th are negligible.

Component values other than the bivariate moments can also be determined. When u = 1, the Z values describe part of the linearity of the rows (part of the row linear component), while when v = 1 they describe part of the linearity of the columns. Therefore, the linear, or location, component of the row categories can be determined by \sum_{v=1}^{J-1} Z_{1v}^2. This component, when tested against the theoretical chi-squared statistic with J-1 degrees of freedom, will determine the deviation of the row means from what might be expected under the independence hypothesis. If this component is significant, then there is a significant variation in the mean values of the row categories. The quadratic, or dispersion, component of the row categories can be determined by calculating \sum_{v=1}^{J-1} Z_{2v}^2. If this component is significant, then there is a significant variation in the spread of the row categories. In general, the u'th row component, which has J-1 degrees of freedom, can be calculated by \sum_{v=1}^{J-1} Z_{uv}^2. Similarly, the v'th column component, with I-1 degrees of freedom, can be determined by calculating \sum_{u=1}^{I-1} Z_{uv}^2; when v = 1 this gives the column location, or linear, component, while the column dispersion component can be found by letting v = 2.

Best & Rayner (1987) showed that component values calculated using the chi-squared partition of (4.38)-(4.39) compare well with other methods, such as the Cressie-Read statistics (Read, 1984). Best & Rayner (1991) also showed that such components compare well with the results of Tango (1984), with the advantage of being able to identify higher order component values. Best (1994b) looked at contingency tables where the dispersion component was a dominant feature of the data.
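The doubly ordered partition (4.38)-(4.40), and the identification of Z_{11}/\sqrt{n} with the product moment correlation under natural scores, can both be verified numerically. A Python sketch (the 3 x 4 table is invented for illustration; all names are mine):

```python
import numpy as np

def orth_poly(p, s):
    # Emerson orthogonal polynomials (4.20)-(4.27) for either margin
    m = len(p)
    B = np.zeros((m, m))
    B[:, 0] = 1.0
    for v in range(1, m):
        b1 = B[:, v - 1]
        b2 = B[:, v - 2] if v >= 2 else np.zeros(m)
        t = (s - np.sum(p * s * b1 ** 2)) * b1 - np.sum(p * s * b1 * b2) * b2
        B[:, v] = t / np.sqrt(np.sum(p * t ** 2))
    return B

# Hypothetical doubly ordered 3 x 4 table
N = np.array([[12.0, 7, 4, 1],
              [ 6.0, 9, 8, 5],
              [ 3.0, 5, 9, 11]])
n = N.sum()
P = N / n
pi, pj = P.sum(axis=1), P.sum(axis=0)
si, sj = np.arange(1.0, 4.0), np.arange(1.0, 5.0)    # natural scores

Astar = orth_poly(pi, si)[:, 1:]                     # I-1 non-trivial row polys
Bstar = orth_poly(pj, sj)[:, 1:]                     # J-1 non-trivial column polys
Z = np.sqrt(n) * Astar.T @ P @ Bstar                 # equation (4.40)

X2 = n * np.sum((P - np.outer(pi, pj)) ** 2 / np.outer(pi, pj))
assert np.isclose(np.sum(Z ** 2), X2)                # the partition (4.38)

# Z_11 / sqrt(n) is the product moment correlation (4.41)
mu_i, mu_j = np.sum(pi * si), np.sum(pj * sj)
sd_i = np.sqrt(np.sum(pi * si ** 2) - mu_i ** 2)
sd_j = np.sqrt(np.sum(pj * sj ** 2) - mu_j ** 2)
corr = np.sum(P * np.outer((si - mu_i) / sd_i, (sj - mu_j) / sd_j))
assert np.isclose(Z[0, 0] / np.sqrt(n), corr)
```

Entry Z[u-1, v-1] is the (u, v)'th bivariate moment, so row and column location and dispersion components are obtained by summing the appropriate squared rows or columns of Z.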

    THREE-WAY CONTINGENCY TABLES

Partitioning the multi-way chi-squared statistic into bivariate, trivariate and higher order terms has been conducted by several authors. For example, Lancaster (1951) considered partitioning the chi-squared statistic for a 2x2x2 contingency table by isolating bivariate terms and a trivariate term. Later, Lancaster (1960a) generalised such a partition for any three-way contingency table. However, none of these multi-way partitions considered the partition of the chi-squared statistic into location, dispersion and higher order components. The second part of this chapter does this, thereby identifying additional information about the structure of, and association between, the variables.


    4.6 A Chi-squared Partition for Singly Ordered Three-way Tables

    In this subsection the partition of the chi-squared statistic for three- way contingency tables with one set of ordered categories is presented and is based on the work of Beh & Davy (1999). The partition is analogous to the case discussed for the two-way contingency table and isolates the location

    (linear), dispersion (quadratic) and higher order components, thereby determining important structures of the ordered categories.

A three-way contingency table can have one set of ordered categories in three different ways: (i) ordered rows, but non-ordered columns and tubes; (ii) ordered columns, but non-ordered rows and tubes; and (iii) ordered tubes, but non-ordered rows and columns.

    Only case (i) will be discussed, as the remaining partitions can be found in a similar manner.

Suppose that N has ordered rows only. Then the Pearson chi-squared statistic can be partitioned so that:

X^2 = \sum_{u=1}^{I-1} \sum_{j=1}^{J} \sum_{k=1}^{K} Y_{ujk}^2 + X_{JK}^2    (4.43)

where

Y_{ujk} = \sqrt{n} \sum_{i=1}^{I} \frac{a_u(i)\,p_{ijk}}{\sqrt{p_{.j.}\,p_{..k}}}    (4.44)

which are asymptotically standard normally distributed random variables. The set of values \{a_u(i)\} is the orthogonal polynomial of order u, for u = 1, \dots, I-1, associated with the ordered rows.

To prove the chi-squared partition described by (4.43)-(4.44) above, consider the classical Pearson chi-squared statistic:

X^2 = n \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \frac{(p_{ijk} - p_{i..}p_{.j.}p_{..k})^2}{p_{i..}p_{.j.}p_{..k}} = n \left( \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \frac{p_{ijk}^2}{p_{i..}p_{.j.}p_{..k}} - 1 \right)    (4.45)

Substituting (4.4) into p_{ijk}^2 / (p_{i..}p_{.j.}p_{..k}) gives

\frac{p_{ijk}^2}{p_{i..}p_{.j.}p_{..k}} = \sum_{i'=1}^{I} \sum_{u=0}^{I-1} \frac{a_u(i)\,a_u(i')\,p_{ijk}\,p_{i'jk}}{p_{.j.}\,p_{..k}}

Therefore

n \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \frac{p_{ijk}^2}{p_{i..}p_{.j.}p_{..k}} = n \sum_{i=1}^{I} \sum_{i'=1}^{I} \sum_{u=0}^{I-1} \sum_{j=1}^{J} \sum_{k=1}^{K} \frac{a_u(i)\,a_u(i')\,p_{ijk}\,p_{i'jk}}{p_{.j.}\,p_{..k}} = \sum_{u=0}^{I-1} \sum_{j=1}^{J} \sum_{k=1}^{K} Y_{ujk}^2

where Y_{ujk} is given by (4.44). Hence, (4.45) becomes

X^2 = \sum_{u=1}^{I-1} \sum_{j=1}^{J} \sum_{k=1}^{K} Y_{ujk}^2 + \sum_{j=1}^{J} \sum_{k=1}^{K} Y_{0jk}^2 - n

Since Y_{0jk} = \sqrt{n}\,p_{.jk} / \sqrt{p_{.j.}p_{..k}},

X^2 = \sum_{u=1}^{I-1} \sum_{j=1}^{J} \sum_{k=1}^{K} Y_{ujk}^2 + n \left( \sum_{j=1}^{J} \sum_{k=1}^{K} \frac{p_{.jk}^2}{p_{.j.}p_{..k}} - 1 \right)    (4.46)

But

X_{JK}^2 = n \left( \sum_{j=1}^{J} \sum_{k=1}^{K} \frac{p_{.jk}^2}{p_{.j.}p_{..k}} - 1 \right)    (4.47)

is the chi-squared statistic formed when the ordered rows are summed over for each j and k. Therefore, substituting (4.47) into (4.46) just gives (4.43), and so the partition is proven.

The Pearson chi-squared statistic of (4.43) is partitioned into two parts. The first term is a measure of association between the I ordered rows, J non-ordered columns and K non-ordered tubes. Even when this term is found to be insignificant, it is still possible to identify important associations by testing each of the Y values. The row location component, which describes what effect the linearity of the ordered rows has on the entire contingency table, is \sum_{j=1}^{J} \sum_{k=1}^{K} Y_{1jk}^2. Similarly, the dispersion (or quadratic) component for the rows of N is \sum_{j=1}^{J} \sum_{k=1}^{K} Y_{2jk}^2. Higher order components can be found by considering larger values of u. Comparing the first term of (4.43) against the chi-squared value with (I-1)(JK-1) degrees of freedom tests the overall effect of the ordered rows on the columns and tubes. If this term is found to be significant, then the ordered rows contribute to the overall association in the table. The level of contribution can be determined by calculating row location, dispersion and higher order components, and/or by finding significant Y terms as discussed above. However, if this term is not found to be significant, then the ordered rows do not contribute to the overall association in the contingency table.

The second term of (4.43), X_{JK}^2, is the classical Pearson chi-squared statistic for the J x K two-way contingency table formed when the I ordered rows are summed over each column and tube. When the rows are summed over, the ordered rows are assumed to be independent of the non-ordered categories. X_{JK}^2 can be tested against the chi-squared statistic with (J-1)(K-1) degrees of freedom to determine the association between the non-ordered columns and tubes.
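The partition (4.43)-(4.47) can be checked numerically in the same way as the two-way partitions. The sketch below (Python; the 3 x 3 x 2 array is invented for illustration) forms the Y_{ujk} of (4.44) and verifies that their squares, together with X_{JK}^2, recover the full chi-squared statistic:

```python
import numpy as np

def orth_poly(p, s):
    # Emerson orthogonal polynomials, as in (4.24)-(4.27) for the rows
    m = len(p)
    B = np.zeros((m, m))
    B[:, 0] = 1.0
    for v in range(1, m):
        b1 = B[:, v - 1]
        b2 = B[:, v - 2] if v >= 2 else np.zeros(m)
        t = (s - np.sum(p * s * b1 ** 2)) * b1 - np.sum(p * s * b1 * b2) * b2
        B[:, v] = t / np.sqrt(np.sum(p * t ** 2))
    return B

# Hypothetical I x J x K table with ordered rows only
N = np.array([[[9.0, 4], [6, 3], [2, 5]],
              [[5.0, 6], [7, 7], [4, 6]],
              [[2.0, 8], [3, 9], [6, 8]]])
n = N.sum()
P = N / n
pi = P.sum(axis=(1, 2))                      # p_i..
pjk = P.sum(axis=0)                          # p_.jk
pj, pk = pjk.sum(axis=1), pjk.sum(axis=0)    # p_.j. and p_..k

A = orth_poly(pi, np.arange(1.0, 4.0))       # row polynomials a_u(i)
# Y_ujk of (4.44), for u = 0, ..., I-1
Y = np.einsum('iu,ijk->ujk', A, P) * np.sqrt(n) / np.sqrt(np.outer(pj, pk))

exp3 = np.einsum('i,j,k->ijk', pi, pj, pk)   # complete independence model
X2 = n * np.sum((P - exp3) ** 2 / exp3)
X2_JK = n * (np.sum(pjk ** 2 / np.outer(pj, pk)) - 1)

# Equation (4.43): non-trivial Y terms plus the marginal J x K statistic
assert np.isclose(np.sum(Y[1:] ** 2) + X2_JK, X2)
```

Summing Y[1] ** 2 alone gives the row location component, and Y[2] ** 2 the row dispersion component, as described above.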

    4.7 A Chi-squared Partition for Doubly Ordered Three-way Tables

In this subsection, the chi-squared partitions are given for three-way contingency tables which have two sets of ordered categories. This ordering structure can occur in three ways: (i) where the rows and columns are ordered, but the tubes are not; (ii) where the rows and tubes are ordered, but the columns are not; and (iii) where the columns and tubes are ordered, but the rows are not. Here, only case (i) will be discussed, as the remaining partitions can be derived in a similar way.

    Suppose N has ordered rows and columns, but unordered tubes.

    Then the Pearson chi-squared statistic can be partitioned so that:

X^2 = \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} \sum_{k=1}^{K} Y_{uvk}^2 + \sum_{u=1}^{I-1} \sum_{k=1}^{K} Y_{u0k}^2 + \sum_{v=1}^{J-1} \sum_{k=1}^{K} Y_{0vk}^2    (4.48)

where

Y_{uvk} = \sqrt{n} \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{a_u(i)\,b_v(j)\,p_{ijk}}{\sqrt{p_{..k}}}    (4.49)

which are asymptotically standard normally distributed.

To prove (4.48)-(4.49), consider equations (4.4) and (4.5). Then

\frac{p_{ijk}^2}{p_{i..}p_{.j.}p_{..k}} = \sum_{i'=1}^{I} \sum_{j'=1}^{J} \sum_{u=0}^{I-1} \sum_{v=0}^{J-1} \frac{a_u(i)\,a_u(i')\,b_v(j)\,b_v(j')\,p_{ijk}\,p_{i'j'k}}{p_{..k}}

So

n \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \frac{p_{ijk}^2}{p_{i..}p_{.j.}p_{..k}} = \sum_{u=0}^{I-1} \sum_{v=0}^{J-1} \sum_{k=1}^{K} Y_{uvk}^2

where Y_{uvk} is just (4.49). Now

Y_{00k} = \sqrt{n} \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{p_{ijk}}{\sqrt{p_{..k}}} = \sqrt{n}\,\sqrt{p_{..k}}

and

Y_{u0k} = \sqrt{n} \sum_{i=1}^{I} \frac{a_u(i)\,p_{i.k}}{\sqrt{p_{..k}}}, \qquad Y_{0vk} = \sqrt{n} \sum_{j=1}^{J} \frac{b_v(j)\,p_{.jk}}{\sqrt{p_{..k}}}

Therefore,

X^2 = \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} \sum_{k=1}^{K} Y_{uvk}^2 + \sum_{u=1}^{I-1} \sum_{k=1}^{K} Y_{u0k}^2 + \sum_{v=1}^{J-1} \sum_{k=1}^{K} Y_{0vk}^2 + \sum_{k=1}^{K} \left( \sqrt{n}\,\sqrt{p_{..k}} \right)^2 - n = \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} \sum_{k=1}^{K} Y_{uvk}^2 + \sum_{u=1}^{I-1} \sum_{k=1}^{K} Y_{u0k}^2 + \sum_{v=1}^{J-1} \sum_{k=1}^{K} Y_{0vk}^2

since \sum_{k=1}^{K} p_{..k} = 1, which is just (4.48).

The chi-squared statistic of (4.45) is thus partitioned into three terms, which can be written simply as

X^2 = X_{IJK}^2 + X_{IK}^2 + X_{JK}^2

The first term, X_{IJK}^2, describes the trivariate association between the ordered rows, ordered columns and unordered tubes. A test of its significance against the chi-squared statistic with (I-1)(J-1)K degrees of freedom will determine the departure from independence of these variables. However, even if this term is not significant, the Y values can detect associations at the category level. That is, Y_{uvk} describes what effect the (u, v)'th bivariate moment has on the k'th unordered tube category. For example, Y_{111} describes what effect the linear-by-linear association between the rows and columns has on the first tube category, while Y_{123} describes what effect the linear-by-quadratic association between the rows and columns has on the third tube category. The overall row location component of N is \sum_{v=1}^{J-1} \sum_{k=1}^{K} Y_{1vk}^2, while the overall column location component is \sum_{u=1}^{I-1} \sum_{k=1}^{K} Y_{u1k}^2. In general, the u'th row component for the three-way association is \sum_{v=1}^{J-1} \sum_{k=1}^{K} Y_{uvk}^2, while the v'th column component is \sum_{u=1}^{I-1} \sum_{k=1}^{K} Y_{uvk}^2.

The second term, X_{IK}^2, which can be compared with the chi-squared statistic with (I-1)(K-1) degrees of freedom, is a measure of association between the I ordered rows and K unordered tubes, assuming the ordered columns are independent. If this term is not significant, the Y values will identify any important associations that may exist between the rows and tubes at the category level. Y_{u0k} describes how the u'th univariate moment of the rows affects the k'th unordered tube category. For example, Y_{103} describes what effect the linearity of the rows has on the third tube category. For this term, the row location component can be calculated by \sum_{k=1}^{K} Y_{10k}^2. Similarly, the dispersion component can be found by \sum_{k=1}^{K} Y_{20k}^2. When the columns are ordered and the rows are not (as is the case for (iii)), these component values are the same as those found by Rayner & Best (1997), who considered the partition of singly ordered two-way contingency tables.

The third term in (4.48), X_{JK}^2, which can be compared with the chi-squared statistic with (J-1)(K-1) degrees of freedom, is a measure of association between the J ordered columns and K unordered tubes, assuming the ordered rows are independent. The value of Y_{0vk} describes how the v'th univariate moment of the columns affects the k'th tube category. For example, Y_{012} describes what effect the linearity of the columns has on the second tube category. Column and tube location, dispersion and higher order components can also be calculated for this term.
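As with the earlier partitions, (4.48)-(4.49) can be verified numerically. In the Python sketch below (the array values are invented for illustration), the three terms are computed from the Y_{uvk}, Y_{u0k} and Y_{0vk} values and compared with the complete-independence chi-squared statistic:

```python
import numpy as np

def orth_poly(p, s):
    # Emerson orthogonal polynomials (4.20)-(4.27); trivial column first
    m = len(p)
    B = np.zeros((m, m))
    B[:, 0] = 1.0
    for v in range(1, m):
        b1 = B[:, v - 1]
        b2 = B[:, v - 2] if v >= 2 else np.zeros(m)
        t = (s - np.sum(p * s * b1 ** 2)) * b1 - np.sum(p * s * b1 * b2) * b2
        B[:, v] = t / np.sqrt(np.sum(p * t ** 2))
    return B

# Hypothetical table: ordered rows and columns, unordered tubes (K = 2)
N = np.array([[[8.0, 3], [5, 4], [2, 6]],
              [[6.0, 5], [7, 6], [5, 7]],
              [[3.0, 7], [4, 8], [7, 9]]])
n = N.sum()
P = N / n
pi = P.sum(axis=(1, 2))
pj = P.sum(axis=(0, 2))
pk = P.sum(axis=(0, 1))

A = orth_poly(pi, np.arange(1.0, 4.0))    # a_u(i), u = 0, ..., I-1
B = orth_poly(pj, np.arange(1.0, 4.0))    # b_v(j), v = 0, ..., J-1

# Y_uvk of (4.49), including the trivial u = 0 and v = 0 terms
Y = np.einsum('iu,jv,ijk->uvk', A, B, P) * np.sqrt(n) / np.sqrt(pk)

exp3 = np.einsum('i,j,k->ijk', pi, pj, pk)
X2 = n * np.sum((P - exp3) ** 2 / exp3)

trivariate = np.sum(Y[1:, 1:, :] ** 2)    # the X2_IJK term
row_tube   = np.sum(Y[1:, 0, :] ** 2)     # the X2_IK term
col_tube   = np.sum(Y[0, 1:, :] ** 2)     # the X2_JK term
assert np.isclose(trivariate + row_tube + col_tube, X2)   # equation (4.48)
```

The three partial sums can then be referred to their respective (I-1)(J-1)K, (I-1)(K-1) and (J-1)(K-1) degree-of-freedom chi-squared distributions, as described above.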


Consider equation (4.49). Then

\frac{Y_{uvk}}{\sqrt{n}} = \frac{1}{\sqrt{p_{..k}}} \sum_{i=1}^{I} \sum_{j=1}^{J} a_u(i)\,b_v(j)\,p_{ijk} = \frac{U_{uv(k)}}{\sqrt{n\,p_{..k}}}

where U_{uv(k)} is the (u, v)'th bivariate association between the row and column categories at the k'th tube category; see Best & Rayner (1996).

Therefore, the overall (u, v)'th bivariate association for the ordered rows and columns of a three-way contingency table is:

U_{uv(.)} = \sum_{k=1}^{K} U_{uv(k)} = \sum_{k=1}^{K} Y_{uvk}\,\sqrt{p_{..k}}    (4.50)

For example, the overall linear-by-linear association, divided by the square root of n, between the rows and columns is:

\frac{U_{11(.)}}{\sqrt{n}} = \sum_{i=1}^{I} \sum_{j=1}^{J} \left( \frac{s_I(i) - \mu_I}{\sigma_I} \right) \left( \frac{s_J(j) - \mu_J}{\sigma_J} \right) p_{ij.}    (4.51)

where $\mu_I$, $\mu_J$, $\sigma_I$ and $\sigma_J$ are analogous to those defined by (4.42) when

    natural scores are applied. The RHS of (4.51) is just the correlation between

    the ordered rows and ordered columns of N as Danaher (1991) described

assuming the tubes are independent. In fact, (4.51) is the Pearson product moment correlation between the ordered rows and columns of N; see Rayner & Best (1996). When midrank scores are used, (4.51) is Spearman's rank correlation between the ordered rows and columns; see Best & Rayner (1996). When the first non-trivial singular vectors from a simple correspondence analysis are used as scores, (4.51) is the first singular value of the normalised cell values of the I×J table formed when the tubes are summed across; see Beh

    (1998).
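The calculation in (4.51) can be sketched in code. The block below is an illustrative sketch only: the 2×3×2 counts and all variable names are invented, and natural scores are assumed. It computes the overall linear-by-linear association as a weighted correlation, once from the full three-way table and once from the collapsed two-way table, confirming that the two agree.

```python
# Illustrative sketch of (4.51): hypothetical counts, natural scores assumed.
N = [[[12, 7], [9, 14], [6, 10]],
     [[8, 15], [11, 5], [13, 9]]]            # N[i][j][k], a 2x3x2 table
I, J, K = 2, 3, 2
n = sum(N[i][j][k] for i in range(I) for j in range(J) for k in range(K))
p = [[[N[i][j][k] / n for k in range(K)] for j in range(J)] for i in range(I)]

# Marginal proportions and the moments mu_I, sigma_I, mu_J, sigma_J of (4.42)
pI = [sum(p[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
pJ = [sum(p[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
muI = sum((i + 1) * pI[i] for i in range(I))
muJ = sum((j + 1) * pJ[j] for j in range(J))
sdI = sum((i + 1 - muI) ** 2 * pI[i] for i in range(I)) ** 0.5
sdJ = sum((j + 1 - muJ) ** 2 * pJ[j] for j in range(J)) ** 0.5

# Overall linear-by-linear association U_11(.)/sqrt(n), summed over all tubes
r3 = sum((i + 1 - muI) / sdI * (j + 1 - muJ) / sdJ * p[i][j][k]
         for i in range(I) for j in range(J) for k in range(K))

# The same quantity from the collapsed I x J table p_ij.
pij = [[sum(p[i][j][k] for k in range(K)) for j in range(J)] for i in range(I)]
r2 = sum((i + 1 - muI) / sdI * (j + 1 - muJ) / sdJ * pij[i][j]
         for i in range(I) for j in range(J))
print(abs(r3 - r2) < 1e-12)   # True: (4.51) depends only on p_ij.
```

Because the scores do not depend on the tube index, the tube sum can be taken inside, which is exactly why (4.51) reduces to a quantity of the collapsed table.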


The partitions of (ii) and (iii) are for N with ordered rows and tubes only, and ordered columns and tubes only, respectively. The interpretation of these partitions, and their proofs, is similar to that for (4.48)-(4.49) and so is omitted.

    4.8 A Chi-squared Partition for Completely Ordered Three-way Tables

This section presents the partition of the Pearson chi-squared statistic using orthogonal polynomials for a three-way contingency table where all three variables contain completely ordered categories. The method here is based on the work of Beh & Davy (1998a). The partition isolates the location (or linear), dispersion (quadratic) and higher order components for each variable, and determines the three-way association and each combination of two-way associations for the three variables. It is shown in the next section that such a partition can also be easily extended for the analysis of any multi-way contingency table consisting of ordered variables.

Consider our three-way contingency table, N. Then the Pearson chi-squared statistic can be partitioned, under the hypothesis of complete independence, so that:

$$X^2 = \sum_{u=1}^{I-1}\sum_{v=1}^{J-1}\sum_{w=1}^{K-1} Z_{uvw}^2 + \sum_{u=1}^{I-1}\sum_{v=1}^{J-1} Z_{uv0}^2 + \sum_{u=1}^{I-1}\sum_{w=1}^{K-1} Z_{u0w}^2 + \sum_{v=1}^{J-1}\sum_{w=1}^{K-1} Z_{0vw}^2 \qquad (4.52)$$

where

$$Z_{uvw} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} a_u(i)\,b_v(j)\,c_w(k)\,p_{ijk} \qquad (4.53)$$

    which are asymptotically standard normally distributed random variables.

To prove the chi-squared partition described by (4.52)-(4.53) above, consider the classical Pearson chi-squared statistic of (4.45), together with the orthogonality properties of (4.4), (4.5) and


$$\sum_{w=0}^{K-1} c_w(k)\,c_w(k') = \begin{cases} \dfrac{1}{p_{\cdot\cdot k}} & \text{for } k = k' \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (4.54)$$

Therefore,

$$\frac{p_{ijk}}{p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}} = \sum_{i'=1}^{I}\sum_{j'=1}^{J}\sum_{k'=1}^{K}\sum_{u=0}^{I-1}\sum_{v=0}^{J-1}\sum_{w=0}^{K-1} a_u(i)\,a_u(i')\,b_v(j)\,b_v(j')\,c_w(k)\,c_w(k')\,p_{i'j'k'}$$

Hence,

$$\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} \frac{p_{ijk}^2}{p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}} = \sum_{i,i'}\sum_{j,j'}\sum_{k,k'}\sum_{u=0}^{I-1}\sum_{v=0}^{J-1}\sum_{w=0}^{K-1} a_u(i)\,a_u(i')\,b_v(j)\,b_v(j')\,c_w(k)\,c_w(k')\,p_{ijk}\,p_{i'j'k'} = \frac{1}{n}\sum_{u=0}^{I-1}\sum_{v=0}^{J-1}\sum_{w=0}^{K-1} Z_{uvw}^2$$

where

$$Z_{uvw} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} a_u(i)\,b_v(j)\,c_w(k)\,p_{ijk}$$

which is (4.53) above. Therefore, Pearson's chi-squared statistic becomes

$$X^2 = \sum_{u=0}^{I-1}\sum_{v=0}^{J-1}\sum_{w=0}^{K-1} Z_{uvw}^2 - n \qquad (4.55)$$

But

$$Z_{000} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} p_{ijk} = \sqrt{n}$$

and

$$Z_{u00} = \sqrt{n}\sum_{i=1}^{I} a_u(i)\,p_{i\cdot\cdot} = 0 \qquad (u = 1, 2, \ldots, I-1)$$

Similarly, $Z_{0v0} = Z_{00w} = 0$.


    So by expanding (4.55), the classical Pearson chi-squared statistic becomes,

$$X^2 = \sum_{u=1}^{I-1}\sum_{v=1}^{J-1}\sum_{w=1}^{K-1} Z_{uvw}^2 + \sum_{u=1}^{I-1}\sum_{v=1}^{J-1} Z_{uv0}^2 + \sum_{u=1}^{I-1}\sum_{w=1}^{K-1} Z_{u0w}^2 + \sum_{v=1}^{J-1}\sum_{w=1}^{K-1} Z_{0vw}^2 + \left(\sqrt{n}\right)^2 - n$$

    thereby proving the partition of (4.52).
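The proof can also be checked numerically. The sketch below is illustrative only: the 2×3×2 counts are invented, and the helper `orthonormal_polys` (my own name) builds the orthogonal polynomials by weighted Gram-Schmidt on natural scores. It verifies that the squared Z values over all non-trivial triples sum exactly to the Pearson chi-squared statistic.

```python
# Numerical check of (4.52)-(4.55); the table and helper are illustrative.
def orthonormal_polys(scores, p):
    """Polynomials a_0(i)=1, a_1(i), ... orthonormal with respect to the
    weights p, built by weighted Gram-Schmidt on the powers of the scores."""
    polys = []
    for u in range(len(scores)):
        v = [s ** u for s in scores]
        for a in polys:
            c = sum(w * x * y for w, x, y in zip(p, v, a))
            v = [x - c * y for x, y in zip(v, a)]
        norm = sum(w * x * x for w, x in zip(p, v)) ** 0.5
        polys.append([x / norm for x in v])
    return polys

N = [[[12, 7], [9, 14], [6, 10]],
     [[8, 15], [11, 5], [13, 9]]]                 # hypothetical 2x3x2 counts
I, J, K = 2, 3, 2
n = sum(N[i][j][k] for i in range(I) for j in range(J) for k in range(K))
p = [[[N[i][j][k] / n for k in range(K)] for j in range(J)] for i in range(I)]
pI = [sum(p[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
pJ = [sum(p[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
pK = [sum(p[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
A = orthonormal_polys(range(1, I + 1), pI)        # natural scores 1..I
B = orthonormal_polys(range(1, J + 1), pJ)
C = orthonormal_polys(range(1, K + 1), pK)

def Z(u, v, w):
    """Z_uvw of (4.53)."""
    return n ** 0.5 * sum(A[u][i] * B[v][j] * C[w][k] * p[i][j][k]
                          for i in range(I) for j in range(J) for k in range(K))

# Pearson chi-squared under complete independence
X2 = n * sum((p[i][j][k] - pI[i] * pJ[j] * pK[k]) ** 2 / (pI[i] * pJ[j] * pK[k])
             for i in range(I) for j in range(J) for k in range(K))

# Squared Z values over every (u, v, w) except the trivial (0, 0, 0); terms
# with exactly one non-zero index vanish, leaving the four sums of (4.52)
X2_part = sum(Z(u, v, w) ** 2 for u in range(I) for v in range(J)
              for w in range(K) if (u, v, w) != (0, 0, 0))
print(abs(X2 - X2_part) < 1e-8)   # True: the partition recovers X^2
```

Weighted Gram-Schmidt is one simple construction; any set of polynomials satisfying the orthogonality constraints (4.4)-(4.5) yields the same partition.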

For the sake of simplicity, (4.52) is alternatively expressed as

$$X^2 = X^2_{IJK} + X^2_{IJ} + X^2_{IK} + X^2_{JK} \qquad (4.56)$$

The Pearson chi-squared statistic of (4.52), or (4.56), is partitioned into four terms. The first term, $X^2_{IJK}$, describes the trivariate association between the rows, columns and tubes. Testing the significance of $X^2_{IJK}$ against the chi-squared distribution with (I-1)(J-1)(K-1) degrees of freedom will determine the level of association between the rows, columns and tubes. However, even if this term is not significant, associations described by its component Z values may be. The term $Z_{uvw}$ is the deviation of the rows, columns and tubes up to the (u, v, w)'th trivariate moment in the data from what might be expected under complete independence. For example, $Z_{111}$ is the linear-by-linear-by-linear association and assesses the trivariate location of the three variables. $Z_{121}$ is the linear-by-quadratic-by-linear association of the row-column-tube interaction.

The remaining terms of (4.56) are the chi-squared statistics for each bivariate combination of the rows, columns and tubes. The second term, $X^2_{IJ}$, is the chi-squared statistic for the two-way doubly ordered contingency table, analogous to that discussed in Rayner & Best (1996) and Beh (1997), created when the ordered tubes are collapsed. Thus it is also analogous to (4.38).


    To show this, by definition

$$X^2_{IJ} = \sum_{u=1}^{I-1}\sum_{v=1}^{J-1} Z_{uv0}^2 \qquad (4.57)$$

where

$$Z_{uv0} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J} a_u(i)\,b_v(j)\,p_{ij\cdot} \qquad (4.58)$$

As $p_{ij\cdot} = \sum_{k=1}^{K} p_{ijk}$, $X^2_{IJ}$ is just the chi-squared statistic applied to the contingency table formed when the tubes are summed across. The interpretation of $X^2_{IJ}$ of (4.57) is then similar to (4.38) when the tubes are summed over for each i and j. When compared with the chi-squared distribution with (I-1)(J-1) degrees of freedom, it is a measure of the departure from independence between the rows and columns of N, assuming that the tubes are independent of the rows and columns. Similarly, $X^2_{IK}$ and $X^2_{JK}$ are the bivariate chi-squared statistics, and are measures of the departure from marginal independence between the rows and tubes, and the columns and tubes, respectively.
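As a numerical illustration of (4.57)-(4.58), the sketch below (invented counts and helper names, not thesis data) collapses a hypothetical three-way table over its tubes and confirms that the sum of the squared $Z_{uv0}$ values equals the Pearson chi-squared statistic of the collapsed I×J table.

```python
def orthonormal_polys(scores, p):
    """a_0(i)=1, a_1(i), ... orthonormal w.r.t. weights p (weighted Gram-Schmidt)."""
    polys = []
    for u in range(len(scores)):
        v = [s ** u for s in scores]
        for a in polys:
            c = sum(w * x * y for w, x, y in zip(p, v, a))
            v = [x - c * y for x, y in zip(v, a)]
        norm = sum(w * x * x for w, x in zip(p, v)) ** 0.5
        polys.append([x / norm for x in v])
    return polys

N = [[[12, 7], [9, 14], [6, 10]],
     [[8, 15], [11, 5], [13, 9]]]           # hypothetical 2x3x2 counts
I, J, K = 2, 3, 2
n = sum(sum(sum(row) for row in plane) for plane in N)
# Collapse over the tubes: p_ij. = (sum_k n_ijk) / n
pij = [[sum(N[i][j]) / n for j in range(J)] for i in range(I)]
pI = [sum(pij[i]) for i in range(I)]
pJ = [sum(pij[i][j] for i in range(I)) for j in range(J)]
A = orthonormal_polys(range(1, I + 1), pI)
B = orthonormal_polys(range(1, J + 1), pJ)

# Z_uv0 of (4.58) and the bivariate term X^2_IJ of (4.57)
Zuv0 = [[n ** 0.5 * sum(A[u][i] * B[v][j] * pij[i][j]
         for i in range(I) for j in range(J)) for v in range(J)]
        for u in range(I)]
X2_IJ = sum(Zuv0[u][v] ** 2 for u in range(1, I) for v in range(1, J))

# Pearson chi-squared of the collapsed two-way table
X2_two_way = n * sum((pij[i][j] - pI[i] * pJ[j]) ** 2 / (pI[i] * pJ[j])
                     for i in range(I) for j in range(J))
print(abs(X2_IJ - X2_two_way) < 1e-9)   # True
```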

The interpretation of the Z values for the two-way terms is similar to that of $Z_{uvw}$ in $X^2_{IJK}$. Consider the second term, $X^2_{IJ}$. The value $Z_{uv0}$, defined by (4.58), is the value of the (u, v)'th bivariate moment between the rows and columns when the tubes are summed over. For example, $Z_{110}$ is the linear-by-linear association between the rows and columns, independent of the tubes.

Similarly, for the term $X^2_{IK}$, $Z_{u0w}$ is the (u, w)'th marginal bivariate moment between the rows and tubes, while for the fourth term, $X^2_{JK}$, the value $Z_{0vw}$ is the (v, w)'th marginal bivariate moment between the columns and tubes.

    Even if it is found that a particular bivariate chi-squared statistic is not significant, it is possible to identify significant bivariate associations.


The effect of the row location component on the three-way association is $\sum_{v=1}^{J-1}\sum_{w=1}^{K-1} Z_{1vw}^2$, while in general the u'th row component for the association is $\sum_{v=1}^{J-1}\sum_{w=1}^{K-1} Z_{uvw}^2$. Testing for such components allows for an examination of the trend of the row categories, the trend being dictated by the u'th order orthogonal polynomial. For example, the row location component describes how the row means (the three-way equivalent of $\mu_I$ from equation (4.42)) affect the trivariate association. In a similar manner, the v'th column component for the three-way association is $\sum_{u=1}^{I-1}\sum_{w=1}^{K-1} Z_{uvw}^2$, while the w'th tube component is $\sum_{u=1}^{I-1}\sum_{v=1}^{J-1} Z_{uvw}^2$.

The component values for each two-way association are easily calculated. For the bivariate association between the rows and columns, the u'th row component is $\sum_{v=1}^{J-1} Z_{uv0}^2$, while the v'th column component is $\sum_{u=1}^{I-1} Z_{uv0}^2$. For example, the location component for the row categories is $\sum_{v=1}^{J-1} Z_{1v0}^2$ and describes what effect the row means have on the bivariate association between the rows and columns. Rayner & Best (1997) described the component values for the row-column bivariate moment.

    Row, column and tube components for the remaining bivariate associations can be similarly calculated.
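These component sums can be sketched in code for the two-way case. The block below is illustrative only (invented counts; the helper builds the polynomials by weighted Gram-Schmidt): the u'th row component is a sum of squared Z values over the column index, and the row components together recover the whole bivariate statistic.

```python
def orthonormal_polys(scores, p):
    """a_0(i)=1, a_1(i), ... orthonormal w.r.t. weights p (weighted Gram-Schmidt)."""
    polys = []
    for u in range(len(scores)):
        v = [s ** u for s in scores]
        for a in polys:
            c = sum(w * x * y for w, x, y in zip(p, v, a))
            v = [x - c * y for x, y in zip(v, a)]
        norm = sum(w * x * x for w, x in zip(p, v)) ** 0.5
        polys.append([x / norm for x in v])
    return polys

counts = [[10, 4, 6, 2], [5, 9, 7, 6], [3, 5, 8, 11]]   # hypothetical n_ij.
I, J = 3, 4
n = sum(sum(r) for r in counts)
p = [[counts[i][j] / n for j in range(J)] for i in range(I)]
pI = [sum(p[i]) for i in range(I)]
pJ = [sum(p[i][j] for i in range(I)) for j in range(J)]
A = orthonormal_polys(range(1, I + 1), pI)
B = orthonormal_polys(range(1, J + 1), pJ)
Z = [[n ** 0.5 * sum(A[u][i] * B[v][j] * p[i][j]
      for i in range(I) for j in range(J)) for v in range(J)] for u in range(I)]

# u'th row component: sum over v >= 1 of Z_uv0 squared
row_comp = [sum(Z[u][v] ** 2 for v in range(1, J)) for u in range(1, I)]
location, dispersion = row_comp[0], row_comp[1]
X2_IJ = sum(row_comp)     # row components sum to the whole bivariate statistic

X2_pearson = n * sum((p[i][j] - pI[i] * pJ[j]) ** 2 / (pI[i] * pJ[j])
                     for i in range(I) for j in range(J))
print(abs(X2_IJ - X2_pearson) < 1e-9)   # True
```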

Consider equation (4.58). Suppose that equally spaced integer-valued scores are used for the calculation of the orthogonal polynomials. Then, when u = v = 1, that is, when only the row and column location components are considered, (4.58) becomes:

$$\frac{Z_{110}}{\sqrt{n}} = \sum_{i=1}^{I}\sum_{j=1}^{J}\left(\frac{i-\mu_I}{\sigma_I}\right)\left(\frac{j-\mu_J}{\sigma_J}\right)p_{ij\cdot} \qquad (4.59)$$


where $\mu_I$, $\mu_J$, $\sigma_I$ and $\sigma_J$ are defined in a similar manner to (4.42) for natural

    scores.

    Equation (4.59) gives the correlation value for the I rows and J

    columns, as Danaher (1991) described. Rayner & Best (1996) calculated the

    Pearson product moment correlation for two-way contingency tables using

    orthogonal polynomials. For three-way contingency tables, (4.59) offers a way

    of calculating the Pearson product moment correlation between the rows and

    columns, and can easily be generalised for any multi-way contingency table.

    Pearson product moment correlation values can be similarly calculated for

    the row-tube and column-tube interactions for our contingency table N.

    When midrank scores are used, (4.59) is also an extension of Spearman's

    rank correlation for the rows and columns of a three-way contingency table.

    While Best & Rayner (1996) determined Spearman's rank correlation for

    two-way contingency tables, (4.59) can be generalised for any multi-way

    contingency table.
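The midrank construction can be sketched as follows; the two-way counts are invented for illustration. Midrank scores are computed from the row and column marginals, and the weighted correlation of those scores reproduces the ordinary correlation of the midranks over the n individual observations, i.e. the rank correlation for the grouped data.

```python
# Hypothetical 3x4 two-way table (e.g. tubes already summed across).
N = [[10, 4, 6, 2],
     [5, 9, 7, 6],
     [3, 5, 8, 11]]
I, J = 3, 4
n = sum(sum(row) for row in N)
nI = [sum(N[i]) for i in range(I)]
nJ = [sum(N[i][j] for i in range(I)) for j in range(J)]

def midranks(counts):
    """Midrank score of category i: (observations before i) + (n_i + 1)/2."""
    out, before = [], 0
    for c in counts:
        out.append(before + (c + 1) / 2)
        before += c
    return out

rI, rJ = midranks(nI), midranks(nJ)

# Weighted correlation of the midrank scores, weights p_ij = n_ij / n
muI = sum(rI[i] * nI[i] for i in range(I)) / n
muJ = sum(rJ[j] * nJ[j] for j in range(J)) / n
sdI = (sum((rI[i] - muI) ** 2 * nI[i] for i in range(I)) / n) ** 0.5
sdJ = (sum((rJ[j] - muJ) ** 2 * nJ[j] for j in range(J)) / n) ** 0.5
rho = sum((rI[i] - muI) * (rJ[j] - muJ) * N[i][j]
          for i in range(I) for j in range(J)) / (n * sdI * sdJ)

# Cross-check: expand the table into n pairs and correlate the midranks
pairs = [(rI[i], rJ[j]) for i in range(I) for j in range(J)
         for _ in range(N[i][j])]
mx = sum(x for x, _ in pairs) / n
my = sum(y for _, y in pairs) / n
sx = (sum((x - mx) ** 2 for x, _ in pairs) / n) ** 0.5
sy = (sum((y - my) ** 2 for _, y in pairs) / n) ** 0.5
rho2 = sum((x - mx) * (y - my) for x, y in pairs) / (n * sx * sy)
print(abs(rho - rho2) < 1e-10)   # True
```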

    Equation (4.59) is also the correlation between the rows and columns

    in Goodman's RC model; Goodman (1985a). When the scores used are the

    first non-trivial row and column singular vector from a simple

correspondence analysis of the rows and columns when summing over the

    tubes, then (4.59) is the first singular value of the normalised cell

    probabilities; see Beh (1998).

The association values $Z_{101}/\sqrt{n}$ and $Z_{011}/\sqrt{n}$ have a similar interpretation to $Z_{110}/\sqrt{n}$, but relate to the rows and tubes, and the columns and tubes, respectively.

    4.9 Other Partitions for Three-way Tables

The chi-squared partitions for the three-way contingency table presented so far involve orthogonal polynomials for the ordered categories only. Other partitions are possible, and are identical to the


    partition for a completely ordered table (see Section 4.8), but with different Z values. These Z values include singular vectors for the non-ordered categories and orthogonal polynomials for the ordered categories, and are extensions of the partition presented in Section 4.4.

In this section, the Z values for the cases where a three-way contingency table has one or two ordered variables are given.

    4.9.1 Singly Ordered Three-way Tables

When a three-way contingency table has only one ordered variable, say the rows, then the chi-squared statistic can be partitioned identically to (4.52), where

$$Z_{uvw} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} a_u(i)\,b_{jv}\,c_{kw}\,p_{ijk} \qquad (4.60)$$

The sets of column and tube singular vectors, $\{b_{jv}\}$ and $\{c_{kw}\}$ respectively, can be calculated using the PARAFAC/CANDECOMP or Tucker3 models presented in Chapter 3.

    When N has ordered columns and unordered rows and tubes, then

$$Z_{uvw} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} a_{iu}\,b_v(j)\,c_{kw}\,p_{ijk} \qquad (4.61)$$

    When N has non-ordered rows and columns and ordered tubes, then

$$Z_{uvw} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} a_{iu}\,b_{jv}\,c_w(k)\,p_{ijk} \qquad (4.62)$$

    4.9.2 Doubly Ordered Three-way Tables

When a three-way contingency table has two ordered categories, then the chi-squared statistic can be partitioned using (4.52)-(4.53), but the Z values may differ. The Z values of (4.53) can be modified so that singular vectors for


the non-ordered set of categories are introduced. This is exactly what was proposed in Section 4.9.1.

When N has ordered rows and columns, but non-ordered tubes, then the Z values can be calculated by

$$Z_{uvw} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} a_u(i)\,b_v(j)\,c_{kw}\,p_{ijk} \qquad (4.63)$$

    When N has ordered rows and tubes, but non-ordered columns, then the Z values can be calculated by

$$Z_{uvw} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} a_u(i)\,b_{jv}\,c_w(k)\,p_{ijk} \qquad (4.64)$$

    When N has ordered columns and tubes, but non-ordered rows, then the Z values can be calculated by

$$Z_{uvw} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} a_{iu}\,b_v(j)\,c_w(k)\,p_{ijk} \qquad (4.65)$$

4.10 A Chi-squared Partition for Multi-way Tables

    The partitions above can be easily extended for an m-way contingency table where m>3. Similarly, if m=2, the resulting partition is for the two-way table with one ordered category and is then the partition given by Rayner &

    Best (1997) and Beh (1997).

    Partitioning a chi-squared statistic for a contingency table where not all the variables are assumed ordered means there are many possible partitions available. For example, for an m-way table there is a partition when there is one ordered variable, another partition for a table where two ordered variables are assumed, and so on down to a table consisting of (m-1) ordered variables. For this reason, a generalisation of the partition is not given here.


The partition of (4.52) can be generalised for any m-way contingency table with m ordered variables. Suppose that the t'th variable, $v_t$, contains $c_t$ categories (so that $1 \le t \le m$). Then

$$X^2 = \sum_{u_1=0}^{c_1-1}\sum_{u_2=0}^{c_2-1}\cdots\sum_{u_m=0}^{c_m-1} Z_{u_1u_2\ldots u_m}^2 - n \qquad (4.66)$$

where

$$Z_{u_1u_2\ldots u_m} = \sqrt{n}\sum_{v_1=1}^{c_1}\sum_{v_2=1}^{c_2}\cdots\sum_{v_m=1}^{c_m} a_{u_1}(v_1)\,b_{u_2}(v_2)\cdots c_{u_m}(v_m)\,p_{v_1v_2\ldots v_m} \qquad (4.67)$$

    The Z terms are asymptotically standard normal and independent.

    Equations (4.66)-(4.67) can be proven by generalising the proof of (4.52)-(4.53).

Equation (4.66) can be easily expanded to be of the same form as (4.52), and includes $2^m - m - 1$ terms. When in this form, there is one term for the m-way association, m terms for the (m-1)-way associations, and so on down to the $m(m-1)/2$ two-way associations.
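The term count can be checked by direct enumeration (the function name below is my own, for illustration): each term of the expanded partition corresponds to a subset of two or more of the m variables, giving $2^m - m - 1$ terms in total.

```python
from itertools import combinations

def partition_terms(m):
    """One interaction term per subset of the m variables of size >= 2,
    matching the expanded form of (4.66)."""
    return [subset for r in range(2, m + 1)
            for subset in combinations(range(1, m + 1), r)]

for m in (2, 3, 4):
    print(m, len(partition_terms(m)), 2 ** m - m - 1)
# Output:
# 2 1 1
# 3 4 4
# 4 11 11
```

For m = 3 this gives the four terms of (4.52): three two-way associations and one three-way association.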

4.11 Examples

4.11.1 A Completely Ordered Three-way Example

    Consider the three-way contingency table cited in Clogg (1982), and originally seen in Davis (1977) and analysed by many including Beh & Davy

(1998a, 1999), which classifies 1517 people according to their reported happiness, number of completed years of schooling and number of siblings. The data are reproduced in Table 3.4.

    In this example the ordering of the happiness, schooling and siblings variables is considered so that we can identify the important bivariate and trivariate moments as well as identify important location, dispersion and higher order components. However, it may also be of interest to regard happiness as a response variable and schooling and siblings as the


    explanatory variables. Beh & Davy (1999) and Example 4.11.2 considers such a

    problem by applying the Pearson chi-squared partition of (4.48)-(4.49).

    The chi-squared statistic for Table 3.4 is 328.57 with integer row scores 1

to 4, integer column scores 1 to 5 and integer tube scores 1, 2 and 3. At 50

    degrees of freedom the chi-squared value is highly significant, suggesting that

there is an association between the happiness, number of years of schooling

    and number of siblings for a person.

    Table 4.1 lists each term from the partition, their component values,

    degrees of freedom and P-values calculated from 10000 permutation test

    Monte Carlo simulations. If we take into consideration the ordered structure

of the rows, columns and tubes, then by using the partition of (4.52), $X^2_{IJ} = 235.30$ and $X^2_{IK} = 41.14$ both have P-values approximately equal to zero, while the P-value of $X^2_{JK} = 25.82$ is 0.0008. Therefore all three bivariate terms are highly significant. The P-value of $X^2_{IJK}$ is 0.3426, which is not significant.

Thus the three-way association does not contribute significantly to the total variation present in the table. The interaction between the rows (number of years of completed schooling) and columns (number of siblings), $X^2_{IJ}$, is the most

    significant accounting for 72% of the total variation in Table 3.4. Clogg (1982)

    also concluded this association to be the most significant, although for his

    analysis it contributed to 77% of the total variation. Clogg's three factor

    interaction term was 24.30.

    By observing Table 4.1, we can see the important row, column and tube

    components for each term of (4.52). When the tubes are summed across for

    the rows and columns, the column location component value of 222.2234 has

    a P-value of zero. Therefore, the difference in the "Number of Siblings" is

    due to the difference at each level across the "Years of Completed Schooling"

variable when the happiness of the people is not of interest. In fact, this column component accounts for 94.44% of the total column variation in $X^2_{IJ}$.

The most important row component for the term $X^2_{IJ}$ is location, with a value


    of 209.9878 which also has a P-value of zero. Therefore, the difference in the

    categories of the "Years of Completed Schooling" variable is due to the

difference in the levels' mean values, and accounts for 89.24% of the total variation in the rows for the term $X^2_{IJ}$.

    When the columns are summed across, the location component is the

most dominant for the tubes, and contributes 69.66% of the total tube variation in the term $X^2_{IK}$. Similarly, for this term, the variation in the "Years

    of Completed Schooling" categories can be explained by the difference in the

    mean schooling levels, as the row location component of 31.6954 is more

    dominant than other components and has a zero P-value.

    When the rows are summed across, the variation in the levels of

    "Happiness" can be best explained by the dispersion of the levels across the

    "Number of Siblings" categories, as the tube dispersion component of 17.8215

    has the only significant P-value of 0.0009 and accounts for 71.11% of the total

    tube variation. When the rows are of no interest, the variation in the

    "Number of Siblings" categories seems to be caused by the difference in their

mean values, as the column location component has a P-value of 0.0001 and is highly significant.

When all three variables are considered, we can determine the cause of bivariate associations. Table 4.1 shows that while the three-way term $X^2_{IJK}$ is

    not significant, we can see that the relationship between the "Years of

Schooling" and "Happiness" categories is affected by the dispersion in the

    "Number of Siblings" categories, as the P-value is 0.0574; only slightly

    insignificant at the 0.05 level. No other component values significantly affect

    any other bivariate relationships.

    The linear-by-linear association for the row-by-column interaction

($Z_{110}$) is highly significant, with a value of -14.42. Hence, those with a lot of siblings tend to finish school earlier than those with few siblings. Clogg (1982)


Term       Component                  Value     df   P-value
X^2_IJ     Row: Location           222.2234      3   0
           Row: Dispersion           7.7034      3   0.0528
           Row: Error                5.3720      6   0.5035
           Column: Location        209.9878      4   0
           Column: Dispersion       24.2943      4   0.0001
           Column: Error             1.0168      4   0.9156
           Total                   235.2988     12   0
X^2_IK     Row: Location            28.6582      3   0
           Row: Dispersion          12.4805      3   0.0061
           Tube: Location           31.6954      2   0
           Tube: Dispersion          6.2973      2   0.0466
           Tube: Error               3.1460      2   0.2127
           Total                    41.1387      6   0
X^2_JK     Column: Location          8.7582      4   0.0770
           Column: Dispersion       17.0634      4   0.0009
           Tube: Location           18.0973      2   0.0001
           Tube: Dispersion          0.9724      2   0.6172
           Tube: Error               6.7517      4   0.1492
           Total                    25.8215      8   0.0008
X^2_IJK    Row-Column: Location     10.9158     12   0.5507
           Row-Column: Dispersion   15.3925     12   0.2250
           Row-Tube: Location        4.0458      6   0.6683
           Row-Tube: Dispersion     12.4921      6   0.0574
           Row-Tube: Error           9.7705     12   0.6389
           Column-Tube: Location     4.2737      8   0.8360
           Column-Tube: Dispersion  18.9646      8   0.2000
           Column-Tube: Error        3.0701      8   0.9245
           Total                    26.3084     24   0.3426
X^2        Total                   328.5674     50   0

    Table 4.1 : Partition of Chi-squared Statistic into Component Values


    reached the same conclusion, showing that this particular association is

    "negative overall". The row-by-tube interaction is also dominated by the

linear-by-linear association ($Z_{101}$), with a significant value of -4.93.

    Thus, those who are very happy tend to be those with fewer years of

    completed schooling. The linear-by-quadratic association for the column-by-

tube interaction ($Z_{012}$) is also significant, with a value of 3.47. Therefore, as the

    number of siblings increases, happiness tends to decrease, then increase.

    Thus those with a few siblings and those with a lot of siblings tend to be

    happier than those with a moderate (4 to 5) number of siblings.

    4.11.2 A Doubly Ordered Three-way Example

    Consider again the three-way contingency table of Table 3.4.

    One question which may be of interest to the researcher is "how is the happiness of the person affected by their number of years of schooling and the number of siblings?" To analyse this question, we will regard the rows

    (years of completed schooling) and the columns (number of siblings) as ordered, but the tubes (happiness) as non-ordered. Therefore, happiness is regarded as the response variable, while the set of schooling and sibling categories are regarded as the explanatory variables. Thus, the application of partition (4.48) is made, so that we may obtain a trivariate term for the association of all three variables, and two bivariate terms.

As expected using this partition, the total chi-squared value of Table 3.4 is 328.57 which, as Table 4.2 shows, has a zero P-value. Therefore, as concluded in Clogg (1982), Beh & Davy (1999) and Example 4.11.1, there is an association between the years of schooling, number of siblings and happiness of a person. This statistic is calculated using integer row scores 1 to 4, and integer column scores 1 to 5.

Table 4.2 also lists the values of each of the terms of partition (4.48), their component values, degrees of freedom and P-values calculated using


    10000 permutation test Monte Carlo simulations across each category of the

    happiness variable. If we take into consideration the ordered structure of the

    2 rows and the non-ordered tubes, then X (K)=41.1387, and for the ordered

    2 columns and non-ordered tubes, X (K) =25.8215; both equivalent to the

    results obtained in the above example. With zero and 0.0113 P-values

    respectively, there is a relationship between the years of schooling and a

    person's happiness, when the number of siblings of a person is not

    considered, and a relationship between the number of siblings and

    happiness of a person when the number of completed years of schooling is not considered. With a zero P-value, the trivariate term X2^ =261.6072

    shows, there is a significant trivariate association between the three

    variables. This is contrary to the conclusion reached in Example 4.11.1, as the statistically significant X2, value and the insignificant X2^ value are here

    2 combined to form X (K). In fact when the level of happiness is not

    considered to be of an ordinal nature, the trivariate term contributes to 80%

    of the total variation in the Table 3.4.

    By observing Table 4.2 we can see the important row, column and

    bivariate components for each term of (4.48).

Consider the association between a person's number of years of schooling and their level of happiness, when their number of siblings is not considered. This is described by the chi-squared term $X^2_{I(K)}$. Partitioning this term so that the location and dispersion components are identified shows that the dominant source of the row variation is the difference in the mean row categories across each level of happiness. Table 4.2 shows this to have a location component value of 31.6954, and a zero P-value.

    The dispersion component is also significant, with a 0.042 P-value.

    Therefore, the difference in the years of schooling categories can be explained by their significant difference in spread across each happiness level.


Term        Component              Value     df   P-value
X^2_I(K)    Row: Location         31.6954     2   0
            Row: Dispersion        6.2973     2   0.0428
            Row: Error             3.1460     2   0.2063
            Total                 41.1387     6   0
X^2_J(K)    Column: Location      18.0973     2   0.0001
            Column: Dispersion     0.9725     2   0.6226
            Column: Error          6.7518     4   0.1508
            Total                 25.8216     8   0.0014
X^2_IJ(K)   Row: Location        214.2615    12   0
            Row: Dispersion       43.2588    12   0.0001
            Row: Error             4.0869    12   0.9819
            Column: Location     226.2692     9   0
            Column: Dispersion    20.1955     9   0.0159
            Column: Error         15.1425    18   0.6551
            Total                261.6072    36   0
X^2         Total                328.5674    50   0

    Table 4.2 : Doubly Ordered Partition of the Chi-squared Statistic of Table 3.4

    into Location, Dispersion and Error Components

Now consider the association between a person's number of siblings and their level of happiness, when their years of schooling are not

considered. This is described by the chi-squared term $X^2_{J(K)}$. The column location component, with a P-value of 0.0001, is highly significant. Therefore, the variation between the sibling categories is due primarily to the difference in their mean value across each level of

happiness. This component accounts for 70.09% of the total variation between the column and tube variables.

When all three variables are analysed simultaneously, the row/tube and column/tube components can be identified. Agreeing with the bivariate conclusions reached for the row/tube association, the dominant source of variation for the "Years of Schooling" variable is the location component, which has a zero P-value and accounts for 81.91% of the trivariate variation. The dispersion component has a P-value of 0.0001 and is therefore also a significant source of row variation. Thus, the difference between the "Years of Schooling" categories, when all three variables are analysed, is due to the difference in their mean values and, to a lesser extent, their spread across each level of happiness.

The trivariate conclusions reached for the association between the column and tube variables are similar to the bivariate conclusions. With a zero P-value, the column location component is highly significant. In fact, this component accounts for 86.49% of the column/tube variation at the trivariate level. Therefore, the variation in the column categories is due to the difference in the mean column values across the happiness categories. The dispersion component is also a significant source of variation in the column categories.

    Now consider the bivariate moments identified for each (unordered) tube category.

At the Not too Happy category, the linear-by-linear association between the rows and columns is $Y_{111} = -8.297$, which is highly significant. At the Pretty Happy and Very Happy categories, the linear-by-linear association values are $Y_{112} = -11.713$ and $Y_{113} = -4.718$ respectively, and both are highly significant. Therefore, at each happiness level, the fewer siblings a person has, the more time they tend to spend on their schooling. As can be seen, this association is the strongest for the Pretty Happy level, which is


    about 1.5 times greater than Not too Happy and 2.5 times greater than the association at the "Very Happy" level.

The overall linear-by-linear association between the row and column categories, using (4.50), is -14.42; the same value calculated in Example 4.11.1.

    Therefore, the overall linear-by-linear association between the "Years of

    Schooling" and "Number of Siblings" variables suggests that those who spend more time to complete their schooling tend to be those with a small number of siblings.


    Chapter 5

    Simple Ordinal Correspondence Analysis

    5.1 Introduction

    Chapter 2 demonstrated that simple correspondence analysis is a very

    useful graphical technique for comparing different categories of a two-way

    contingency table. However while showing that categories within a variable

    may or may not be different, there is no clear interpretation from such an

    analysis of HOW these within-variable categories may or may not be

    different.

    The technique of correspondence analysis described in this chapter

    helps solve this particular problem and was developed by Beh (1997). It is

    shown to be applicable to two-way contingency tables with one or two

    ordered variables. It is hoped that this chapter will redefine and extend his

    technique. The application of this type of analysis to multi-way contingency

    tables is based on the work presented in Beh & Davy (1998a, 1999) and will be discussed in Chapter 7.


As Chapter 4 pointed out, the orthogonal polynomials require a set of scores which reflect the ordered structure of a set of categories. For this chapter, natural row scores $\{s_I(i) = i : i = 1, 2, \ldots, I\}$ and natural column scores $\{s_J(j) = j : j = 1, 2, \ldots, J\}$ are used. Chapter 6 looks at other scoring schemes that can be employed, as well as describing what impact they have on the analysis.

    5.2 Doubly Ordered Two-way Contingency Tables

    5.2.1 Introduction

Chapter 2 showed that the departure from independence for a two-way contingency table can be measured by the Pearson ratio $\alpha_{ij}$; see equation (2.3). However, instead of applying a singular value decomposition to $\alpha_{ij}$, a decomposition using orthogonal polynomials is applied. As stated in

    Chapter 4, the orthogonal polynomials require an initial selection of scores which reflect the ordered structure, or partially ordered structure of the categories within a variable to be used.

    5.2.2 The Method

    Consider equation (4.40). It can be rewritten as

$$D_I^{-1}\,\mathbf{P}\,D_J^{-1} = \mathbf{A}\,\frac{\mathbf{Z}}{\sqrt{n}}\,\mathbf{B}' \qquad (5.1)$$

where the row and column orthogonal polynomial constraints of (4.1) and (4.2) respectively can be written in matrix notation as

$$\mathbf{A}' D_I \mathbf{A} = \mathbf{I} \qquad (5.2)$$

$$\mathbf{B}' D_J \mathbf{B} = \mathbf{I} \qquad (5.3)$$


It must be noted that the matrices of (5.1) which contain the orthogonal polynomials include the trivial orthogonal polynomials.

Therefore, the Pearson ratio $\alpha_{ij}$ of (2.3) can be partitioned so that

$$\alpha_{ij} = \sum_{u=0}^{I-1}\sum_{v=0}^{J-1} a_u(i)\left(\frac{Z_{uv}}{\sqrt{n}}\right)b_v(j) \qquad (5.4)$$

where $a_u(i)$ and $b_v(j)$ are subject to the constraints (4.1) and (4.2) respectively. The value of $Z_{uv}$ is defined by (4.39).

Chapter 4 showed that the orthogonal polynomials contain the trivial solution of unity and that $Z_{00} = \sqrt{n}$. Therefore, (5.4) simplifies to

$$\frac{p_{ij}}{p_{i\cdot}\,p_{\cdot j}} = 1 + \sum_{u=1}^{I-1}\sum_{v=1}^{J-1} a_u(i)\,\frac{Z_{uv}}{\sqrt{n}}\,b_v(j) \qquad (5.5)$$

or

$$\frac{p_{ij} - p_{i\cdot}\,p_{\cdot j}}{p_{i\cdot}\,p_{\cdot j}} = \sum_{u=1}^{I-1}\sum_{v=1}^{J-1} a_u(i)\,\frac{Z_{uv}}{\sqrt{n}}\,b_v(j) \qquad (5.6)$$

    5.2.3 Standard Co-ordinates

    In order to visualise the difference between the ordered rows and

    ordered columns, we could just simultaneously plot the row and column

    orthogonal polynomials. Thus for a correspondence plot, we could plot

    along the m'th principal axis the i'th row and j'th column orthogonal

    polynomials am(i) and bm(j) respectively. When doing so, the row profile

    co-ordinates will be positioned along the first principal axis of the plot in

    exactly the same order as the scoring scheme which specified their order,

    while along the second axis they are positioned in a quadratic fashion.

However, this co-ordinate system will not reflect the association, $Z_{uv}/\sqrt{n}$,

    between the row and column categories. Also, to plot these scores would

    result in a mean unit distance from the origin and all axes would have a

    unit principal inertia as can be seen from the orthogonality constraints (4.1)

    -136- Chapter 5 - Simple Ordinal Correspondence Analysis

    and (4.2). That is each axis of the correspondence plot will be equally weighted. This plotting system is referred to as standard co-ordinates, in much the same way as the singular vectors were in Chapter 2.

    5.2.4 Profile Co-ordinates

As described in Chapter 1, the aim of correspondence analysis is to graphically represent the similarities or differences in the categories of a variable. As the standard co-ordinates do not offer a highly informative graphical representation of the variables, this problem can be easily overcome by rescaling the row and column orthogonal polynomials.

    Consider the row and column profile co-ordinates of (2.15) and (2.16).

    In matrix form, they can be written as

F = A D_\lambda    (5.7)

G = B D_\lambda    (5.8)

    For simple ordinal correspondence analysis, orthogonal polynomials are used instead of singular vectors which the classical approach considers.

Similarly, the relationship between the row and column categories is measured in simple ordinal correspondence analysis using the Z values, while the classical approach considers the singular values. Therefore, for simple ordinal correspondence analysis, the row and column profile co-ordinates may be written as

F_* = A_* \frac{Z}{\sqrt{n}}    (5.9)

G_* = B_* \frac{Z^T}{\sqrt{n}}    (5.10)

Therefore, the co-ordinate of the i'th row profile along the v'th principal axis is

f_{iv}^* = \sum_{u=1}^{I-1} a_u(i) \frac{Z_{uv}}{\sqrt{n}}    (5.11)

while the co-ordinate of the j'th column profile along the u'th principal axis is

g_{ju}^* = \sum_{v=1}^{J-1} b_v(j) \frac{Z_{uv}}{\sqrt{n}}    (5.12)

An alternative definition of the row and column profile co-ordinates of equations (5.11) and (5.12) is

f_{iv}^* = \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\cdot}} b_v(j)    (5.13)

g_{ju}^* = \sum_{i=1}^{I} \frac{p_{ij}}{p_{\cdot j}} a_u(i)    (5.14)

respectively.

    Equation (5.13) can be proven by substituting (4.39) into (5.11), and then simplifying using the orthogonality constraint of (4.4). The column profile co-ordinates of (5.14) can be proven in a similar fashion.

The row profile co-ordinates of (5.13) involve a weighted sum of the elements of the i'th row profile, p_{ij}/p_{i\cdot}, where the weights are the standard co-ordinates, or orthogonal polynomials, of the columns.

    Using (5.11) and (5.12) gives us the relationship between the total inertia and the position of the row and column categories in the correspondence plot. To show this consider the row profile co-ordinates of equation (5.11). Using equation (4.38) and the orthogonality condition (4.2), the total inertia may be expressed as


\frac{X^2}{n} = \sum_{i=1}^{I} \sum_{j=1}^{J} p_{i\cdot} p_{\cdot j} \left( \frac{p_{ij}}{p_{i\cdot} p_{\cdot j}} - 1 \right)^2 = \sum_{i=1}^{I} p_{i\cdot} \sum_{v=1}^{J-1} \left( \sum_{u=1}^{I-1} a_u(i) \frac{Z_{uv}}{\sqrt{n}} \right)^2

which simplifies to

\frac{X^2}{n} = \sum_{i=1}^{I} \sum_{v=1}^{J-1} p_{i\cdot} \left( f_{iv}^* \right)^2    (5.15)

where f_{iv}^* is defined by (5.11). Note that equation (5.15) is mathematically similar to equation (2.25). It can also be shown that the relationship between the total inertia and the column profile co-ordinates of (5.12) is

\frac{X^2}{n} = \sum_{j=1}^{J} \sum_{u=1}^{I-1} p_{\cdot j} \left( g_{ju}^* \right)^2    (5.16)
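The equivalence of the two co-ordinate definitions, (5.11)/(5.13) and (5.12)/(5.14), and the inertia identities (5.15)-(5.16), can be confirmed numerically. A minimal NumPy sketch follows (the 3×3 table, natural scores and the helper ortho_polys are illustrative assumptions, not thesis material):

```python
import numpy as np

def ortho_polys(w, s):
    # weighted Gram-Schmidt: sum_i w[i] a_u(i) a_t(i) = delta_ut, a_0(i) = 1
    k = len(w)
    A = np.vander(np.asarray(s, float), k, increasing=True)
    for u in range(k):
        for t in range(u):
            A[:, u] -= (w * A[:, u] @ A[:, t]) * A[:, t]
        A[:, u] /= np.sqrt(w @ A[:, u] ** 2)
    return A

N = np.array([[30., 20., 10.], [15., 25., 20.], [5., 15., 35.]])  # hypothetical
n, P = N.sum(), N / N.sum()
pI, pJ = P.sum(axis=1), P.sum(axis=0)
A, B = ortho_polys(pI, [1, 2, 3]), ortho_polys(pJ, [1, 2, 3])
Z = np.sqrt(n) * A.T @ P @ B

F = A[:, 1:] @ Z[1:, 1:] / np.sqrt(n)        # (5.11): f*_iv
F_alt = (P / pI[:, None]) @ B[:, 1:]         # (5.13): weighted row profiles
G = B[:, 1:] @ Z[1:, 1:].T / np.sqrt(n)      # (5.12): g*_ju
G_alt = (P / pJ).T @ A[:, 1:]                # (5.14): weighted column profiles

X2 = n * (((P - np.outer(pI, pJ)) ** 2) / np.outer(pI, pJ)).sum()
print(np.allclose(F, F_alt), np.allclose(G, G_alt))
print((pI[:, None] * F ** 2).sum() - X2 / n)   # (5.15): total inertia
print((pJ[:, None] * G ** 2).sum() - X2 / n)   # (5.16): total inertia
```

The two definitions agree, and both weighted sums of squared co-ordinates recover the total inertia X^2/n.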

    5.2.5 Modelling Ordinal Correspondence Analysis

    The mathematical similarities between simple correspondence analysis and the new method extend to the correlation model and reconstitution formula.

    Consider equation (5.6). Then in matrix notation, it may be written as

D_I^{-1} \left( P - r c^T \right) D_J^{-1} = A_* \left( \frac{Z}{\sqrt{n}} \right) B_*^T    (5.17)

where A_* and B_* are subject to the orthogonality properties of (5.2) and (5.3) respectively.


    Re-arranging equation (5.17), the ordinal correlation model for a doubly ordered contingency table is :

P = r c^T + D_I A_* \left( \frac{Z}{\sqrt{n}} \right) B_*^T D_J    (5.18)

so that the (i, j)'th element of P, p_{ij}, can be reconstituted by

p_{ij} = p_{i\cdot} p_{\cdot j} \left( 1 + \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} a_u(i) \frac{Z_{uv}}{\sqrt{n}} b_v(j) \right)    (5.19)

which is the model of association given in Rayner & Best (1996).
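The saturated model reconstitutes the table exactly, while truncating the sums gives only an approximation. A minimal NumPy sketch (hypothetical 3×3 table, natural scores; ortho_polys is an illustrative helper, not thesis notation):

```python
import numpy as np

def ortho_polys(w, s):
    # weighted Gram-Schmidt: sum_i w[i] a_u(i) a_t(i) = delta_ut, a_0(i) = 1
    k = len(w)
    A = np.vander(np.asarray(s, float), k, increasing=True)
    for u in range(k):
        for t in range(u):
            A[:, u] -= (w * A[:, u] @ A[:, t]) * A[:, t]
        A[:, u] /= np.sqrt(w @ A[:, u] ** 2)
    return A

N = np.array([[30., 20., 10.], [15., 25., 20.], [5., 15., 35.]])  # hypothetical
n, P = N.sum(), N / N.sum()
pI, pJ = P.sum(axis=1), P.sum(axis=0)
A, B = ortho_polys(pI, [1, 2, 3]), ortho_polys(pJ, [1, 2, 3])
Z = np.sqrt(n) * A.T @ P @ B

# saturated reconstitution: recovers p_ij exactly
Phat = np.outer(pI, pJ) * (1 + A[:, 1:] @ (Z[1:, 1:] / np.sqrt(n)) @ B[:, 1:].T)
print(np.allclose(Phat, P))

# truncated (M = 1) version: only the linear-by-linear term retained;
# still sums to one because the polynomials are centred
P1 = np.outer(pI, pJ) * (1 + np.outer(A[:, 1], B[:, 1]) * Z[1, 1] / np.sqrt(n))
print(P1.sum())
```

Dropping the higher-order Z_{uv} terms changes the individual cell estimates but, because \sum_i p_{i\cdot} a_u(i) = \sum_j p_{\cdot j} b_v(j) = 0, every truncated model still sums to one.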

Model (5.19) is the saturated ordinal correlation model. If the (i, j)'th cell probability is estimated using M_1 row components and M_2 column components, the unsaturated ordinal correlation model is

p_{ij} \approx p_{i\cdot} p_{\cdot j} \left( 1 + \sum_{u=1}^{M_1} \sum_{v=1}^{M_2} a_u(i) \frac{Z_{uv}}{\sqrt{n}} b_v(j) \right)    (5.20)

When an M-dimensional correspondence plot is constructed, the unsaturated ordinal correlation model can be written as

p_{ij} \approx p_{i\cdot} p_{\cdot j} \left( 1 + \sum_{u=1}^{M} \sum_{v=1}^{M} a_u(i) \frac{Z_{uv}}{\sqrt{n}} b_v(j) \right)    (5.21)

    However, as Rayner & Best (1996) point out, (5.21) can give negative probabilities, so they suggested the alternative exponential model

p_{ij} \approx p_{i\cdot} p_{\cdot j} \exp\left( \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} a_u(i) \frac{Z_{uv}}{\sqrt{n}} b_v(j) \right)    (5.22)

which is akin to Goodman's RC canonical correlation model of (2.60).


Consider the rank-2 model of (5.21) when u = 1,

p_{ij} = p_{i\cdot} p_{\cdot j} \left( 1 + a_1(i) \sum_{v=1}^{J-1} \frac{Z_{1v}}{\sqrt{n}} b_v(j) \right)    (5.23)

which can also be written as

p_{ij} = p_{i\cdot} p_{\cdot j} \left( 1 + a_1(i) g_{j1}^* \right)    (5.24)

When (5.23) is rearranged, we get

\frac{p_{ij} - p_{i\cdot} p_{\cdot j}}{p_{i\cdot} p_{\cdot j}} = a_1(i) \sum_{v=1}^{J-1} \frac{Z_{1v}}{\sqrt{n}} b_v(j)    (5.25)

Squaring both sides of (5.25), and summing across each of the rows and columns after multiplying by the i'th row and j'th column marginal probabilities, gives

X^2 = \sum_{v=1}^{J-1} Z_{1v}^2

where the RHS is just the row location component.

Thus, if the researcher wishes to model a 2×2 contingency table using (5.19) or (5.23), then the rank-2 model is

p_{ij} = p_{i\cdot} p_{\cdot j} \left( 1 + a_1(i) \frac{Z_{11}}{\sqrt{n}} b_1(j) \right)    (5.26)

If the probability values of the contingency table abide by model (5.26), then

X^2 = Z_{11}^2    (5.27)


The relationship between the chi-squared statistic and the linear-by-linear association of equation (5.27) will arise when the contingency table being analysed is of size 2×J or I×2. As there are either two rows, or two columns, there can be no higher order bivariate term than Z_{11}. The rank-2 model of Rom & Sarkar (1992) for a 2×2 contingency table can also be generalised using orthogonal polynomials

p_{ij} = p_{i\cdot} p_{\cdot j} \left( 1 + \phi \left[ \alpha_i + \beta_j \right] + a_1(i) \frac{Z_{11}}{\sqrt{n}} b_1(j) \right)    (5.28)

where \phi, \alpha_i and \beta_j are as defined in Subsection 2.7.2. The reconstitution formula can also be calculated in terms of the row and column profile co-ordinates by using their definitions of (5.9) and (5.10) respectively. Substituting (5.9) into (5.17) yields

D_I^{-1} \left( P - r c^T \right) D_J^{-1} = F_* B_*^T    (5.29)

    while substituting (5.10) into (5.17) yields

D_I^{-1} \left( P - r c^T \right) D_J^{-1} = A_* G_*^T    (5.30)

Suppose that F_{*M} is the set of row profile co-ordinates for the first M principal axes, while G_{*M} is the set of column profile co-ordinates for the first M principal axes. Suppose also that Z_M is the M×M matrix containing the first M rows and columns of Z. Then, for an M-dimensional correspondence plot,

D_I^{-1} \left( P - r c^T \right) D_J^{-1} \approx F_{*M} \left( \frac{Z_M}{\sqrt{n}} \right)^{-1} G_{*M}^T    (5.31)


Rearranging equation (5.31) gives the saturated reconstitution correspondence model

P \approx r c^T + D_I F_{*M} \left( \frac{Z_M}{\sqrt{n}} \right)^{-1} G_{*M}^T D_J    (5.32)

Thus, (5.31) and (5.32) show that, in an approximate sense, the M-dimensional correspondence plot using F_{*M} and G_{*M} as the set of row and column profile co-ordinates can be regarded as a graphical approximation of the contingency table.

    5.2.6 Transition Formulae

In this section, the derivation and interpretation of the simple ordinal transition formulae of Beh (1997) will be given.

Consider the row profile co-ordinates of (5.11). Substituting these into equation (5.5) gives the relationship between the row profile co-ordinates and the Pearson contingencies

\alpha_{ij} - 1 = \sum_{v=1}^{J-1} b_v(j) f_{iv}^*    (5.33)

Similarly, substituting equation (5.12) into (5.5) gives the relationship between the Pearson contingencies and the column profile co-ordinates

\alpha_{ij} - 1 = \sum_{u=1}^{I-1} a_u(i) g_{ju}^*    (5.34)

Equating (5.33) and (5.34) results in the transition formula

\sum_{v=1}^{J-1} b_v(j) f_{iv}^* = \sum_{u=1}^{I-1} a_u(i) g_{ju}^*    (5.35)


In matrix form, equation (5.35) can be written as

F_* B_*^T = A_* G_*^T    (5.36)

Post-multiplying (5.36) by D_J B_* and using the column orthogonality property yields

F_* = A_* G_*^T D_J B_*    (5.37)

Similarly, pre-multiplying by A_*^T D_I, using the row orthogonality property and transposing, yields

G_* = B_* F_*^T D_I A_*    (5.38)

    Therefore, the row profile co-ordinates can be obtained when the column profile co-ordinates are known by using (5.37). Similarly, the column profile co-ordinates can be obtained if the row profile co-ordinates are known by using (5.38).

An alternative set of transition formulae to (5.36)-(5.38) can be derived. Consider the row profile co-ordinates defined by (5.13). Multiplying these by Z_{uv}/\sqrt{n} and summing across v gives

\sum_{v=1}^{J-1} f_{iv}^* \frac{Z_{uv}}{\sqrt{n}} = \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\cdot}} \left( \sum_{v=1}^{J-1} b_v(j) \frac{Z_{uv}}{\sqrt{n}} \right)

Substituting equation (5.12) into the above expression gives

\sum_{v=1}^{J-1} f_{iv}^* \frac{Z_{uv}}{\sqrt{n}} = \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\cdot}} g_{ju}^*    (5.39)


Similarly, the transition formula

\sum_{u=1}^{I-1} g_{ju}^* \frac{Z_{uv}}{\sqrt{n}} = \sum_{i=1}^{I} \frac{p_{ij}}{p_{\cdot j}} f_{iv}^*    (5.40)

can be obtained.

Expressing the transition formulae (5.39) and (5.40) in matrix notation gives

F_* \left( \frac{Z^T}{\sqrt{n}} \right) = D_I^{-1} P G_*    (5.41)

and

G_* \left( \frac{Z}{\sqrt{n}} \right) = D_J^{-1} P^T F_*    (5.42)

respectively, which are mathematically similar to the simple correspondence analysis transition formulae of (2.39) and (2.40).
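The transition formulae can be checked numerically. The sketch below (NumPy; hypothetical 3×3 table, natural scores; ortho_polys is an illustrative helper) verifies the matrix forms (5.37) and (5.41)-(5.42):

```python
import numpy as np

def ortho_polys(w, s):
    # weighted Gram-Schmidt: sum_i w[i] a_u(i) a_t(i) = delta_ut, a_0(i) = 1
    k = len(w)
    A = np.vander(np.asarray(s, float), k, increasing=True)
    for u in range(k):
        for t in range(u):
            A[:, u] -= (w * A[:, u] @ A[:, t]) * A[:, t]
        A[:, u] /= np.sqrt(w @ A[:, u] ** 2)
    return A

N = np.array([[30., 20., 10.], [15., 25., 20.], [5., 15., 35.]])  # hypothetical
n, P = N.sum(), N / N.sum()
pI, pJ = P.sum(axis=1), P.sum(axis=0)
A, B = ortho_polys(pI, [1, 2, 3]), ortho_polys(pJ, [1, 2, 3])
Z = np.sqrt(n) * A.T @ P @ B

F = A[:, 1:] @ Z[1:, 1:] / np.sqrt(n)        # row profile co-ordinates
G = B[:, 1:] @ Z[1:, 1:].T / np.sqrt(n)      # column profile co-ordinates

# (5.37): rows from columns
F_from_G = A[:, 1:] @ G.T @ np.diag(pJ) @ B[:, 1:]
# (5.41): F_* (Z^T / sqrt(n)) = D_I^{-1} P G_*
lhs_41, rhs_41 = F @ Z[1:, 1:].T / np.sqrt(n), (P / pI[:, None]) @ G
# (5.42): G_* (Z / sqrt(n)) = D_J^{-1} P^T F_*
lhs_42, rhs_42 = G @ Z[1:, 1:] / np.sqrt(n), (P.T / pJ[:, None]) @ F

print(np.allclose(F_from_G, F), np.allclose(lhs_41, rhs_41),
      np.allclose(lhs_42, rhs_42))
```

Each co-ordinate set can therefore be recovered from the other, exactly as the transition formulae assert.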

Beh (1999a) discussed the interpretation of (5.41) and (5.42) with respect to rank type data. The analysis of rank type data will be considered in Chapter 8.

Consider a two-dimensional ordinal correspondence plot constructed using the row and column location components. For this plot, the bivariate associations can be placed into a 2×2 matrix so that

Z = \begin{pmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{pmatrix}

where Z_{11} is the linear-by-linear association and Z_{22} is the quadratic-by-quadratic association between the rows and columns. Similarly, Z_{12} and Z_{21} are the linear-by-quadratic and quadratic-by-linear associations between the rows and columns respectively. Let Z_{12} and Z_{21} be referred to as the off-diagonal associations, and let Z_{11} and Z_{22} be termed the diagonal associations.


The relationship between the row and column profiles along the first principal axis, using equations (5.39) and (5.40), is

f_{i1}^* Z_{11} + f_{i2}^* Z_{12} = \sqrt{n} \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\cdot}} g_{j1}^*    (5.43)

g_{j1}^* Z_{11} + g_{j2}^* Z_{21} = \sqrt{n} \sum_{i=1}^{I} \frac{p_{ij}}{p_{\cdot j}} f_{i1}^*    (5.44)

From (5.43) and (5.44), the row and column co-ordinates along the first principal axis may be found by

f_{i1}^* = \frac{1}{Z_{11}} \left( \sqrt{n} \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\cdot}} g_{j1}^* - f_{i2}^* Z_{12} \right)    (5.45)

g_{j1}^* = \frac{1}{Z_{11}} \left( \sqrt{n} \sum_{i=1}^{I} \frac{p_{ij}}{p_{\cdot j}} f_{i1}^* - g_{j2}^* Z_{21} \right)    (5.46)

Similarly, the position of the row and column profiles along the second principal axis can be defined as

f_{i2}^* = \frac{1}{Z_{22}} \left( \sqrt{n} \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\cdot}} g_{j2}^* - f_{i1}^* Z_{21} \right)    (5.47)

g_{j2}^* = \frac{1}{Z_{22}} \left( \sqrt{n} \sum_{i=1}^{I} \frac{p_{ij}}{p_{\cdot j}} f_{i2}^* - g_{j1}^* Z_{12} \right)    (5.48)

Therefore, if the off-diagonal elements are equal to zero, then (5.45) and (5.46) become

f_{i1}^* = \frac{\sqrt{n}}{Z_{11}} \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\cdot}} g_{j1}^*    (5.49)

g_{j1}^* = \frac{\sqrt{n}}{Z_{11}} \sum_{i=1}^{I} \frac{p_{ij}}{p_{\cdot j}} f_{i1}^*    (5.50)


while (5.47) and (5.48) become

f_{i2}^* = \frac{\sqrt{n}}{Z_{22}} \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\cdot}} g_{j2}^*    (5.51)

g_{j2}^* = \frac{\sqrt{n}}{Z_{22}} \sum_{i=1}^{I} \frac{p_{ij}}{p_{\cdot j}} f_{i2}^*    (5.52)

Equations (5.49)-(5.50) are analogous to the transition formulae of equations (2.37)-(2.38) when m = 1. They imply, along with (5.51)-(5.52), that a relatively large cell value will result in its row and column categories sharing a similar position in the correspondence plot only when the off-diagonal associations are approximately (or exactly) equal to zero.

However, in general, the off-diagonal associations will not be zero, and so, for a two-dimensional correspondence plot, the transition formulae will be those of (5.45)-(5.48). These imply that the relative position of the i'th row profile and the j'th column profile is determined by the value of the (i, j)'th cell entry and the value of the off-diagonal associations. So, unlike simple correspondence analysis, it is possible using the approach of Beh (1997) that, even with a very large (i, j)'th cell entry, the i'th row and j'th column profiles will have very different positions in the plot. This will occur when one or both of the off-diagonal associations are not zero.

    The question now becomes :

    Under what circumstances will an off-diagonal association be zero, and how can this be identified by just observing the position of the profiles in the correspondence plot?

Consider equations (5.45) and (5.47). For (5.45), the i'th row profile is relatively close to the j'th column profile (for a large (i, j)'th cell entry) along the first principal axis only when (i) the j'th column profile along the second axis is close to zero, or (ii) Z_{21} is (approximately) zero. In fact, (i) implies (ii). To show this, by definition, Z_{21} is

Z_{21} = \sqrt{n} \sum_{i=1}^{I} \sum_{j=1}^{J} a_2(i) b_1(j) p_{ij}

while the co-ordinate of the j'th column profile along the second axis is

g_{j2}^* = \sum_{i=1}^{I} \frac{p_{ij}}{p_{\cdot j}} a_2(i)

Therefore, Z_{21} can be expressed by

Z_{21} = \sqrt{n} \sum_{j=1}^{J} p_{\cdot j} g_{j2}^* b_1(j)

Hence, if the column profile co-ordinates along the second principal axis are close to zero, then Z_{21} is approximately zero.

For (5.47), the i'th row profile is relatively close to the j'th column profile along the first principal axis (for a large (i, j)'th cell entry) only when (i) the i'th row profile along the first axis is close to zero, or (ii) Z_{21} is (approximately) zero.

It can be shown that

Z_{21} = \sqrt{n} \sum_{i=1}^{I} p_{i\cdot} f_{i1}^* a_2(i)

If the column co-ordinates are zero along the second principal axis, then Z_{21} is zero. However, this does not imply that the row profile co-ordinates along the first principal axis are zero, as the co-ordinates along this axis may cancel each other out when computing Z_{21}.


In a similar fashion, by considering (5.46) and (5.47),

Z_{12} = \sqrt{n} \sum_{i=1}^{I} p_{i\cdot} f_{i2}^* a_1(i) = \sqrt{n} \sum_{j=1}^{J} p_{\cdot j} g_{j1}^* b_2(j)

If the row profile co-ordinates are zero along the second principal axis, then Z_{12} is zero. However, this does not imply that the column profile co-ordinates along the first principal axis are zero, as the co-ordinates along this axis may cancel each other out when computing Z_{12}.

Therefore, determining the non-significant off-diagonal associations can be made by considering the following rules:

• If the position of the row profiles is dominated by the first principal axis, then Z_{12} ≈ 0

• If the position of the row profiles is dominated by the second principal axis, then Z_{21} ≈ 0

• If the position of the column profiles is dominated by the first principal axis, then Z_{21} ≈ 0

• If the position of the column profiles is dominated by the second principal axis, then Z_{12} ≈ 0

However, if the co-ordinates do not lie along a particular axis, it is still possible that Z_{12} and/or Z_{21} will be zero. Unfortunately, we cannot detect when this occurs.

Thus, if either the row or column profile positions are situated close to the origin of the correspondence plot, then there is no association between the rows and columns, as Z_{12} ≈ 0 and Z_{21} ≈ 0. In this case, it can be shown that Z_{11} ≈ 0 and Z_{22} ≈ 0.


    5.2.7 Centring of the Profile Co-ordinates

A characteristic of the ordinal correspondence analysis technique is that, when rescaling the scores a_u(i) and b_v(j) so that they are in terms of f_{iv}^* and g_{ju}^* respectively, the orthogonality conditions lead to the results

\sum_{i=1}^{I} p_{i\cdot} f_{iv}^* = 0

\sum_{j=1}^{J} p_{\cdot j} g_{ju}^* = 0

Therefore, the row and column profile co-ordinates are centred about the origin of the display.

    5.2.8 Distance from the Origin

    As the row and column profile co-ordinates are centred about the origin, we can measure the distance from the position of an individual profile to the origin.

The squared distance of the i'th row profile from the centroid is

d_I^2(i, 0) = \sum_{j=1}^{J} \frac{1}{p_{\cdot j}} \left( \frac{p_{ij}}{p_{i\cdot}} - p_{\cdot j} \right)^2 = \sum_{j=1}^{J} p_{\cdot j} \left( \frac{p_{ij}}{p_{i\cdot} p_{\cdot j}} - 1 \right)^2 = \sum_{j=1}^{J} p_{\cdot j} \left( \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} a_u(i) \frac{Z_{uv}}{\sqrt{n}} b_v(j) \right)^2

which, by the orthogonality of the column polynomials, simplifies to

d_I^2(i, 0) = \sum_{v=1}^{J-1} \left( f_{iv}^* \right)^2    (5.53)


Therefore, the squared distance of the i'th row profile from the origin is just the sum of squares of the row's co-ordinates in an optimal correspondence plot.

Note that

\frac{X^2}{n} = \sum_{i=1}^{I} p_{i\cdot} d_I^2(i, 0)    (5.54)

so that points close to the origin contribute to the hypothesis of independence between the rows and columns; that is, the frequencies in row i of the contingency table fit the independence hypothesis well. Profiles that are far from the origin indicate a clear deviation from what might be expected under the hypothesis of complete independence. This conclusion can also be reached using the Pearson ratio equations of (5.33) and (5.34).
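The distance results (5.53), (5.54) and the within-variable distance (5.57) are easy to confirm numerically. A minimal NumPy sketch (hypothetical table and scores; ortho_polys is an illustrative helper):

```python
import numpy as np

def ortho_polys(w, s):
    # weighted Gram-Schmidt: sum_i w[i] a_u(i) a_t(i) = delta_ut, a_0(i) = 1
    k = len(w)
    A = np.vander(np.asarray(s, float), k, increasing=True)
    for u in range(k):
        for t in range(u):
            A[:, u] -= (w * A[:, u] @ A[:, t]) * A[:, t]
        A[:, u] /= np.sqrt(w @ A[:, u] ** 2)
    return A

N = np.array([[30., 20., 10.], [15., 25., 20.], [5., 15., 35.]])  # hypothetical
n, P = N.sum(), N / N.sum()
pI, pJ = P.sum(axis=1), P.sum(axis=0)
A, B = ortho_polys(pI, [1, 2, 3]), ortho_polys(pJ, [1, 2, 3])
Z = np.sqrt(n) * A.T @ P @ B
F = A[:, 1:] @ Z[1:, 1:] / np.sqrt(n)            # row profile co-ordinates
X2 = n * (((P - np.outer(pI, pJ)) ** 2) / np.outer(pI, pJ)).sum()

d2 = (F ** 2).sum(axis=1)                        # (5.53): squared distances
chi2_dist = (((P / pI[:, None] - pJ) ** 2) / pJ).sum(axis=1)  # chi-squared metric
print(np.allclose(d2, chi2_dist))                # same distances
print((pI * d2).sum() - X2 / n)                  # (5.54): weighted sum = X^2/n

# (5.57): distance between row profiles 1 and 2 (Python indices 0 and 1)
d12 = ((F[0] - F[1]) ** 2).sum()
direct = (((P[0] / pI[0] - P[1] / pI[1]) ** 2) / pJ).sum()
print(d12 - direct)
```

The co-ordinate-based distances coincide with the chi-squared distances between profiles, so the property of distributional equivalence carries over.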

Similarly, the relationship between the distance of the column profile co-ordinates from the origin and the total inertia is

\frac{X^2}{n} = \sum_{j=1}^{J} p_{\cdot j} d_J^2(j, 0)    (5.55)

where

d_J^2(j, 0) = \sum_{u=1}^{I-1} \left( g_{ju}^* \right)^2    (5.56)

Equation (5.56) is the squared distance of the j'th column profile co-ordinate from the origin, while (5.55) indicates that column profiles close to the origin contribute to the independence hypothesis, while those far away from the origin do not make an important contribution. Lebart et al (1984) discussed confidence areas, or confidence regions, which give a test to determine the row and column profiles that contribute to the hypothesis of complete independence. Best & Rayner (1997) proposed similar regions for their analysis of doubly ordered contingency tables, while Gabriel & Odoroff (1990) referred to the circular regions of their analysis as uncertainty circles.


    The squared distance equations of (5.54) and (5.55) can also be used to determine the row and column categories that contribute to the axes of the correspondence plot.

    5.2.9 Within Variable Distances

The distance d_I(i, i'), between two row profiles, i and i', under simple correspondence analysis is given by (2.46). Using this equation to calculate the distance between these profiles under ordinal correspondence analysis gives

d_I^2(i, i') = \sum_{j=1}^{J} \frac{1}{p_{\cdot j}} \left( \frac{p_{ij}}{p_{i\cdot}} - \frac{p_{i'j}}{p_{i'\cdot}} \right)^2 = \sum_{j=1}^{J} p_{\cdot j} \left( \alpha_{ij} - \alpha_{i'j} \right)^2 = \sum_{j=1}^{J} p_{\cdot j} \left( \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} \left[ a_u(i) - a_u(i') \right] \frac{Z_{uv}}{\sqrt{n}} b_v(j) \right)^2 = \sum_{v=1}^{J-1} \left( \sum_{u=1}^{I-1} \left[ a_u(i) - a_u(i') \right] \frac{Z_{uv}}{\sqrt{n}} \right)^2

When substituting (5.11) into the above expression, the distance can be defined by

d_I^2(i, i') = \sum_{v=1}^{J-1} \left( f_{iv}^* - f_{i'v}^* \right)^2    (5.57)

Therefore, the property of distributional equivalence defined in Chapter 2 also applies to the row and column profile co-ordinates from an ordinal correspondence analysis.


It follows from (5.57) that the squared distance between row profiles i and i' in an M-dimensional correspondence plot is

d_I^2(i, i') \approx \sum_{v=1}^{M} \left( f_{iv}^* - f_{i'v}^* \right)^2

Using the row profile co-ordinates defined by (5.11) and the orthogonality property of (4.4), the within-variable distance formula of (5.57) can be alternatively defined as

d_I^2(i, i') \approx \sum_{m=1}^{M} \left[ \sum_{j=1}^{J} b_m(j) \left( \frac{p_{ij}}{p_{i\cdot}} - \frac{p_{i'j}}{p_{i'\cdot}} \right) \right]^2    (5.58)

When M = J-1, the distance between the row profiles i and i' in ordinal correspondence analysis is identical to the distance between them for an optimal correspondence plot using the classical approach of Chapter 2. However, if M = 2, say, then these distances are determined by the magnitude of the first two orthogonal polynomials and need not be the same as the distance described in a two-dimensional classical correspondence plot.

    5.2.10 Additional Information

While simple classical correspondence analysis graphically represents the rows and columns of a contingency table simultaneously, it is unclear what the principal axes really refer to, except that they are arranged in descending order of importance, so that the importance of an axis decreases the higher its dimension.

Two questions that classical correspondence analysis does not answer include:

As two profiles positioned at a distance from one another indicate that they are different, WHAT characteristic makes them different?

and

If profiles lie close to a particular axis, what does this tell the researcher about the structure of the profiles?

    It will be shown that ordinal correspondence analysis DOES answer these two questions, thereby making it a more informative correspondence analysis technique than that defined in Chapter 2.

    i) Non-Zero Off-Diagonal Associations

When the off-diagonal associations are not equal to zero, Beh (1997) listed three properties that enable the researcher to identify important structures in the data not otherwise determinable using the classical approach. Here the first two are stated and commented on in more detail.

    The third property has been defined in Chapter 4.

    Property 1

    The principal inertia associated with the m'th principal axis of the approach of Beh (1997) is the value of the m'th component divided by n.

To show this, recall the partition of the chi-squared statistic (4.38), which has a u'th level row component value equal to

\sum_{v=1}^{J-1} Z_{uv}^2    (5.59)

while the v'th column component value is equal to

\sum_{u=1}^{I-1} Z_{uv}^2    (5.60)


However, as equation (5.11) shows, the row profile co-ordinates are plotted on at most J-1 dimensions. Therefore, the row principal inertia associated with the m'th principal axis is \sum_{u=1}^{I-1} Z_{um}^2 / n, so that the sum of the row principal inertias is equal to the total inertia. Similarly, the column principal inertia associated with the m'th principal axis is \sum_{v=1}^{J-1} Z_{mv}^2 / n, so that the sum of these components is equal to the total inertia of the contingency table.

    As a result, the first principal axis has a principal inertia, called the location inertia, equivalent to the value of the location (linear) component divided by n. The second principal axis has a principal inertia, called the dispersion inertia, equivalent to the dispersion (quadratic) component divided by n. The principal inertia for a higher dimension is a higher order component.

    Thus unlike classical correspondence analysis, the first and second principal axes of an ordinal correspondence analysis are not necessarily the most important. For the ordinal analysis, the extent to which an axis is significant in adequately representing the profile co-ordinates is determined by how significant its associated component value is. For example, if the quadratic and quartic row components are the most significant of all the row components, then the third and fourth principal axes will best describe, graphically, the variation of the row categories. Hence, the procedure of testing the adequacy of the correspondence plot is very simple when compared with the two methods for the classical approach as described in section 2.7.
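Property 1 can be illustrated numerically: the inertia of the row (or column) cloud of points along each axis equals the corresponding component value divided by n. A NumPy sketch (hypothetical table and scores; ortho_polys is an illustrative helper):

```python
import numpy as np

def ortho_polys(w, s):
    # weighted Gram-Schmidt: sum_i w[i] a_u(i) a_t(i) = delta_ut, a_0(i) = 1
    k = len(w)
    A = np.vander(np.asarray(s, float), k, increasing=True)
    for u in range(k):
        for t in range(u):
            A[:, u] -= (w * A[:, u] @ A[:, t]) * A[:, t]
        A[:, u] /= np.sqrt(w @ A[:, u] ** 2)
    return A

N = np.array([[30., 20., 10.], [15., 25., 20.], [5., 15., 35.]])  # hypothetical
n, P = N.sum(), N / N.sum()
pI, pJ = P.sum(axis=1), P.sum(axis=0)
A, B = ortho_polys(pI, [1, 2, 3]), ortho_polys(pJ, [1, 2, 3])
Z = np.sqrt(n) * A.T @ P @ B
F = A[:, 1:] @ Z[1:, 1:] / np.sqrt(n)
G = B[:, 1:] @ Z[1:, 1:].T / np.sqrt(n)

# inertia of the row cloud along each axis m: sum_i p_i. (f*_im)^2
row_cloud = (pI[:, None] * F ** 2).sum(axis=0)
row_components = (Z[1:, 1:] ** 2).sum(axis=0) / n   # sum_u Z_um^2 / n per axis
print(np.allclose(row_cloud, row_components))

# inertia of the column cloud along each axis m: sum_j p.j (g*_jm)^2
col_cloud = (pJ[:, None] * G ** 2).sum(axis=0)
col_components = (Z[1:, 1:] ** 2).sum(axis=1) / n   # sum_v Z_mv^2 / n per axis
print(np.allclose(col_cloud, col_components))
```

So an axis is only as important as its associated component value: the axis inertias need not decrease with increasing dimension, unlike the singular-value ordering of classical correspondence analysis.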

In their analyses, Best (1994b, 1995) and Best & Rayner (1994, 1997) graphically represented the difference in the rows and columns of their data by plotting the location component for each category as the co-ordinate for the first axis, and the dispersion component as the co-ordinate for the second axis.

    Property 2

    The ordinal correspondence plot can be seen as a graphical method of directional tests of components.

Such a plot will reflect component values. If it is found that a set of categories has a significant m'th component, then the row (or column) profiles will be spread along the m'th principal axis. If the m'th component is not significant, then the profiles will have (close to) zero co-ordinates along the m'th principal axis.

To see this relationship for the row profile co-ordinates, recall equation (5.15). Thus, the m'th row component satisfies

\sum_{u=1}^{I-1} Z_{um}^2 / n = \sum_{i=1}^{I} p_{i\cdot} \left( f_{im}^* \right)^2    (5.61)

    Thus, if the row profile co-ordinates along the m'th principal axis are close to zero, then the m'th row component is close to zero. If the row profile co-ordinates are spread out along the m'th principal axis, then the m'th row component value will not be zero.

    Similarly, if the column profile co-ordinates along the m'th principal axis are close to the origin, then the m'th column component is also close to zero.

Profiles positioned closely to one another on an ordinal correspondence plot are similar in terms of those components associated with the axes for which their co-ordinates are similar. For the row profiles this can be seen from (5.61). Therefore, profiles positioned close to one another are similar profiles (in terms of those components on which they are plotted). Also, profiles not lying close to one another indicate that those profiles differ in at least one component and are therefore not similar profiles. Thus, it is possible to graphically identify the sources of differences within row and column categories, as well as identifying how row or column categories may be similar.

    ii) (Approximately) Zero Off-Diagonal Associations

    When all the off-diagonal Z associations are (approximately) zero, the ordinal correspondence plot has a similar interpretation to the case of when they are not zero.

When the Z values along the off-diagonal are close to zero,

\frac{X^2}{n} \approx \sum_{m=1}^{M^*} \frac{Z_{mm}^2}{n}    (5.62)

    In this case, an optimal correspondence plot will consist of M* principal axes, so that the row and column cloud of points has been reduced to M* dimensions. Equation (5.62) then shows that the principal inertia associated with the m'th principal axis is just the (m, m)'th bivariate moment divided by n. For example, the first principal axis will have an inertia value equivalent to the linear-by-linear association divided by n, while the second principal axis will have an inertia value equivalent to the dispersion-by-dispersion association divided by n.

The RC bivariate moment model when the off-diagonal elements are (approximately) zero is then

p_{ij} \approx p_{i\cdot} p_{\cdot j} \left( 1 + \sum_{m=1}^{M^*} a_m(i) \frac{Z_{mm}}{\sqrt{n}} b_m(j) \right)    (5.63)


which is similar to the model of Baccini, Caussinus & de Falguerolles (1994). These authors questioned the use of such a model on the basis that Z_{11} > Z_{22} > \cdots > Z_{M^*M^*} may not always occur. They also comment that correspondence analysis using such a model is not appropriate, as the lower principal axes may be related to the higher principal axes. These issues are indeed true; however, as has already been shown, the relationship between the axes demonstrates an important relationship between the row and column categories. The constraint that the Z values should be arranged in descending order should not be imposed, as this infers that the location components will contribute more to the variation in the contingency table than any of the higher order components. This will not always occur.

Baccini et al (1994) wish to conduct correspondence analysis with the same constraints imposed upon the orthogonal polynomials and Z values as the singular vectors and singular values of simple correspondence analysis. This, however, restricts much of the informative analysis that can be reached using ordinal correspondence analysis.

In general, the situation where all off-diagonal associations are (close to) zero will not occur.

Classical correspondence analysis has associated with each principal axis one principal inertia, which describes the relative importance of the axis but says very little about the structure of the categories. However, the ordinal correspondence analysis described in this section has associated with each principal axis three quantities: a row component value, a column component value and a bivariate moment value. These three quantities allow for a complete graphical and numerical analysis (each reflecting the other), thus providing a more detailed analysis of the relationships within and between the rows and columns of a two-way contingency table.


    5.3 Singly Ordered Two-way Contingency Tables - VERSION 1

    5.3.1 Introduction

In some cases, only one of the variables contains ordered categories. The following analysis will enable the researcher to identify important characteristics in the behaviour of the non-ordered variable across the categories of the ordered variable. The presentation of the technique gives a more detailed account of analysing singly ordered contingency tables than that presented in Beh (1997).

Suppose we have an I×J contingency table with ordered column and non-ordered row categories.

If we are only interested in seeing what effect the ordered columns have upon the non-ordered rows, then consider (4.31). It can be re-arranged so that

D_I^{-1} P D_J^{-1} = D_I^{-1/2} \left( \frac{Z}{\sqrt{n}} \right) B_*^T

Therefore, the Pearson ratios can be partitioned so that

\alpha_{ij} = \sum_{v=0}^{J-1} b_v(j) \frac{Z_{iv}}{\sqrt{n p_{i\cdot}}}    (5.64)

where {b_v(j)}, for j = 1, 2, ..., J, is the v'th order orthogonal polynomial specifying the ordered structure of the columns, subject to the constraint (4.2). The interpretation and derivation of Z_{iv} is defined in Section (4.3).

Therefore, the departure from independence can be found by

\frac{p_{ij}}{p_{i\cdot} p_{\cdot j}} = \sum_{v=0}^{J-1} b_v(j) \frac{Z_{iv}}{\sqrt{n p_{i\cdot}}}    (5.65)


Removing the trivial solution from the RHS of (5.65) yields

\frac{p_{ij}}{p_{i\cdot} p_{\cdot j}} - 1 = \sum_{v=1}^{J-1} b_v(j) \frac{Z_{iv}}{\sqrt{n p_{i\cdot}}}    (5.66)
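The singly ordered decomposition can also be checked numerically. In the sketch below (NumPy; hypothetical 3×3 table, natural column scores; ortho_polys is an illustrative helper), Z_{iv} = \sqrt{n/p_{i\cdot}} \sum_j b_v(j) p_{ij}, the trivial v = 0 column recovers \sqrt{n p_{i\cdot}}, and the non-trivial Z_{iv}^2 sum to the chi-squared statistic:

```python
import numpy as np

def ortho_polys(w, s):
    # weighted Gram-Schmidt: sum_j w[j] b_v(j) b_t(j) = delta_vt, b_0(j) = 1
    k = len(w)
    A = np.vander(np.asarray(s, float), k, increasing=True)
    for u in range(k):
        for t in range(u):
            A[:, u] -= (w * A[:, u] @ A[:, t]) * A[:, t]
        A[:, u] /= np.sqrt(w @ A[:, u] ** 2)
    return A

# hypothetical table: non-ordered rows, ordered columns scored 1, 2, 3
N = np.array([[30., 20., 10.], [15., 25., 20.], [5., 15., 35.]])
n, P = N.sum(), N / N.sum()
pI, pJ = P.sum(axis=1), P.sum(axis=0)
B = ortho_polys(pJ, [1, 2, 3])                       # column polynomials only

Zi = np.sqrt(n) * (P / np.sqrt(pI)[:, None]) @ B     # Z_iv, v = 0, 1, 2
X2 = n * (((P - np.outer(pI, pJ)) ** 2) / np.outer(pI, pJ)).sum()

print(np.allclose(Zi[:, 0], np.sqrt(n * pI)))        # trivial: Z_i0 = sqrt(n p_i.)
print((Zi[:, 1:] ** 2).sum() - X2)                   # partition of X^2
```

Only the column polynomials are needed; the row index i itself plays the role of the first subscript of Z_{iv}.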

    5.3.2 Profile Co-ordinates

When the column categories are ordered, we can plot the co-ordinates of the column profiles by considering the reparameterisation of the orthogonal polynomials

g_{jm}^* = \sum_{v=1}^{J-1} b_v(j) \frac{Z_{mv}}{\sqrt{n}}    (5.67)

where the column co-ordinates can be plotted onto a correspondence plot consisting of at most I dimensions.

The advantage of using (5.67) as the set of column co-ordinates is that

\sum_{j=1}^{J} p_{\cdot j} g_{jm}^* g_{jm'}^* = \sum_{j=1}^{J} p_{\cdot j} \left( \sum_{v=1}^{J-1} b_v(j) \frac{Z_{mv}}{\sqrt{n}} \right) \left( \sum_{v=1}^{J-1} b_v(j) \frac{Z_{m'v}}{\sqrt{n}} \right)

which simplifies to

\sum_{j=1}^{J} p_{\cdot j} g_{jm}^* g_{jm'}^* = \frac{1}{n} \sum_{v=1}^{J-1} Z_{mv} Z_{m'v}    (5.68)

If m = m', then (5.68) becomes

\sum_{j=1}^{J} p_{\cdot j} \left( g_{jm}^* \right)^2 = \frac{1}{n} \sum_{v=1}^{J-1} Z_{mv}^2    (5.69)

which is just the total contribution, or principal inertia, that the m'th principal axis makes to the total variation of the contingency table. Therefore, the relationship between the column profile co-ordinates and the total inertia is still governed by (5.16).

    5.3.3 Modelling Singly Ordered Tables

Consider equation (5.66). By rearranging this equation, the (i, j)'th cell probability for a contingency table with ordered columns can be calculated by the saturated reconstitution formula

p_{ij} = p_{i\cdot} p_{\cdot j} \left( 1 + \sum_{v=1}^{J-1} \frac{Z_{iv}}{\sqrt{n p_{i\cdot}}} b_v(j) \right)    (5.70)

which is the model presented by Rayner & Best (1997).

When an M-dimensional correspondence plot is of interest, where M < J-1, the unsaturated reconstitution formula is

p_{ij} \approx p_{i\cdot} p_{\cdot j} \left( 1 + \sum_{v=1}^{M} \frac{Z_{iv}}{\sqrt{n p_{i\cdot}}} b_v(j) \right)    (5.71)

The alternative, exponential, ordinal model of association is

p_{ij} \approx p_{i\cdot} p_{\cdot j} \exp\left( \sum_{v=1}^{M} \frac{Z_{iv}}{\sqrt{n p_{i\cdot}}} b_v(j) \right)    (5.72)

    Using equation (5.67), the unsaturated correspondence model for a two-way contingency table with ordered columns and non-ordered rows is

p_{ij} = p_{i\cdot} p_{\cdot j} \left( 1 + \frac{g_{ji}^*}{\sqrt{p_{i\cdot}}} \right)    (5.73)


so that the closer the ordered column categories are to the origin, the more likely the model of independence is to be accepted. Conversely, the further away the column profile co-ordinates are from the origin, the more likely the independence model is to be rejected.
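The reconstitution of the singly ordered table, the column co-ordinates (5.67) and the principal inertia identity (5.69) can be confirmed together. A minimal NumPy sketch (hypothetical table and scores; ortho_polys is an illustrative helper):

```python
import numpy as np

def ortho_polys(w, s):
    # weighted Gram-Schmidt: sum_j w[j] b_v(j) b_t(j) = delta_vt, b_0(j) = 1
    k = len(w)
    A = np.vander(np.asarray(s, float), k, increasing=True)
    for u in range(k):
        for t in range(u):
            A[:, u] -= (w * A[:, u] @ A[:, t]) * A[:, t]
        A[:, u] /= np.sqrt(w @ A[:, u] ** 2)
    return A

# hypothetical table: non-ordered rows, ordered columns scored 1, 2, 3
N = np.array([[30., 20., 10.], [15., 25., 20.], [5., 15., 35.]])
n, P = N.sum(), N / N.sum()
pI, pJ = P.sum(axis=1), P.sum(axis=0)
B = ortho_polys(pJ, [1, 2, 3])
Zi = np.sqrt(n) * (P / np.sqrt(pI)[:, None]) @ B     # Z_iv, v = 0, 1, 2

# saturated reconstitution (5.70): recovers p_ij exactly
Phat = np.outer(pI, pJ) * (1 + (Zi[:, 1:] / np.sqrt(n * pI)[:, None]) @ B[:, 1:].T)
print(np.allclose(Phat, P))

# column co-ordinates (5.67): J x I array with g*_jm = sum_v b_v(j) Z_mv / sqrt(n)
Gs = B[:, 1:] @ Zi[:, 1:].T / np.sqrt(n)
print(np.allclose(pJ @ Gs, 0))                       # centred about the origin
# (5.69): axis inertias equal the component values divided by n
print(np.allclose((pJ[:, None] * Gs ** 2).sum(axis=0),
                  (Zi[:, 1:] ** 2).sum(axis=1) / n))
# model (5.73): 1 + g*_ji / sqrt(p_i.) reproduces the Pearson ratios
print(np.allclose(np.outer(pI, pJ) * (1 + Gs.T / np.sqrt(pI)[:, None]), P))
```

Each check returns True for the saturated case; truncating the sum over v in Zi gives the unsaturated approximations (5.71)-(5.72).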

    5.3.4 Distances

    (i) Centring

To show that the column profile co-ordinates are centred about the origin of the correspondence plot, note that

\sum_{j=1}^{J} p_{\cdot j} g_{jm}^* = \sum_{j=1}^{J} p_{\cdot j} \sum_{v=1}^{J-1} b_v(j) \frac{Z_{mv}}{\sqrt{n}} = \sum_{v=1}^{J-1} \frac{Z_{mv}}{\sqrt{n}} \left( \sum_{j=1}^{J} p_{\cdot j} b_v(j) \right) = 0

    (ii) Distance from the Origin

Consider the column profile co-ordinates. The squared distance of the j'th column profile from the origin is

d_J^2(j, 0) = \sum_{i=1}^{I} \frac{1}{p_{i\cdot}} \left( \frac{p_{ij}}{p_{\cdot j}} - p_{i\cdot} \right)^2 = \sum_{i=1}^{I} p_{i\cdot} \left( \frac{p_{ij}}{p_{i\cdot} p_{\cdot j}} - 1 \right)^2

or

d_J^2(j, 0) = \sum_{i=1}^{I} p_{i\cdot} \left( \frac{g_{ji}^*}{\sqrt{p_{i\cdot}}} \right)^2    (5.74)

Thus, the squared distance of the j'th column profile co-ordinate from the origin is weighted by the row marginal probabilities.


(iii) Within Variable Distances

The squared distance, $d_I^2(j, j')$, between column profiles j and j' is

$$d_I^2(j,j') = \sum_{i=1}^{I} p_{i.}\left(\alpha_{ij} - \alpha_{ij'}\right)^2 = \sum_{i=1}^{I} p_{i.}\left(g_j(i) - g_{j'}(i)\right)^2 \qquad (5.75)$$

    Again, the column distances involve the weighted profile co-ordinates, where the weight is the row marginal probabilities.
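These distance identities are easy to check numerically. In the sketch below (the 3×3 table is invented for illustration), the weighted sums of squared Pearson contingencies are compared with the chi-squared distances computed directly from the column profiles:

```python
import numpy as np

# Invented 3x3 table; the counts are illustrative only.
N = np.array([[12, 6, 4],
              [ 5, 9, 8],
              [ 3, 7, 11]], dtype=float)
P = N / N.sum()
pi, pj = P.sum(axis=1), P.sum(axis=0)

# Pearson contingencies g_j(i) = p_ij / (p_i. p_.j) - 1
G = P / np.outer(pi, pj) - 1

# (5.74): squared distance of each column profile from the origin,
# weighted by the row marginal probabilities p_i.
d2_origin = np.einsum('i,ij->j', pi, G ** 2)

# The same distances computed directly from the column profiles p_ij / p_.j
chi2_dist = ((P / pj - pi[:, None]) ** 2 / pi[:, None]).sum(axis=0)
assert np.allclose(d2_origin, chi2_dist)

# (5.75): squared distance between column profiles j = 0 and j' = 2
j, jp = 0, 2
d2 = np.sum(pi * (G[:, j] - G[:, jp]) ** 2)
d2_direct = ((P[:, j] / pj[j] - P[:, jp] / pj[jp]) ** 2 / pi).sum()
assert np.isclose(d2, d2_direct)
```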

    5.3.5 Additional Information

It was shown in the previous section that the analysis of doubly ordered contingency tables using the correspondence analysis technique of Beh (1997) offers a far more informative analysis of the structure of the rows and columns than does simple correspondence analysis. The analysis of singly ordered contingency tables also offers more information than the classical technique.

    Property 1

Rayner & Best (1997) showed that the value of the m'th column component can be written as

$$Z_{1m}^2 + Z_{2m}^2 + \cdots + Z_{Im}^2 = \sum_{i=1}^{I} Z_{im}^2$$

Therefore the principal inertia associated with the columns at the m'th principal axis is the value of the m'th column component.


    Property 2

    The comments made concerning Property 2 in the case of a doubly ordered contingency table also apply here.

    Equation (5.69) shows that if the column profile co-ordinates are relatively small along the m'th principal axis, then the m'th column profile will contribute relatively little to the overall variation in the table.

    5.4 Singly Ordered Two-way Contingency Tables - VERSION 2

    5.4.1 The Method

A second approach to dealing with singly ordered variables is more similar to simple correspondence analysis, and to the doubly ordered analysis, than is VERSION 1.

Again, suppose we have a two-way contingency table with non-ordered rows and ordered columns. Consider equation (4.36). It can be rearranged so that

$$\mathbf{D}_I^{-1}\, \mathbf{P}\, \mathbf{D}_J^{-1} = \mathbf{A}\left(\frac{\mathbf{Z}}{\sqrt{n}}\right)\mathbf{B}^T$$

    Therefore the Pearson ratio for VERSION 2 of the singly ordered analysis is

$$\alpha_{ij} = \sum_{u=0}^{M^*} \sum_{v=0}^{J-1} a_{iu}\, \frac{Z_{(u)v}}{\sqrt{n}}\, b_v(j) \qquad (5.76)$$

where Z(u)v is defined by (4.35). The value of aiu is akin to the singular vector of simple correspondence analysis calculated using a singular value decomposition, and so is NOT an orthogonal polynomial. It is associated with the row profiles, while the set of orthogonal polynomials, {bv(j)}, is associated with the ordered columns as defined in Chapter 4.


Eliminating the trivial solution to (5.76) yields the Pearson contingencies

$$\alpha_{ij} - 1 = \sum_{u=1}^{M^*} \sum_{v=1}^{J-1} a_{iu}\, \frac{Z_{(u)v}}{\sqrt{n}}\, b_v(j) \qquad (5.77)$$

    5.4.2 Profile Co-ordinates

One could always plot the standard co-ordinates, {aiu} and {bv(j)}, to display the row and column categories. However, such a plot does not take into account the strength of the association between the rows and columns. Therefore an alternative plotting system can be employed.

Such a plotting system is based on a combination of the classical approach, for the non-ordered set of categories, and the ordinal approach for the ordered categories. It was shown when the doubly ordered table was investigated that the row and column profile co-ordinates could be simultaneously plotted onto a correspondence plot using (5.11) and (5.12) respectively. Similarly, using the decomposition of (5.76), the row and column profile co-ordinates can be defined as

$$f_{iv} = \sum_{u=1}^{M^*} a_{iu}\, \frac{Z_{(u)v}}{\sqrt{n}} \qquad (5.78)$$

$$g_{ju}^{*} = \sum_{v=1}^{J-1} b_v(j)\, \frac{Z_{(u)v}}{\sqrt{n}} \qquad (5.79)$$

respectively.

    Note that the row profile co-ordinates do not contain a superscript * as the row categories are not ordered.

    5.4.3 Modelling VERSION 2

    Beh (1999b) showed that modelling singly ordered two-way contingency tables when partitioning the Pearson ratios using (5.76) is very similar to the doubly ordered case presented in Section 5.2.


When N has ordered columns and non-ordered row categories, the (i, j)'th cell probability can be calculated by the saturated reconstitution formula

$$p_{ij} = p_{i.}\, p_{.j}\left(1 + \sum_{u=1}^{M^*} \sum_{v=1}^{J-1} a_{iu}\, \frac{Z_{(u)v}}{\sqrt{n}}\, b_v(j)\right) \qquad (5.80)$$

For an M-dimensional correspondence plot, the unsaturated model is

$$p_{ij} \approx p_{i.}\, p_{.j}\left(1 + \sum_{u=1}^{M} \sum_{v=1}^{M} a_{iu}\, \frac{Z_{(u)v}}{\sqrt{n}}\, b_v(j)\right) \qquad (5.81)$$

It can be seen that models (5.80) and (5.81) are mathematically similar to (5.19) and (5.21) for the doubly ordered contingency table, respectively.

An alternative, exponential, model which will approximate the cell probabilities even if it is saturated is

$$p_{ij} \approx p_{i.}\, p_{.j}\exp\left(\sum_{u=1}^{M^*} \sum_{v=1}^{J-1} a_{iu}\, \frac{Z_{(u)v}}{\sqrt{n}}\, b_v(j)\right) \qquad (5.82)$$

    5.4.4 Transition Formula

Just as in the modelling of singly ordered data, the transition formulae for such a model are similar to the doubly ordered case.

It can be shown that the relationship between the row and column profile co-ordinates is just

$$\mathbf{F}\left(\frac{\mathbf{Z}}{\sqrt{n}}\right)^T = \mathbf{D}_I^{-1}\, \mathbf{P}\, \mathbf{G}^{*} \qquad (5.83)$$

$$\mathbf{G}^{*}\left(\frac{\mathbf{Z}}{\sqrt{n}}\right) = \mathbf{D}_J^{-1}\, \mathbf{P}^T\, \mathbf{F} \qquad (5.84)$$


    while the interpretation of profile co-ordinates and the Z values is the same as in the doubly ordered case.
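As a numerical sanity check of the transition formulae, the sketch below uses an invented 4×5 table, natural column scores, and non-ordered row vectors taken from the SVD of the standardised Pearson residuals; all of these inputs are illustrative assumptions.

```python
import numpy as np

# Invented 4x5 table (illustrative counts only).
N = np.array([[ 9,  4,  2,  1,  1],
              [ 4,  8,  6,  3,  2],
              [ 2,  5,  9,  6,  4],
              [ 1,  2,  5,  8, 10]], dtype=float)
n = N.sum()
P = N / n
pi, pj = P.sum(axis=1), P.sum(axis=0)
I, J = P.shape
M = min(I, J) - 1

# Row vectors a_iu from the SVD of the standardised residuals.
S = (P - np.outer(pi, pj)) / np.sqrt(np.outer(pi, pj))
U, lam, Vt = np.linalg.svd(S)
A = U[:, :M] / np.sqrt(pi)[:, None]

# Column polynomials b_v(j) (ordered columns), weighted Gram-Schmidt.
s = np.arange(1, J + 1, dtype=float)
B = np.zeros((J, J))
for v in range(J):
    b = s ** v
    for u in range(v):
        b -= np.sum(pj * b * B[:, u]) * B[:, u]
    B[:, v] = b / np.sqrt(np.sum(pj * b * b))
B1 = B[:, 1:]

Z = np.sqrt(n) * A.T @ P @ B1          # Z_(u)v, an M x (J-1) matrix

F  = A  @ (Z / np.sqrt(n))             # row profile co-ordinates (5.78)
Gs = B1 @ (Z / np.sqrt(n)).T           # column profile co-ordinates (5.79)

# Transition formulae (5.83) and (5.84)
assert np.allclose(F @ (Z / np.sqrt(n)).T, np.diag(1 / pi) @ P @ Gs)
assert np.allclose(Gs @ (Z / np.sqrt(n)), np.diag(1 / pj) @ P.T @ F)
```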

    5.4.5 Distances

The centring of profile co-ordinates, the distance of a profile co-ordinate from the origin, and the distance between two profile co-ordinates from the same variable, for a singly ordered table where the Pearson contingencies are decomposed by (5.77), are mathematically identical to the distances presented in Subsections 5.2.7, 5.2.8 and 5.2.9. The only difference is that $f_{iv}$ is substituted for $f_{iv}^{*}$ in equations (5.53) and (5.57). Therefore the distance measurements for VERSION 2 of the analysis will be omitted.

    5.4.6 Additional Information

    There are several properties that Beh (1999b) derived which show the relationship between the singular values and the bivariate moments. The properties are described here and expanded upon.

    Property 1

    The row component associated with the m'th principal axis is just the value of the m'th largest eigenvalue.

    To show this, recall that the total inertia may be written in terms of bivariate moments or as eigenvalues, such that

$$\frac{X^2}{n} = \sum_{u=1}^{M^*} \sum_{v=1}^{J-1} \frac{Z_{(u)v}^2}{n} = \sum_{m=1}^{M^*} \lambda_m^2 \qquad (5.85)$$

so that

$$\lambda_m^2 = \sum_{v=1}^{J-1} \frac{Z_{(m)v}^2}{n} \qquad (5.86)$$


where the RHS of (5.86) is just the m'th order row component. Thus the property is proven.

Similarly, as $f_{iv}$ (as opposed to $f_{iv}^{*}$) is the co-ordinate of the i'th row profile on the v'th principal axis, then

$$\sum_{u=1}^{M^*} \frac{Z_{(u)v}^2}{n} = \sum_{i=1}^{I} p_{i.}\left(f_{iv}\right)^2$$

which is the definition of the v'th column component value, using a formula similar to equation (5.15). So this method allows for a decomposition of the singular values into location, dispersion and higher order components. In this way, it can be applied to a non-ordered contingency table. That is, each singular value can be partitioned so that information concerning the mean difference and the spread of profiles can be found. Higher order moments can also be determined from such a partition.
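The partition of the singular values described here can be sketched numerically. The table below is invented for illustration, the columns carry natural scores, and the row vectors come from the SVD of the standardised residuals; the assertions check equations (5.85) and (5.86):

```python
import numpy as np

# Invented 4x5 table with non-ordered rows and ordered columns.
N = np.array([[12,  8,  5,  3,  2],
              [ 6, 10,  9,  6,  4],
              [ 3,  7, 11,  9,  6],
              [ 2,  4,  6, 10, 13]], dtype=float)
n = N.sum()
P = N / n
pi, pj = P.sum(axis=1), P.sum(axis=0)
I, J = P.shape
M = min(I, J) - 1

# a_iu from the SVD of the standardised residuals (simple CA row vectors).
S = (P - np.outer(pi, pj)) / np.sqrt(np.outer(pi, pj))
U, lam, Vt = np.linalg.svd(S)
A = U[:, :M] / np.sqrt(pi)[:, None]

# Orthonormal column polynomials b_v(j) on the natural scores 1..J.
s = np.arange(1, J + 1, dtype=float)
B = np.zeros((J, J))
for v in range(J):
    b = s ** v
    for u in range(v):
        b -= np.sum(pj * b * B[:, u]) * B[:, u]
    B[:, v] = b / np.sqrt(np.sum(pj * b * b))

# Bivariate moments Z_(u)v = sqrt(n) sum_ij p_ij a_iu b_v(j), v = 1..J-1.
Z = np.sqrt(n) * A.T @ P @ B[:, 1:]

# Equation (5.86): each eigenvalue (squared singular value) is the sum of
# its squared bivariate moments divided by n.
for m in range(M):
    assert np.isclose(lam[m] ** 2, np.sum(Z[m] ** 2) / n)

# Equation (5.85): the moments also recover the total inertia X^2 / n.
assert np.isclose((Z ** 2).sum() / n, (lam[:M] ** 2).sum())
```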

    Property 2

The row component values are arranged in descending order.

This property follows directly from Property 1. As the eigenvalues are arranged in descending order, so too are the row components.

    Property 3

For such a singly ordered analysis, the principal inertia of each axis of a simple correspondence plot can be partitioned into bivariate moments.

Again, this follows directly from Property 1, since the principal inertia of the m'th principal axis is the sum of squares of the bivariate moments when u = m. For example, the largest eigenvalue may be calculated


by considering the linear-by-linear, linear-by-quadratic and higher order bivariate moments, such that

$$\lambda_1^2 = \frac{Z_{(1)1}^2}{n} + \frac{Z_{(1)2}^2}{n} + \cdots + \frac{Z_{(1)(J-1)}^2}{n} \qquad (5.87)$$

    Property 4

    It is possible to identify which bivariate moment contributes the most to a particular eigenvalue and hence principal axis.

    This is readily seen from Property 3.

Hence one of the major problems of simple correspondence analysis can be overcome by considering such an analysis. The axes of a classical correspondence plot are constructed so that the first axis is the most important, the second axis is the second most important, and so on. However, many authors have found that a problem with simple correspondence analysis is that these axes do not, by themselves, describe anything about the nature of the relationship between the row and column categories. Property 4 allows us to isolate the important bivariate moments for each principal axis.

    5.5 Examples

    5.5.1 Socio-Economic and Mental Health Data

    Consider the cross-classification of people according to their mental health status and their parental socio-economic status as seen in Table 2.1.

    As the rows and columns of Table 2.1 consist of categories with an ordered structure, a doubly ordered correspondence analysis is conducted on the data. Its correspondence plot is given by Figure 5.1 and consists of the first (location) and second (dispersion) principal axes.

    Table 5.1 gives the row and column component values, with the degrees of freedom and theoretical P-values.


Consider firstly the row profiles. The first (location) principal axis, which has a row inertia value of 0.0245125, accounts for 88.49% of the total inertia, while the second (dispersion) principal axis, with a row inertia value of 0.0017169, accounts for only 6.2%. Therefore the first two components, location and dispersion, account for 94.69% of the total variation among the row categories. That is, the majority of the variation in the mental health status categories is due to the difference in their mean values and their spread across the column categories.

Figure 5.1 : Ordinal Correspondence Plot of Table 2.1 (First & Second Principal Axes)

For the column profiles, the first principal axis, which has a column inertia value of 0.0242897, represents 87.68% of the total inertia, while the second principal axis, which has a column inertia value of 0.0005159, represents 1.86%. Thus, Figure 5.1 graphically represents 89.54% of the total variation among the column categories. However, the third axis, which has a column inertia value of 0.0028964 and represents skewness, accounts for 10.46% of the total inertia; this value is absorbed into the column error inertia. Thus, the location and skewness inertias account for 98.14% of the total variability among the columns. That is, the variation among the parental socio-economic status categories is caused by the difference in their mean values and the skewness of the categories across the mental health status levels.

                  Value       df   P-value
Row Inertia
  Location        0.0245125    3   0
  Dispersion      0.0017169    3   0.4153
  Error           0.0014726    9   0.9823
Column Inertia
  Location        0.0242897    5   0
  Dispersion      0.0005159    5   0.9733
  Error           0.0028964    5   0.4398
Total Inertia     0.0277020   15   0.9999

Table 5.1 : Row and Column Component Values for Table 2.1

For the row and column profiles, only the location component is significant; thus a one-dimensional ordered correspondence plot, consisting of the first (location) principal axis, is adequate to display the rows and columns.

It was shown from Figure 2.3 that socio-economic status categories A and B have similar profiles. From Figure 5.1, we can see that they are similar in terms of the location component and the dispersion component (with a slight difference in terms of dispersion). That is, A and B are similar in terms of their mean value and their spread across the mental health status categories. These features can be seen by observing their profiles in Table 2.3.

Observing Figure 5.1, we can see that as the parental socio-economic status of the patients is reduced, so too does their mental health status decline, which suggests that a linear-by-linear association exists. This is verified since $Z_{11}$ has a value of 6.097374, which is highly significant. If A had been positioned further to the left of B, perhaps the association would have been even stronger.

Tests were conducted and it was found that no other associations are significant. This is expected since all other components for the row and column profiles are not significant.

Another advantage of this method is that we can visualise the importance of a profile on a particular axis. This means that the contribution a particular profile makes to a component can be calculated and visualised.

            Location   Dispersion
Well          48.36      29.67
Mild           0.81      17.72
Moderate       0.03      24.59
Impaired      50.80      28.02

Table 5.2 : Percentage (%) Contribution of Row Profiles to the First Two Row Inertias

Table 5.2 gives the percentage contribution that each row profile makes to the location and dispersion inertias. Similarly, Table 5.3 contains the percentage contribution that each of the column profiles makes to the first two principal axes (therefore, the first two column inertias).


      Location   Dispersion
A       17.98      19.29
B       19.63      13.79
C        2.04       8.66
D        0          31.58
E       14.47       0.03
F       45.88      26.65

Table 5.3 : Percentage (%) Contribution of Column Profiles to the First Two Column Inertias

The row and column contributions to the components are reflected by the distances of the profiles from the configuration's centroid.

As the row categories Well and Impaired lie far from the centroid along the location axis, it is expected that these profiles contribute most to the location inertia. From Table 5.2 it can be seen that Well and Impaired contribute 48.36% and 50.80% to this axis respectively. Therefore, these categories account for 99.16% of the location inertia for the row profiles. That is, nearly all of the variation in the mean values of the mental health status categories is due to the significant difference between the categories Well and Impaired. As Moderate and Mild are situated close to the centroid along the location axis, these profiles account for very little of that component; they contribute the remaining 0.84% of the location inertia. That is, these two categories do not contribute to the difference in the mean values of the mental health status categories.

Along the second principal axis, the row profiles can be seen to contribute (roughly) equally to the dispersion component.

Parental socio-economic status F contributes nearly half of the location inertia for the column profiles, while status categories A, B and E contribute (roughly) equally.


    5.5.2 Drug Data

Consider the contingency table given by Table 5.4, which was originally seen in Calimlin, Wardell, Cox, Lasagna & Sriwatanakul (1982) and analysed by Meyer (1991). Their study was aimed at testing four analgesic drugs (named A, B, C and D) and their effect on 121 hospital patients. The patients were given a five-point scale consisting of the categories Poor, Fair, Good, Very Good and Excellent on which to make their judgement.

          Poor   Fair   Good   Very Good   Excellent
Drug A      5      1     10        8           6
Drug B      5      3      3        8          12
Drug C     10      6     12        3           0
Drug D      7     12      8        1           1

Table 5.4 : Cross-classification of 121 Hospital Patients According to Analgesic Drug and its Effect

    It can be seen that Table 5.4 consists of ordered column categories and non-ordered row categories. Therefore, VERSION 2 of simple ordinal correspondence analysis is applicable.

The total inertia of the contingency table is 0.3892 which, with 12 degrees of freedom, is highly significant. Therefore there is an association between the drug used and its effect on the patients. The correspondence plot from the analysis is given by Figure 5.2.

If a simple correspondence analysis of the type described in Chapter 2 is applied, the squared singular values of the analysis are $\lambda_1^2 = 0.30467$, $\lambda_2^2 = 0.07734$ and $\lambda_3^2 = 0.00701$. When the ordinal approach of VERSION 2 is applied, these are the component values for the row categories and therefore are the principal inertia values for the axes of Figure 5.2.


For the column categories, the non-trivial component values are 0.2103389, 0.08148318, 0.07267963 and 0.02452164, which are arranged in descending order as expected from Property 2 of Subsection 5.4.6. Therefore, Figure 5.2 is constructed using the linear principal axis, with a principal inertia value of 0.2103389, and the dispersion principal axis, with a principal inertia value of 0.08148318. Together these axes account for 75% of the variation in the drugs tested, compared with 98.2% of the variation in the patients' judgement of the drugs. As the table is small, this suggests that there is a higher order component which may help to explain more of the row variation; in fact the third (cubic) component contributes 18.7% of this variation.


Figure 5.2 : Two-way Ordinal Correspondence Plot (VERSION 2) of Table 5.4


Applying VERSION 2 of the singly ordered correspondence analysis, $Z_{(1)1} = -0.45648$, $Z_{(1)2} = -0.26016$, $Z_{(1)3} = 0.16505$ and $Z_{(1)4} = 0.36956$. Therefore, using equation (5.87), we can see that

$$0.30467 = (-0.45648)^2 + (-0.26016)^2 + (0.16505)^2 + (0.36956)^2$$

Therefore the first principal inertia obtained from the simple correspondence plot can be partitioned into linear, dispersion and higher order components. From the above calculation, the dominant source of the first singular value is the linearity of the ordered column categories. That is, the first principal axis from the classical correspondence analysis of Table 5.4 best describes the difference in the mean values of the judgement of the drugs used.

Consider the four drug categories. Figure 5.2 shows the variation of these drugs in terms of the singular values. It visually describes the similar position of drugs C and D; therefore these two drugs have a similar effect on the patients. These two drugs have quite a different effect than do drugs A and B, which are themselves different from one another. By observing the proximity of the patients' ratings to drugs C and D, it appears that these drugs tend to be judged Good to Poor.

The ratings, which have a dominant and significant location component, are spread along the first principal axis. The categories Excellent and Very Good have a similar first co-ordinate, but different co-ordinates along the second principal axis. Therefore, while these two categories are similar in terms of their mean values, they are quite different in terms of their spread across each of the four drugs tested. Similarly, Figure 5.2 shows that Poor and Good are similar in terms of their mean values across the four drugs, but are spread out slightly differently.


Tables 5.5 and 5.6 detail the contribution each drug and rating category makes to the axes in the correspondence plot of Figure 5.2.

Consider the contribution of the drugs to the row location component. From Figure 5.2, drug B is the furthest away from the origin and so is more likely than the other drugs to contribute to the lack of independence between the drugs and their effect on the patients. In fact, drug B contributes more to the row location component (38.29%) than the other drugs, while contributing 67.79% of the variation in the dispersion component.

Drug        Principal Axis 1          Principal Axis 2
Tested      Contribution  % Contr.    Contribution  % Contr.
Drug A        0.027053     12.86        0.000111      0.14
Drug B        0.08053      38.29        0.055237     67.79
Drug C        0.048842     23.22        0.01229      15.08
Drug D        0.053914     25.63        0.013845     16.99
Total         0.210338    100           0.081483    100

Table 5.5 : Contribution of the Drugs Tested on Figure 5.2

            Principal Axis 1          Principal Axis 2
Rating      Contribution  % Contr.    Contribution  % Contr.
Poor          0.013575      4.46        0.001242      1.61
Fair          0.075016     24.62        0.035757     46.23
Good          0.019534      6.41        0.024315     31.44
Very Good     0.056151     18.43        0.004059      5.25
Excellent     0.140391     46.08        0.011968     15.47
Total         0.304667    100           0.077342    100

Table 5.6 : Contribution of the Drug Ratings on Figure 5.2


Drug A is the closest drug to the origin and therefore contributes to the independence hypothesis. This also shows its relative lack of effect on the row location and dispersion components: 12.9% and 0.14% respectively. That is, Drug A does not contribute to whatever variation there is between the drug categories; thus, its profile co-ordinate lies close to the origin of the display. Table 5.5 also shows that drugs C and D, which are positioned close to one another in Figure 5.2, contribute roughly the same to the location and dispersion row components.

Table 5.6 quantifies the dominance of the column category Excellent on the first principal axis. Figure 5.2 displays this category on the far left hand side of the display, while Table 5.6 shows that it contributes 46.1% of the columns' first singular value. The second principal axis is dominated by the category Fair, which contributes 46.2% of the columns' second singular value. Therefore these two categories do not contribute to the independence hypothesis, unlike Poor, which contributes only 4.5% to the first and 1.6% to the second principal axis.

    5.5.3 Hospital Data

Consider again the hospital data of Table 2.9. As the row and column variables are ordinal in nature, a doubly ordered correspondence analysis is applicable. Subsection 2.10.3 showed that there is a significant association between a patient's frequency of visitors and their length of stay in hospital, while Figure 2.5 showed that those who are visited less than once a month and those who are never visited have similar profiles. It also showed that the categories for the patients' length of stay in hospital were different, although the simple correspondence plot of Figure 2.5 gives no indication as to what this difference is.

Applying the natural scores {1, 2, 3} to both the row and column orthogonal polynomials, the table of inertia values is given by Table 5.7.


The two-dimensional ordinal correspondence plot of Table 2.9, constructed using the first (location) and second (dispersion) principal axes, is optimal and is given by Figure 5.3.

Table 5.7 shows that, for the row profiles, there is a significant and dominant location component and a non-significant dispersion component. Therefore, the dominant source of variation between the "Frequency of Visiting" categories is due to the difference in the mean category values. These conclusions can be seen in Figure 5.3, where the row categories are spread out along the first principal axis, and to a far lesser extent along the second principal axis.

                Value     df   P-value
Row Inertia
  Location      0.26253    2   0
  Dispersion    0.00392    2   0.7720
  Error         —          —   —
Column Inertia
  Location      0.22601    2   0
  Dispersion    0.04044    2   0.9799
  Error         —          —   —
Total Inertia   0.26645    4   0

Table 5.7 : Table of Row and Column Inertia Values for Table 2.9

Table 5.7 shows that the only significant variation in the column categories is due to the difference in the column mean values. As the dispersion component is not significant, their spread is not an important feature of the column variation. Thus a one-dimensional correspondence plot, consisting of the first principal axis, will represent the most significant variation in the column categories. Figure 5.3 shows that the column categories are spread out along the first principal axis, showing the significant variation due to the location inertia, while they have approximately zero second co-ordinates, showing the near zero dispersion component value.


    Figure 5.3 : Ordinal Correspondence Plot of Table 2.9

The only significant bivariate moment is the linear-by-linear association, which has a zero P-value. This is as expected since the only significant row and column inertia is the linear. Therefore, the longer a person stays in hospital, the fewer visitors they receive. This linear-by-linear association can be seen in Figure 5.3.

    5.5.4 Dream Data

    Consider the contingency table given by Table 5.8 which was originally seen in Maxwell (1961) and analysed by Ritov & Gilula (1993). It


cross-classifies 223 boys based on the age group to which they belong and the severity of disturbance of their dreams.

Age       Severity of Disturbance
Group     1 (Low)    2     3    4 (High)   Total
5-7          7       4     3       7        21
8-9         10      15    11      13        49
10-11       23       9    11       7        50
12-13       28       9    12      10        59
14-15       32       5     4       3        44
Total      100      42    41      40       223

Table 5.8 : Cross-classification of 223 Boys according to Age and Severity of Dream Disturbance

The Pearson chi-squared statistic for Table 5.8 is 31.67938, which has a P-value of 0.00155. Therefore, there is a strong association between the age of the child and the severity of the disturbance of their dreams. The total inertia of the contingency table is therefore 0.14206.

Applying a doubly ordered correspondence analysis, the correspondence plot of Table 5.8 is given by Figure 5.4. The partition of the total inertia into the row and column location, dispersion and error inertias is given in Table 5.9.
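The kind of partition reported in Table 5.9 can be sketched as follows. The code assumes natural scores for both variables and builds the orthogonal polynomials by weighted Gram–Schmidt; it checks only the internal identity that the squared bivariate moments sum to the Pearson chi-squared statistic of Table 5.8:

```python
import numpy as np

def poly_basis(scores, w):
    """All orthonormal polynomials on `scores` w.r.t. weights `w`."""
    K = len(scores)
    B = np.zeros((K, K))
    for v in range(K):
        b = scores.astype(float) ** v
        for u in range(v):
            b -= np.sum(w * b * B[:, u]) * B[:, u]   # remove projections
        B[:, v] = b / np.sqrt(np.sum(w * b * b))     # normalise
    return B

# Table 5.8: 223 boys by age group (rows) and dream-disturbance severity.
N = np.array([[ 7,  4,  3,  7],
              [10, 15, 11, 13],
              [23,  9, 11,  7],
              [28,  9, 12, 10],
              [32,  5,  4,  3]], dtype=float)
n = N.sum()
P = N / n
pi, pj = P.sum(axis=1), P.sum(axis=0)

A = poly_basis(np.arange(1., 6.), pi)   # row polynomials, natural scores
B = poly_basis(np.arange(1., 5.), pj)   # column polynomials

# Bivariate moments Z_uv for u = 1..I-1, v = 1..J-1.
Z = np.sqrt(n) * A[:, 1:].T @ P @ B[:, 1:]

# The squared moments partition the Pearson chi-squared statistic.
X2 = n * (((P - np.outer(pi, pj)) ** 2) / np.outer(pi, pj)).sum()
assert np.isclose((Z ** 2).sum(), X2)

# Row location inertia is the u = 1 row of the partition (cf. Table 5.9).
row_location = (Z[0] ** 2).sum() / n
```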

Consider the inertia values for the age groups. The row location component, with a P-value of 0.0002, is highly significant and is the only significant source of variation for these categories. Therefore, the variation in the age group categories is due to the difference in their mean values. The spread of the age group categories across the "Dream Disturbance Severity" variable is not a significant source of variation. These conclusions are evident from the position of the row profile co-ordinates in the ordinal correspondence plot of Figure 5.4. The first principal axis represents 68.30% of the row variation, while the second principal axis accounts for 17.88% of the variation. Therefore, the two-dimensional plot visually accounts for 86.18% of the total variation in the row categories. These conclusions can be verified by observing the relative position of the age group profile co-ordinates in Figure 5.4, which shows that the co-ordinates of the age group profiles are dominated by the first principal axis.


    Figure 5.4 : Two-Dimensional Ordinal Correspondence Plot of Table 5.8


                Value     df   P-value
Row Inertia
  Location      0.09703    4   0.0002
  Dispersion    0.02540    4   0.2257
  Error         0.01963    4   0.3573
Column Inertia
  Location      0.10116    3   0.0001
  Dispersion    0.01722    3   0
  Error         0.02368    6   0.2793
Total Inertia   0.14206   12   0.0016

Table 5.9 : Row and Column Inertia Values for Table 5.8


    Chapter 6

    A Comparative Study of Different Scoring Schemes

    6.1 Introduction

Beh (1997) and Chapter 5 discussed a correspondence analysis technique which involves partitioning the chi-squared statistic into bivariate moments, so that one can obtain location, dispersion and higher order components for the row and column categories. These offer an informative explanation of how categories compare. However, when Rayner & Best (1995), Beh (1997) and Chapter 5 considered such a partition, they restricted their analysis to integer-valued, or natural, scores 1, 2, 3, and so on.

    The problem with such a scoring scheme is that it assumes that the categories, which we know to be ordered, are equally spaced. In general this may not be the case. Four common objective and subjective scoring methods are discussed in this chapter and a comparison of component values and profile co-ordinates in the correspondence plot is made. This chapter is based in part on the work presented by Beh (1998).

Often when comparing the ordinal correspondence plots obtained using two different scoring schemes, there appears to be a reflection and/or a rotation between the profile co-ordinates. Sections 6.8 and 6.9 consider these issues.

6.2 Equal Scores

Suppose a doubly ordered two-way contingency table, N, has two consecutive equal valued column scores $s_J(j)$ and $s_J(j+1)$, for $1 \leq j < J$. Then $b_v(j) = b_v(j+1)$, and so

$$Z'_{uv} = \sqrt{n}\sum_{i=1}^{I}\left[p_{i1}\, a_u(i)\, b_v(1) + \cdots + \left(p_{ij} + p_{i,j+1}\right) a_u(i)\, b_v(j) + \cdots + p_{iJ}\, a_u(i)\, b_v(J)\right] \qquad (6.1)$$

Hence, the Z' terms of (6.1) apply to a transformed data set where the columns with equal scores are combined to form a single column. The same argument can be made when a contingency table has equal scores associated with only the row categories, or when equal row scores and equal column scores are applied.

In general, construct a contingency table N' so that the row categories with equal scores are combined, and the column categories with equal scores are combined. If there are $k_1$ identical row scores, such that $2 \leq k_1 \leq I$, and $k_2$ identical column scores, such that $2 \leq k_2 \leq J$, then N' is an $I' \times J'$ contingency table, where $I' = I - (k_1 - 1)$ and $J' = J - (k_2 - 1)$. Gilula & Krieger (1989) suggested that when combining certain row and/or column categories, the chi-squared statistic of (4.35) can then be further partitioned so that

$$X^2_{(I-1)(J-1)} = X^2_{(I'-1)(J'-1)} + X^2_{(I-1)(J-1)-(I'-1)(J'-1)} \qquad (6.2)$$

    The first term on the RHS of (6.2) can be partitioned by using orthogonal polynomials, and is the chi-squared statistic for N'. As N' is a I'xJ' contingency table, the rank of B. is reduced from J to J', while the rank of A.

    -185- Chapter 6 - Comparison of Scoring Schemes

    is reduced from I to T. The second term represents the difference between the profiles of combined categories. If this term is significant, then equal scores should not be applied to that particular data set. If this term is zero, then all rows combined are homogeneous, and so to are the combined columns.

Suppose there are k_1 identical row scores; then u in (6.1) will be at most I'. When k_1 = I, A_* reduces to a single column of trivial unit elements. Thus, using such a scoring scheme for the correspondence analysis of Beh (1997) is not advised due to the triviality of A_*.

A further generalisation can be made when there are k lots of identical scores, with m_a, for a = 1, ..., k, being the number of identical scores in the a'th lot. For example, k = 2 for the scoring scheme of the six row categories 1, 1, 2, 2, 2, 3, with m_1 = 2 for the 1's and m_2 = 3 for the 2's. When there are k lots of identical row scores, the number of rows in N' is

$$ I' = I - \sum_{a=1}^{k}(m_a - 1) $$
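As a small numerical sketch of the merging just described (the code and the function name merge_equal_scores are illustrative assumptions, not part of the thesis), rows of N sharing a score can be combined and the reduced dimension I' checked against the formula above:

```python
import numpy as np

def merge_equal_scores(N, row_scores):
    """Combine the rows of N whose scores are identical; return N' and the
    distinct score values.  Columns can be handled by passing N.T."""
    scores = np.asarray(row_scores, dtype=float)
    distinct = np.unique(scores)          # sorted distinct score values
    merged = np.vstack([N[scores == s].sum(axis=0) for s in distinct])
    return merged, distinct

# Six row categories scored 1, 1, 2, 2, 2, 3: k = 2 lots of ties, with
# m_1 = 2 and m_2 = 3, so I' = 6 - (2 - 1) - (3 - 1) = 3.
N = np.arange(24).reshape(6, 4)           # any 6 x 4 table of counts
Nprime, s = merge_equal_scores(N, [1, 1, 2, 2, 2, 3])
print(Nprime.shape[0])                    # 3 rows, matching the formula
```

The grand total of N is unchanged by the merging, so the chi-squared partition (6.2) compares tables of the same size n.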

6.3 Approximately Equal Scores

Suppose that two consecutive column scores, for the g'th and h'th column categories, are approximately equal, so that s_J(g) ≈ s_J(h) = s_J(g) + ε, where ε is very small. For example, the two scores 1 and 1.0001 are approximately equal, with s_J(g) = 1 and ε = 0.0001.

By substituting the two approximately equal column scores, s_J(g) and s_J(h), into equations (4.21)-(4.23) and simplifying, then:

$$ B_v = B'_v + \varepsilon\, p_{\bullet h}\, b'^{\,2}_{v-1}(h) $$

$$ C_v = C'_v + \varepsilon\, p_{\bullet h}\, b'_{v-1}(h)\, b'_{v-2}(h) $$


and

$$ A_v = \sum_{j=1}^{J} p_{\bullet j}\, s_J^2(j)\, b'^{\,2}_{v-1}(j) - (B'_v)^2 - (C'_v)^2 + 2\varepsilon\, p_{\bullet h}\left[ s_J(g)\, b'^{\,2}_{v-1}(h) - B'_v\, b'^{\,2}_{v-1}(h) - C'_v\, b'_{v-1}(h)\, b'_{v-2}(h) \right] + \varepsilon^2 p_{\bullet h}\, b'^{\,2}_{v-1}(h)\left[ 1 - p_{\bullet h}\, b'^{\,2}_{v-1}(h) - p_{\bullet h}\, b'^{\,2}_{v-2}(h) \right] $$

where the primes refer to the values calculated for N'. Therefore, as ε approaches zero, the orthogonal polynomials calculated for N approach the values of those calculated for N'. Note that Section 6.2 dealt with ε = 0 (identical scores). Therefore, the chi-squared statistic for N will approach the chi-squared statistic for N' as ε approaches zero.

    This analysis of approximately equal scores also applies to the row scores.
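This limiting behaviour can be checked numerically. The sketch below is assumed code, not from the thesis; it builds the orthogonal polynomials by weighted Gram-Schmidt rather than the recurrence (4.21)-(4.23). For a small ε the polynomial values at the two nearly tied columns all but coincide with the value at the merged column of N':

```python
import numpy as np

def orthonormal_polys(scores, weights, order):
    """b_0(j), ..., b_order(j) satisfying sum_j p_j b_u(j) b_v(j) = delta_uv,
    built by Gram-Schmidt on the monomials with weights p_j."""
    s = np.asarray(scores, float)
    p = np.asarray(weights, float)
    basis = [np.ones_like(s)]                 # trivial polynomial b_0
    for v in range(1, order + 1):
        b = s ** v
        for prev in basis:                    # remove lower-order components
            b = b - np.sum(p * b * prev) * prev
        basis.append(b / np.sqrt(np.sum(p * b * b)))
    return np.column_stack(basis)

eps = 1e-6
p4 = np.array([0.25, 0.25, 0.25, 0.25])       # column marginals of N
B = orthonormal_polys([1.0, 2.0, 2.0 + eps, 4.0], p4, 2)

# Merged table N': the two nearly tied columns share score 2, marginal 0.5.
Bm = orthonormal_polys([1.0, 2.0, 4.0], np.array([0.25, 0.5, 0.25]), 2)
print(B[1, 1], B[2, 1], Bm[1, 1])             # the three values nearly coincide
```

Since the polynomials depend only on the score values and their weight distribution, letting ε → 0 reproduces exactly the merged-table polynomials of Section 6.2.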

    6.4 Scoring Methods

The problem of determining the difference between categories of a variable, or of scaling, has long been investigated, and many multi-dimensional analytic techniques deal with it. Those who have contributed to the calculation of scales, especially those of an ordinal nature, include Bradley, Katti and Coons (1962), who described a method of scaling for categorical data which maximises the variation of the categories. Armitage (1955) discussed methods of measuring the ordinal trend of categories. Becker and Clogg (1989) discussed the use of different scores for Goodman's RC model of (2.60).

There are many different scoring methods available for ordered categories that are applicable to correspondence analysis, such as those discussed by Parsa & Smith (1993), Ritov & Gilula (1993) and Schriever (1983). These scoring schemes will not be considered here, as they require some technical computations. Instead, four relatively simple and more popular scoring techniques will be briefly investigated. They are (i) natural

scores, (ii) midrank scores, (iii) Nishisato scores, and (iv) singular vectors calculated from a simple correspondence analysis of N.

Natural scores are investigated to determine what effect scores chosen by the researcher have on the analysis. Often it is convenient for researchers to choose these scores, especially when the ordered structure of the categories is known, as they are simple to implement and require very little calculation to derive. Midrank scores are of interest because they increase the spacing between the most likely categories, while relatively small marginals will result in a small change in the next midrank value. Nishisato scores are considered because, unlike natural scores, they maximise the first principal inertia while maintaining the ordered structure of the categories. Singular vectors from a simple correspondence analysis of the data are also investigated as they also maximise the first principal inertia, and for contingency tables with ordered categories often maintain this ordered structure.

6.4.1 Natural Scores

The choice of scores may be made on an arbitrary basis. For category responses "Always", "Often", "Sometimes", "Rarely", "Never", one may judge these to be equally spaced, so the scores 1, 2, 3, 4, 5 may be chosen. Alternatively, such categorisations could have the scores 5, 4, 3, 2, 1 associated with them. If "Always", "Often" and "Never" are the only possible responses, then 1, 2 and 5 may be chosen to reflect the spacing of the categories. For responses "Good", "Average" and "Poor", one could choose the scores 1, 2 and 3, or -1, 0 and 1.

In fact, any linear transformation of the scoring scheme does not change the values of the orthogonal polynomials. To show this, let a linear transformation of a set of column scores s_J(j) be ŝ_J(j) = α_J s_J(j) + β_J, where α_J and β_J are constants. Defining the terms of the orthogonal polynomials (4.21)-(4.23) under this linear transformation as B̂_v, Ĉ_v and Â_v, then B̂_v = α_J B_v + β_J, Ĉ_v = α_J C_v and Â_v = α_J² A_v. Therefore, if b̂_v(j) is an orthogonal polynomial with ŝ_J(j) as the scoring scheme, and α_J > 0, then b̂_v(j) = b_v(j) for all v = 1, 2, ..., J-1 and j = 1, 2, ..., J. For example, consider the two scoring schemes S_J = {1, 2, 3} and Ŝ_J = {-1, 0, 1}. The second scheme is a linear transformation of the first with α_J = 1 and β_J = -2, and so the orthogonal polynomials using these schemes will be identical. Therefore, the correspondence analysis of Beh (1997) will result in identical component and association values and identical correspondence plots.

When α_J < 0, b̂_v(j) = -b_v(j). Suppose that â_u(i) = a_u(i) for all u and i; that is, the two sets of row scores are related by a linear transformation with α_I > 0, while the column scores are related by one with α_J < 0. Then, while the component values remain unchanged, Ẑ_uv = -Z_uv and the row profile co-ordinates will be reflected about the u'th principal axis, giving a possible misinterpretation of the relationship between the rows and columns. This can be seen as

$$ \hat{Z}_{uv} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J} a_u(i)\,\hat{b}_v(j)\,p_{ij} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J} a_u(i)\big[-b_v(j)\big]p_{ij} = -\sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J} a_u(i)\,b_v(j)\,p_{ij} = -Z_{uv} $$

Therefore, in this situation, the sign of α_J is important and must be considered when conducting an ordinal correspondence analysis. Consider the scoring schemes applied to the columns of a contingency table with three ordered categories, S_J = {1, 2, 3} and Ŝ_J = {3, 2, 1}. Then α_J = -1 and β_J = 4. In this case, the row (and column) component values will be identical. However, if the order of the row scores remains unchanged, then Ẑ_uv = -Z_uv, and so the column profile co-ordinates will appear to be reflected about the second principal axis. Example 6.10.1 highlights the importance of choosing a set of scores which will not give misinterpretations in the output by looking at two artificial data sets.
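The invariance result of this subsection can be sketched numerically (assumed code, not thesis output): a positive linear transformation of the scores leaves the orthogonal polynomials unchanged, while a negative one reflects the odd-order polynomials, here b_1, and leaves b_2 unchanged.

```python
import numpy as np

def orthonormal_polys(scores, weights, order):
    """Polynomials orthonormal with respect to sum_j p_j b_u(j) b_v(j),
    built by weighted Gram-Schmidt on the monomials."""
    s = np.asarray(scores, float)
    p = np.asarray(weights, float)
    basis = [np.ones_like(s)]
    for v in range(1, order + 1):
        b = s ** v
        for prev in basis:
            b = b - np.sum(p * b * prev) * prev
        basis.append(b / np.sqrt(np.sum(p * b * b)))
    return np.column_stack(basis)

p = np.array([0.3, 0.3, 0.4])                  # column marginals p_.j
B      = orthonormal_polys([1, 2, 3], p, 2)    # natural scores
Bshift = orthonormal_polys([-1, 0, 1], p, 2)   # alpha_J = 1,  beta_J = -2
Brev   = orthonormal_polys([3, 2, 1], p, 2)    # alpha_J = -1, beta_J = 4

print(np.allclose(B, Bshift))                  # True: polynomials unchanged
print(np.allclose(Brev[:, 1], -B[:, 1]))       # True: b_1 is reflected
```

The quadratic polynomial is unaffected by the sign of α_J because its leading coefficient scales by α_J², which is why only a reflection, and not a distortion, appears in the plot.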

    6.4.2 Midrank Scores

Nair (1986) used midrank scores to analyse contingency tables. These scores are associated with the ridit analysis discussed by Bross (1958). Agresti (1983) also discusses midrank scores and their association with ridit analysis. Mantel (1979) and Selvin (1977) also discuss issues concerning ridit analysis.

Consider the column marginals of N, (n_•1, n_•2, ..., n_•J). Agresti (1983) defined the midrank of the j'th column category as:

$$ s_J(j) = \sum_{m=1}^{j-1} n_{\bullet m} + \frac{n_{\bullet j}+1}{2} \qquad (6.3) $$

where s_J(1) < s_J(2) < ... < s_J(J). Similarly, midrank scores can be calculated for the row marginals.

Therefore, the set of scores for the ordered column categories is {s_J(j)}, for j = 1, 2, ..., J. As Rayner et al. (1997) point out, however, Graubard

& Korn (1987) criticised the use of rank scores for contingency table analysis on the grounds that they may not give enough weight to extreme categories. Graubard & Korn (1987) also suggested that if no adequate scoring scheme is available, then equally spaced integer scores should be considered in the analysis. Snell (1964) disputed the use of such scores, as they do not take into account the skewness of the data formed by the possible "bunching" of observations towards one end of the rating scale. As a result, Snell (1964) presented an alternative scoring method to overcome this problem.
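Equation (6.3) amounts to cumulating the earlier marginals and stepping half-way into the current one. A minimal sketch (the function name is an illustrative assumption):

```python
def midrank_scores(marginals):
    """Midrank scores of (6.3): s(j) = sum_{m<j} n_m + (n_j + 1)/2."""
    scores, cumulative = [], 0
    for n_j in marginals:
        scores.append(cumulative + (n_j + 1) / 2)
        cumulative += n_j
    return scores

# For example, column marginals (20, 24, 26):
print(midrank_scores([20, 24, 26]))   # [10.5, 32.5, 57.5]
```

Note how the larger marginals produce the wider gaps between consecutive scores, which is exactly the property that motivated considering midrank scores in Section 6.4.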

    6.4.3 Nishisato Scores


Nishisato & Arri (1975) and Nishisato (1980) discussed a method of scoring for contingency tables with ordered categories. This scoring technique has the advantage of being applicable to completely and partially ordered categories. For example, Nishisato scores can be applied to the partially ordered survey responses "Yes", "Maybe", "No" and "Don't Know". Nishisato (1980) called the algorithm of his scoring procedure the method of Successive Data Modification (SDM). Computationally, the method involves finding the ordered scores so that the first principal inertia, λ₁, is maximised.

Sometimes certain scores will be equal or approximately equal; the effect of using such scores on the orthogonal polynomials, and hence on ordinal correspondence analysis, can be seen in Section 6.3. For a more complete description of the scoring method refer to Nishisato & Arri (1975) and Nishisato (1980).

6.4.4 Singular Vectors

Left and right singular vectors from a simple correspondence analysis of a two-way contingency table often reflect the ordered structure of a set of categories along the first principal axis. When this occurs, the vectors along the first principal axis can be used as the scoring scheme. When the first non-trivial right singular vector is used as the scoring scheme, the relationship between the column orthogonal polynomials and the vectors from a simple correspondence analysis is:

$$ b_1(j) = b_{j1} \qquad (6.4) $$

Therefore, the column profile co-ordinates along the first principal axis of simple correspondence analysis and those of Beh (1997) are identical.


Similarly, when the first non-trivial left singular vector is used as the set of scores for the row categories:

$$ a_1(i) = a_{i1} \qquad (6.5) $$

Thus, the row profile co-ordinates along the first principal axis of simple correspondence analysis and those of Beh (1997) are identical.

Therefore, with singular vectors used as the scoring scheme for ordinal correspondence analysis, the row and column location components are equivalent to the value of the first principal inertia.

These results can be generalised. If the first principal axis does not maintain the ordered structure of the categories, but the m'th axis does, then the singular vectors along this axis can be used as the scoring scheme. When the m'th column (right) singular vector is used as the scoring scheme, (6.4) is generalised to:

$$ b_1(j) = b_{jm} \qquad (6.6) $$

Similarly, using the m'th row (left) singular vector as the set of row scores:

$$ a_1(i) = a_{im} \qquad (6.7) $$

When the rows and columns of a contingency table are ordered, the linear-by-linear association of the table can be calculated directly from simple correspondence analysis by substituting (6.4) and (6.5) into (4.39). The result of this is:


$$ Z_{11} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J} a_1(i)\,b_1(j)\,p_{ij} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J} a_{i1}\,b_{j1}\,p_{ij} \qquad (6.8) $$

However, if the profile co-ordinates along the first principal axes do not reflect the ordered structure of the categories, but the l'th row axis and the m'th column axis do, then the (u, v)'th bivariate association can be generalised. If a_il and b_jm are used as the scoring scheme, then substituting (6.6) and (6.7) into (4.39) gives the association:

$$ Z_{lm} = \sqrt{n}\sum_{i=1}^{I}\sum_{j=1}^{J} a_{il}\,b_{jm}\,p_{ij} \qquad (6.9) $$

However, due to the orthogonality property of the generalised basic vectors, the value of (6.9) will always be zero when l ≠ m.

When l = m, equation (6.9) reduces to the definition of the singular values (2.11). Therefore, in general, when a_im and b_jm are used as the scoring scheme for ordered correspondence analysis, Z_mm = √n λ_m for all m = 1, 2, ..., min(I, J) − 1.

    6.5 Non-Ordered Categorical Data

The correspondence analysis technique discussed in Beh (1997) can also be used for contingency tables which do not have ordered row and/or column categories. To apply the new method in this situation, singular vectors from a simple correspondence analysis of N, for example, can be used as the scoring scheme. However, tests of association, as conducted in Best & Rayner (1996) and Beh (1997), cannot be made. For a contingency table with non-ordered categories, the method does allow the determination of why categories may differ with respect to differences/similarities in location, dispersion and higher order moments.

    6.6 Distances

When different scoring schemes are applied to N, the positions of the profiles in the ordinal correspondence plot differ. Here the distance between the positions of a particular profile under different scoring schemes is examined. Suppose for the i'th row profile co-ordinate f_i we use the scoring scheme {s_J(j)}, while for the scoring scheme {s̃_J(j)} the associated row co-ordinate is f̃_i. Then, the distance between the two positions of the i'th row profile in M-dimensional space is:

$$ d^2\big(f_i,\tilde{f}_i\big) = \sum_{m=1}^{M}\left[\sum_{j=1}^{J}\frac{p_{ij}}{p_{i\bullet}}\Big(b_m(j)-\tilde{b}_m(j)\Big)\right]^2 \qquad (6.10) $$

where {b_m(j)} is the m'th order orthogonal polynomial using the scores {s_J(j)}, and {b̃_m(j)} is the m'th order orthogonal polynomial using the scores {s̃_J(j)}.

If b̃_m(j) = b_m(j), then the position of the i'th row profile using both scoring schemes will be identical along the m'th axis. As Subsection 6.4.1 described, this will occur if {s̃_J(j)} can be written as a linear transformation of {s_J(j)} (or vice versa) with α_J > 0. Moreover, as {b_m(j)} and {b̃_m(j)} are both standardisations of the scoring schemes, if both sets of scores are ascending (or both descending) the polynomials will often be approximately equal.

    6.7 Correlation

The link between these distances and the correlation between the orthogonal polynomials using different schemes can now be made.


Consider the column categories. The correlation, ρ_m, between the two sets of orthogonal polynomials {b_m(j)} and {b̃_m(j)} is:

$$ \rho_m = \frac{\displaystyle\sum_{j=1}^{J} p_{\bullet j}\, b_m(j)\,\tilde{b}_m(j)}{\sqrt{\displaystyle\sum_{j=1}^{J} p_{\bullet j}\, b_m^2(j)}\ \sqrt{\displaystyle\sum_{j=1}^{J} p_{\bullet j}\, \tilde{b}_m^2(j)}} \qquad (6.11) $$

Applying the orthogonality property of (4.2), (6.11) simplifies to:

$$ \rho_m = \sum_{j=1}^{J} p_{\bullet j}\, b_m(j)\,\tilde{b}_m(j) \qquad (6.12) $$

From (6.12), when b̃_m(j) = b_m(j) then ρ_m = 1, and the two scoring schemes will produce identical co-ordinates along the m'th principal axis. When b̃_m(j) ≈ b_m(j) then ρ_m ≈ 1 and the positions of the co-ordinates will be similar (but not identical), while when ρ_m ≈ 0 the positions of the co-ordinates under the two scoring schemes will be different. The correlation between the two scoring schemes, {s_J(j)} and {s̃_J(j)}, can be measured by:

$$ \rho = \frac{\displaystyle\sum_{j=1}^{J} p_{\bullet j}\big(s_J(j)-\mu_J\big)\big(\tilde{s}_J(j)-\tilde{\mu}_J\big)}{\sqrt{\displaystyle\sum_{j=1}^{J} p_{\bullet j}\big(s_J(j)-\mu_J\big)^2}\ \sqrt{\displaystyle\sum_{j=1}^{J} p_{\bullet j}\big(\tilde{s}_J(j)-\tilde{\mu}_J\big)^2}} \qquad (6.13) $$

where μ_J is the mean of the column scores {s_J(j)} and μ̃_J is the mean of the scores {s̃_J(j)}; see equation (4.42). When v = 1, the column orthogonal polynomials of (4.20) are just the standardised scores, and (6.13) simplifies to:


$$ \rho = \sum_{j=1}^{J} p_{\bullet j}\, b_1(j)\,\tilde{b}_1(j) \qquad (6.14) $$

which is (6.12) when m = 1. Therefore, the correlation between two scoring schemes is equivalent to the correlation between their associated first non-trivial orthogonal polynomials.
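The equivalence of (6.13) and (6.14) can be sketched numerically (illustrative code and scores, not thesis output), since b_1 is just the standardised score:

```python
import numpy as np

def standardise(scores, weights):
    """First non-trivial orthogonal polynomial b_1(j) = (s_j - mu)/sigma."""
    s = np.asarray(scores, float)
    p = np.asarray(weights, float)
    mu = p @ s
    return (s - mu) / np.sqrt(p @ (s - mu) ** 2)

p  = np.array([0.1, 0.4, 0.3, 0.2])            # column marginals p_.j
s1 = np.array([1.0, 2.0, 3.0, 4.0])            # natural scores
s2 = np.array([4.5, 23.0, 58.0, 83.5])         # e.g. a midrank-type scheme

b1, b1t = standardise(s1, p), standardise(s2, p)
rho_polys = p @ (b1 * b1t)                     # equation (6.14)

mu1, mu2 = p @ s1, p @ s2                      # equation (6.13)
rho_scores = (p @ ((s1 - mu1) * (s2 - mu2))) / np.sqrt(
    (p @ (s1 - mu1) ** 2) * (p @ (s2 - mu2) ** 2))

print(np.isclose(rho_polys, rho_scores))       # True
```

The two expressions agree to machine precision, so either may be used to quantify how far apart two scoring schemes will place the profiles along the first principal axis.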

    6.8 Rotation and Reflection

Often there appears to be a rotation and/or reflection between two sets of profile co-ordinates for the same variable. Even though the rotation will not be identical for all categories, it is possible to determine the least squares rotation between the two sets of profiles. This is the best approximate rotation for the set of points.

Suppose we have two sets of row profile co-ordinates, F_* and F̃_*, where {s_J(j)} is the set of column scores used to calculate F_* and {s̃_J(j)} that used for F̃_*. Similarly, suppose we have two sets of column profile co-ordinates, where {s_I(i)} is the set of row scores used to obtain G_* and {s̃_I(i)} that used for G̃_*. Then the relationships between F_* and F̃_*, and G_* and G̃_*, are

$$ \tilde{F}_* = F_* B_*^T D_J \tilde{B}_* \qquad (6.15) $$

$$ \tilde{G}_* = G_* A_*^T D_I \tilde{A}_* \qquad (6.16) $$

respectively.

To prove (6.15), substitute F_* as defined by (5.9) into (5.17). Then:

$$ D_I^{-1}\big(P - r c^T\big) D_J^{-1} = F_* B_*^T \qquad (6.17) $$

Similarly, using F̃_*:

$$ D_I^{-1}\big(P - r c^T\big) D_J^{-1} = \tilde{F}_* \tilde{B}_*^T \qquad (6.18) $$


Equating (6.17) and (6.18) gives the result

$$ F_* B_*^T = \tilde{F}_* \tilde{B}_*^T \qquad (6.19) $$

Post-multiplying both sides by D_J B̃_* gives equation (6.15). Post-multiplying (6.19) by D_J B_* instead gives an alternative relationship to (6.15), which is:

$$ F_* = \tilde{F}_* \tilde{B}_*^T D_J B_* \qquad (6.20) $$

Similarly, the relationship between the two sets of column profile co-ordinates can, as an alternative to (6.16), be written as:

$$ G_* = \tilde{G}_* \tilde{A}_*^T D_I A_* \qquad (6.21) $$

The relationships given above need not relate only to comparing profile co-ordinates calculated using different scoring schemes. Instead, the comparison could be made between the positions of profile co-ordinates from simple correspondence analysis and those from ordinal correspondence analysis. In such a situation, (6.20) and (6.21) become

$$ F_* = F B^T D_J B_* \qquad (6.22) $$

$$ G_* = G A^T D_I A_* \qquad (6.23) $$

respectively, where A and B are subject to the constraints (2.8) and (2.9).

Equation (6.20) shows that the row profile co-ordinates using the scoring schemes {s_J(j)} and {s̃_J(j)} will be identical when B̃_*^T D_J B_* = I. In general this will not be the case, and this expression is just the matrix form of equation (6.12). The matrix is the covariance matrix of the column scoring schemes, where the diagonal elements are those defined by (6.12) and the off-diagonal elements are the covariance values of the column scores. Similarly, Ã_*^T D_I A_* is the covariance matrix of the two row scoring schemes {s_I(i)} and {s̃_I(i)}. Let

$$ M_\rho = B_*^T D_J \tilde{B}_* \qquad (6.24) $$

so that equation (6.15) becomes

$$ \tilde{F}_* = F_* M_\rho \qquad (6.25) $$

If M_ρ is an identity matrix, then the row profile co-ordinates using the scores {s_J(j)} and {s̃_J(j)} are identical. However, in general, this will not be the case. Note that the (v, m)'th element of (6.24) can be written as

$$ \rho_{vm} = \sum_{j=1}^{J} p_{\bullet j}\, b_v(j)\,\tilde{b}_m(j) $$

To determine when the two sets of row profile co-ordinates are identical in the optimal correspondence plot, the constraint

$$ M_\rho^T M_\rho = M_\rho M_\rho^T = I \qquad (6.26) $$

must be satisfied.

The first part of (6.26) implies that

$$ M_\rho^T M_\rho = \big(B_*^T D_J \tilde{B}_*\big)^T\big(B_*^T D_J \tilde{B}_*\big) = \tilde{B}_*^T\big(D_J B_* B_*^T\big) D_J \tilde{B}_* $$

Now, if B_* is of full rank (so that B_* B_*^T = D_J^{-1}), then this part of (6.26) is met.


Similarly,

$$ M_\rho M_\rho^T = \big(B_*^T D_J \tilde{B}_*\big)\big(B_*^T D_J \tilde{B}_*\big)^T = B_*^T\big(D_J \tilde{B}_* \tilde{B}_*^T\big) D_J B_* $$

If B̃_* is of full rank, then the second part of the constraint is met. Therefore, for a 2-dimensional correspondence plot, the two sets of column profile co-ordinates will be identical when N is an I×3 contingency table. Similarly, for a 2-dimensional correspondence plot, two sets of row profile co-ordinates will be identical when N is a 3×J contingency table. Let

$$ M_\rho = \begin{pmatrix} \rho_{11} & \rho_{12} \\ \rho_{21} & \rho_{22} \end{pmatrix} $$

Then, when plotting 3 row profile co-ordinates onto a 2-dimensional plot, for example,

$$ \tilde{F}_* = F_* M_\rho $$

$$ \begin{pmatrix} \tilde{f}_{11} & \tilde{f}_{12} \\ \tilde{f}_{21} & \tilde{f}_{22} \\ \tilde{f}_{31} & \tilde{f}_{32} \end{pmatrix} = \begin{pmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \\ f_{31} & f_{32} \end{pmatrix}\begin{pmatrix} \rho_{11} & \rho_{12} \\ \rho_{21} & \rho_{22} \end{pmatrix} $$

Therefore, f̃_i1 = f_i1 ρ_11 + f_i2 ρ_21, while f̃_i2 = f_i1 ρ_12 + f_i2 ρ_22. Thus, if the off-diagonal elements of M_ρ are zero, then the set of row profile co-ordinates {f_i1} is stretched by a factor of ρ_11 along the first principal axis, while the set of row profile co-ordinates {f_i2} is stretched by a factor of ρ_22 along the second principal axis.

If ρ_11 = 1 when the off-diagonal elements are zero, then the row profile co-ordinates will be identical along the first principal axis, and when ρ_22 = 1 the co-ordinates are identical along the second principal axis. If ρ_11 = -1, then the second set of row profile co-ordinates is just a reflection of the first set about the second principal axis. When ρ_22 = -1, the reflection is about the first principal axis.

In general, for a two-dimensional correspondence plot, when the off-diagonal elements of M_ρ are zero:

• the two sets of row profile co-ordinates are identical when ρ_11 = 1 and ρ_22 = 1

• there is a rotation of 180 degrees when ρ_11 = -1 and ρ_22 = -1

• the two sets of profile co-ordinates are reflected about the first principal axis when ρ_11 = 1 and ρ_22 = -1

• the two sets of profile co-ordinates are reflected about the second principal axis when ρ_11 = -1 and ρ_22 = 1
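The cases above can be sketched numerically. The code below (an illustrative sketch, not thesis output) computes M_ρ per (6.24) for the reversed scheme {3, 2, 1} against {1, 2, 3}; since b_1 is reflected and b_2 unchanged, M_ρ = diag(-1, 1), i.e. a reflection about the second principal axis.

```python
import numpy as np

def orthonormal_polys(scores, weights, order):
    """Polynomials orthonormal under sum_j p_j b_u(j) b_v(j), via
    weighted Gram-Schmidt on the monomials."""
    s = np.asarray(scores, float)
    p = np.asarray(weights, float)
    basis = [np.ones_like(s)]
    for v in range(1, order + 1):
        b = s ** v
        for prev in basis:
            b = b - np.sum(p * b * prev) * prev
        basis.append(b / np.sqrt(np.sum(p * b * b)))
    return np.column_stack(basis)

p  = np.array([0.2, 0.45, 0.35])                  # column marginals p_.j
B  = orthonormal_polys([1, 2, 3], p, 2)[:, 1:]    # non-trivial b_1, b_2
Bt = orthonormal_polys([3, 2, 1], p, 2)[:, 1:]    # reversed scores
M_rho = B.T @ np.diag(p) @ Bt                     # equation (6.24)
print(np.round(M_rho, 6))                         # diag(-1, 1)
```

Because the off-diagonal elements vanish and ρ_11 = -1 with ρ_22 = 1, the fourth bullet point applies: the plotted profiles are reflected about the second principal axis.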

    6.9 Least Squares Rotation

Section 6.8 considered the situation where the off-diagonal elements of M_ρ are zero. However, this will not always be the case. When these values are not zero, we can approximate the rotation between two sets of co-ordinates by using a least squares rotation.

Suppose we wish to determine the least squares rotation, θ₁, between two sets of row profile co-ordinates, F_* and F̃_*, in a two-dimensional correspondence plot. To find this rotation, we wish to determine X such that

$$ \tilde{F}_* = F_* X \qquad (6.27) $$

where X is orthogonal. Consider (6.25). Least squares rotation involves transforming M_ρ so that it is an orthogonal matrix of the form X, where:


$$ X = \begin{pmatrix} a & -b \\ b & a \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \qquad (6.28) $$

The aim is to determine a and b in (6.28), subject to the condition that:

$$ a^2 + b^2 = 1 \qquad (6.29) $$

When the off-diagonal elements of M_ρ are not equal to zero, an alternative technique is needed to compare the positions of the two sets of profile co-ordinates. The rotation can be measured using the least squares rotation, which was briefly discussed in Graham (1981). Here, his work is expanded upon to quickly identify orthogonal solutions, rather than just identifying all possible solutions. The procedure Graham (1981) proposed was that, given a non-singular matrix M_ρ of order 2×2, one can determine the matrix X which is a least squares approximation to M_ρ, with X an orthogonal matrix, by solving:

$$ \big[(I \otimes M_\rho^T) - (M_\rho^T \otimes I)V\big]x = 0 \qquad (6.30) $$

where

$$ x = \mathrm{vec}(X) = \begin{pmatrix} x_{11} \\ x_{21} \\ x_{12} \\ x_{22} \end{pmatrix} = \begin{pmatrix} a \\ b \\ -b \\ a \end{pmatrix} \qquad (6.31) $$

and

$$ V = \begin{pmatrix} \rho_{11} & \rho_{21} & 0 & 0 \\ 0 & 0 & \rho_{11} & \rho_{21} \\ \rho_{12} & \rho_{22} & 0 & 0 \\ 0 & 0 & \rho_{12} & \rho_{22} \end{pmatrix} $$

The operator ⊗ is the Kronecker product of the two matrices concerned.


Simplifying (6.30) yields the equation:

$$ a(\rho_{12} - \rho_{21}) = b(\rho_{22} + \rho_{11}) \qquad (6.32) $$

By solving for a and b in (6.32) subject to the orthogonality constraint (6.29), the values of a and b can be calculated as:

$$ a = \frac{\rho_{11} + \rho_{22}}{\sqrt{(\rho_{12} - \rho_{21})^2 + (\rho_{11} + \rho_{22})^2}} \qquad (6.33) $$

$$ b = \frac{\rho_{12} - \rho_{21}}{\sqrt{(\rho_{12} - \rho_{21})^2 + (\rho_{11} + \rho_{22})^2}} \qquad (6.34) $$

Therefore the approximate, or least squares, rotation θ₁ between the two sets of row profile co-ordinates in a two-dimensional correspondence plot is:

$$ \theta_1 = \cos^{-1}\left[\frac{\rho_{11} + \rho_{22}}{\sqrt{(\rho_{12} - \rho_{21})^2 + (\rho_{11} + \rho_{22})^2}}\right] \qquad (6.35) $$

or, alternatively,

$$ \theta_1 = \sin^{-1}\left[\frac{\rho_{12} - \rho_{21}}{\sqrt{(\rho_{12} - \rho_{21})^2 + (\rho_{11} + \rho_{22})^2}}\right] \qquad (6.36) $$

From equations (6.33) and (6.34), it can be seen that the range of a and b is 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1 if the positive square root is considered, or -1 ≤ a ≤ 0 and -1 ≤ b ≤ 0 if the negative square root is considered.

Similar results can be obtained for calculating the angle of least squares rotation between two sets of column profile co-ordinates in a two-dimensional correspondence plot. However, the angle of rotation calculated using (6.35), or (6.36), for the row profile co-ordinates and the angle of rotation calculated for the column profile co-ordinates need not be identical.

The angle of least squares rotation given by (6.35), or (6.36), is only an approximation, as the two configurations of points will not necessarily be the same. Some category points may be rotated more than others, some less.

When the measure of the least squares rotation has been determined, the deviation from an exact rotation for each row profile co-ordinate can be seen by plotting the co-ordinates specified by (6.27) along with the actual positions.
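Equations (6.33)-(6.36) can be sketched directly (the function name is an illustrative assumption; arctan2 simply combines the cos⁻¹ and sin⁻¹ forms into one signed angle):

```python
import numpy as np

def least_squares_rotation(M):
    """a, b and the angle theta_1 per (6.33)-(6.36) for a 2x2 matrix
    M = [[p11, p12], [p21, p22]]."""
    (p11, p12), (p21, p22) = M
    denom = np.hypot(p12 - p21, p11 + p22)   # sqrt((p12-p21)^2 + (p11+p22)^2)
    a = (p11 + p22) / denom                  # equation (6.33)
    b = (p12 - p21) / denom                  # equation (6.34)
    return a, b, np.arctan2(b, a)            # a^2 + b^2 = 1 by construction

a, b, theta = least_squares_rotation(np.array([[0.9, -0.4], [0.5, 0.85]]))
print(a * a + b * b)   # 1.0, so X = [[a, -b], [b, a]] is orthogonal
```

Whatever 2×2 matrix M_ρ is supplied, the returned (a, b) satisfy the constraint (6.29), so the fitted X is always a genuine rotation matrix.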

    6.10 Example

    6.10.1 Ordering of a Pair of Scoring Schemes

    The ordering of a pair of scoring schemes is an important feature of the analysis. Choosing the wrong ordering of a set of scores will result in misleading conclusions.

    Consider the following two artificial data sets which are assumed to consist of ordered row and ordered column categories. Table 6.1 is constructed so there is an obvious positive linear-by-linear association between the row and column categories. Table 6.2 is constructed so there is an obvious negative linear-by-linear association between the rows and columns.

          Column 1   Column 2   Column 3
Row 1        16          5          7
Row 2         1         17          4
Row 3         3          2         15

Table 6.1 : Artificial Data with a Positive Linear-by-Linear Association


          Column 1   Column 2   Column 3
Row 1         3          4         19
Row 2         7         17          2
Row 3        18          5          3

Table 6.2 : Artificial Data with a Negative Linear-by-Linear Association

Consider firstly Table 6.1. If the set of row scores is {1, 2, 3} and the set of column scores is {3, 2, 1}, then the linear-by-linear association calculated using ordinal correspondence analysis is -0.46859; a negative value. Determining the linear transformation between the two sets of scores, α from Subsection 6.4.1 is -1. However, if the column scores used are {1, 2, 3}, then this association value is +0.46859, while α = +1. Therefore, the sets of scores used for an ordinal correspondence analysis which reflect the positive linear-by-linear association in Table 6.1 should be either both arranged in ascending order or both arranged in descending order.

Consider now Table 6.2. If the set of row scores is {1, 2, 3} and the set of column scores is {3, 2, 1}, then the linear-by-linear association calculated using ordinal correspondence analysis is +0.597333; a positive value. Using these scores, α from Subsection 6.4.1 is -1. However, if the column scores used are {1, 2, 3}, then this association value is -0.597333, while α = +1. Therefore, the sets of scores used for an ordinal correspondence analysis which reflect the negative linear-by-linear association of Table 6.2 should also be either both arranged in ascending order or both arranged in descending order. In general, a pair of scoring schemes should be selected so that α = +1. That is, the row and column sets of scores should both be arranged in ascending order, or both in descending order, to eliminate any possible misinterpretation of the contingency table.
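The sign behaviour described in this example can be verified with a short sketch (the function name is an illustrative assumption; the association is computed as the sum of p_ij a_1(i) b_1(j), with a_1 and b_1 the standardised scores):

```python
import numpy as np

def linear_assoc(N, row_scores, col_scores):
    """Linear-by-linear association sum_ij p_ij a_1(i) b_1(j)."""
    N = np.asarray(N, float)
    P = N / N.sum()
    def standardise(s, w):
        s = np.asarray(s, float)
        mu = w @ s
        return (s - mu) / np.sqrt(w @ (s - mu) ** 2)
    a1 = standardise(row_scores, P.sum(axis=1))   # row polynomial a_1(i)
    b1 = standardise(col_scores, P.sum(axis=0))   # column polynomial b_1(j)
    return a1 @ P @ b1

T61 = [[16, 5, 7], [1, 17, 4], [3, 2, 15]]        # Table 6.1
print(round(linear_assoc(T61, [1, 2, 3], [1, 2, 3]), 5))   # 0.46859
print(round(linear_assoc(T61, [1, 2, 3], [3, 2, 1]), 5))   # -0.46859
```

Reversing one of the two schemes reverses only the sign of the association, never its magnitude, which is why a pair of schemes with α = +1 should be chosen.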


    6.10.2 Socio-Economic and Mental Health Data

    Consider the doubly ordered contingency table of Table 2.1. Each of the four scoring methods discussed in Section 6.4 is applied to this contingency table and their effect on the location and dispersion inertias compared.

Using each of the four scoring schemes discussed in Section 6.4, Table 6.3 gives the values of the location and dispersion inertias for the row and column categories. The total of the location and dispersion components is also included for each scoring method and, for an ordinal correspondence plot, is the amount of the total inertia which the first two axes explain.

    The natural scores for the rows are 1, 2, 3, and 4, which a researcher may suggest as an adequate scoring scheme as the rows appear to be evenly spaced. Similarly, the natural scores for the columns are 1 through to 6. The first non-trivial singular vector is used as the fourth set of scores, as this axis maintains the ordered structure of the row and column categories, with the possible exception of column categories A and B.

Scoring Method       Location   Dispersion   Total
Row Components
  Natural            0.02451    0.00172      0.02623
  Midrank            0.02555    0.00093      0.02648
  Nishisato          0.02602    0.00079      0.02681
  Singular Vectors   0.02602    0.00079      0.02681
Column Components
  Natural            0.02429    0.00052      0.02481
  Midrank            0.02135    0.00092      0.02227
  Nishisato          0.02602    0.00026      0.02628
  Singular Vectors   0.02602    0.00079      0.02681

Table 6.3 : Location and Dispersion Row and Column Components for Four Different Scoring Schemes

    -205- Chapter 6 - Comparison of Scoring Schemes

Consider the row categories. For each scoring scheme, the location inertia is significant, while the dispersion inertia is not. Therefore, the variability of the row profiles is caused by differences in row means only. The dispersion, or spread, of these profiles does not vary significantly between the rows.

Using the four scoring schemes, there is very little difference in the row and column component values. For the row location and dispersion components, the natural scores explain 94.68% of the total inertia, compared to 95.59% for midrank scores and 96.79% for the singular vectors. The Nishisato scores also describe 96.79% of the total inertia. Thus, there is only about a 2% difference in the sum of the location and dispersion components across the four scoring schemes. The reason these values seem so low is that only three components describe the rows. The third component represents the remainder of the total inertia and is insignificant for all scoring schemes.

Consider the column location and dispersion components. For each scoring scheme, the location component is significant, while the dispersion component is not. Therefore, the variability of the column profiles is caused by differences in column means only. The dispersion, or spread, of these profiles does not vary significantly between the columns. The natural scores explain 89.54% of the total inertia, compared with 80.34% for midrank scores, 96.79% for the singular vectors and 94.88% for the Nishisato scores. In fact, the first two Nishisato scores are equivalent, and so too are the last two. Thus the total inertia is reduced, and is the value for a contingency table in which columns A and B are combined and columns E and F are combined.

As expected, when singular vectors are used as the scoring scheme, the row and column location components are equal to the first principal inertia. Using this scoring scheme accounts for more of the variation present in the first two components than the other schemes. Nishisato scores also account for a relatively large amount of variation. Note that the row and column location components for these two methods are equivalent. As natural scores and midrank scores do not involve a maximisation problem, the location component does not represent as much variation in the data as for singular vectors and Nishisato scores.

However, the aim of correspondence analysis is to visualise the association between the categories. Figure 6.1 shows a comparison of the row profiles when each of the four scoring schemes is used, while Figure 6.2 shows the comparison for the column categories. The positions for each category using each scoring scheme have been grouped to identify that category.

    Figure 6.1 shows that the choice of scoring scheme does not greatly

    effect the position of a row profile co-ordinate along the first principal axis.

    This is a reflection of the results in Table 6.3, where there is little change in

    the row location component for each scheme. However, along the second

    axis, there is a greater deviation of each row profile co-ordinate using each

    scoring scheme, with midrank scores contributing to a large amount of this

deviation. Using Nishisato scores and singular vectors gives identical

    positions for the row profiles. The equivalent values of the location and

dispersion components using these two scoring schemes reflect these

    identical positions.

    Figure 6.2 shows that the choice of scoring scheme does not greatly

    change the position of the column profiles. The use of all four scoring

    schemes shows that socio-economic status levels A and B are still very

    similar. This conclusion is reached by observing the proximity of the two

profiles for each scoring scheme. There is very little deviation in the position of status level C and, to a lesser extent, D, indicating that the

    -207- Chapter 6 - Comparison of Scoring Schemes

    choice of scoring scheme has not been important in determining these positions.

The correlations between the scoring schemes reflect the distances between the profiles shown in Figures 6.1 and 6.2. Table 6.4 gives the correlation values for each combination of row and column scoring scheme, which reflect the relative position of the profile co-ordinates along the first principal axis. Table 6.5 gives the correlation values of the second non-trivial orthogonal polynomials, which reflect the relative position of the profile co-ordinates along the second axis.

    Scoring Type        Midrank    Nishisato    Singular
    Row Scores
      Natural           0.95362    0.96461      0.97137
      Midrank                      0.95608      0.94296
      Nishisato                                 0.99786
    Column Scores
      Natural           0.96425    0.97124      0.97439
      Midrank                      0.88008      0.88655
      Nishisato                                 0.99912

    Table 6.4 : Correlation of each Combination of Scoring Schemes

    (First Non-trivial Row Orthogonal Polynomials)

Observing the row correlations of Table 6.4, the correlations are all over 0.94, suggesting that there is a very strong linear relationship between each pair of scoring schemes. Figure 6.1 shows very little difference in the position of a row profile using each scoring scheme along the first principal axis. From Figure 6.2, using midrank and Nishisato scores has resulted in a relatively large deviation in the position of the column categories. This is a reflection of the correlation between these two scoring schemes of 0.88.

    However, as this correlation is quite high, there is little difference in using midrank scores and Nishisato scores on the column categories. Figure 6.2 also shows that the position of the column profile co-ordinates using

    -208- Chapter 6 - Comparison of Scoring Schemes

Nishisato scores and singular vectors are virtually identical, and so the correlation between these two scoring schemes is 0.999.
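The correlations reported in Tables 6.4 and 6.5 compare score vectors across schemes. As a minimal sketch (not the thesis's own code), assuming such a correlation is computed by weighting each category by its marginal proportion, a helper of the following form (the name `weighted_corr` and the example data are purely illustrative) shows the calculation:

```python
import numpy as np

def weighted_corr(x, y, p):
    """Correlation between two category score vectors x and y,
    weighting each category by its marginal proportion p."""
    mx, my = np.sum(p * x), np.sum(p * y)
    cov = np.sum(p * (x - mx) * (y - my))
    sx = np.sqrt(np.sum(p * (x - mx) ** 2))
    sy = np.sqrt(np.sum(p * (y - my) ** 2))
    return cov / (sx * sy)

# e.g. natural scores against a second (hypothetical) scoring of 4 categories
p = np.array([0.1, 0.4, 0.3, 0.2])
natural = np.array([1.0, 2.0, 3.0, 4.0])
other = np.array([1.0, 2.5, 3.5, 4.0])
r = weighted_corr(natural, other, p)
```

A correlation of 1 is returned whenever one scheme is an exact linear transformation of the other, which is why near-linear schemes (such as Nishisato scores and singular vectors here) give values close to unity.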

    [Plot: First Principal Axis (horizontal) against Second Principal Axis (vertical).]

Figure 6.1 : Comparison of Row Profiles for Natural Scores (•), Midrank Scores (@), Nishisato Scores (+) and Singular Vectors (#)

From Table 6.5, the correlations between each combination of second non-trivial row orthogonal polynomials are all relatively high, over 0.90. These high correlation values reflect the close position of each row profile using each scoring scheme along the second principal axis. Table 6.5 also highlights the relationship between profile position along the second principal axis and the correlation of each combination of second non-trivial column orthogonal polynomials. The correlation of 0.57 between midrank and Nishisato scores is reflected by the large deviation in position for each column profile, especially for socio-economic status F. The correlation of


0.996 between Nishisato scores and singular vectors is reflected by the (near) identical position of each profile using these scores.

    Scoring Type        Midrank    Nishisato    Singular
    Row Scores
      Natural           0.90995    0.99828      0.99974
      Midrank                      0.93193      0.91004
      Nishisato                                 0.99774
    Column Scores
      Natural           0.89443    0.86866      0.88135
      Midrank                      0.56740      0.59628
      Nishisato                                 0.99603

    Table 6.5 : Correlation of each Combination of Second Non-trivial Orthogonal Polynomials

    [Plot: First Principal Axis (horizontal) against Second Principal Axis (vertical).]

Figure 6.2 : Comparison of Column Profiles for Natural Scores (•), Midrank Scores (@), Nishisato Scores (+) and Singular Vectors (#)


    6.10.3 Least Squares Rotation of the Socio-Economic and Mental

    Health Data

    Consider the application of the four different scoring schemes to

Table 2.1 as seen in sub-section 6.10.2. Figures 6.1 and 6.2 show the relative positions of the row profile co-ordinates using these scores, and it can be seen that co-ordinates on the far left and far right of the displays are, vertically, at a distance, while those close to the origin share similar positions. This suggests that there is possibly a rotation between the sets of co-ordinates. Least squares rotation can be applied to these profile co-ordinates.

Table 6.6 lists the least squares rotation (in degrees) between each pair of profile co-ordinates. The conclusions reached from Table 6.6 compare well with the conclusions reached using the correlation values of Tables 6.4 and 6.5. Consider firstly the least squares rotation of the row profile co-ordinates depicted in Figure 6.1.

The correlations between the row profile co-ordinates along the first and second principal axes, using natural and midrank scores, are 0.954 and 0.90995 respectively; relatively poor compared with the other correlation values. They are reflected in the least squares rotation of 17.49 degrees between these two sets of row profile co-ordinates. The smallest least squares rotation value of 0.98 degrees for the comparison of row profile co-ordinates reflects the large correlation values for the co-ordinates using natural scores and singular vectors. This small rotation can be seen in Figure 6.1.
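One way the least squares rotation between two planar configurations may be computed is via the orthogonal Procrustes solution. The sketch below is an assumption about the computation, not the thesis's own procedure (the function name `ls_rotation_angle` is illustrative, and reflections are not handled):

```python
import numpy as np

def ls_rotation_angle(X, Y):
    """Angle (degrees) of the rotation R minimising ||X R^T - Y||_F,
    i.e. the least squares rotation taking configuration X onto Y.
    X and Y are n x 2 arrays of profile co-ordinates."""
    U, _, Vt = np.linalg.svd(Y.T @ X)   # polar decomposition of Y^T X
    R = U @ Vt                          # optimal rotation matrix
    return np.degrees(np.arctan2(R[1, 0], R[0, 0]))

# sanity check: recover a known rotation of 17.49 degrees
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
t = np.radians(17.49)
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
angle = ls_rotation_angle(X, X @ R.T)
```

For exactly rotated configurations the known angle is recovered; for the profile co-ordinates of Figures 6.1 and 6.2 the result is the best-fitting rotation in the least squares sense.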


    Scoring Type        Midrank    Nishisato    Singular
    Row Scores
      Natural           17.49      2.72         0.98
      Midrank                      12.47        16.36
      Nishisato                                 3.52
    Column Scores
      Natural           14.12      10.91        9.51
      Midrank                      26.27        24.85
      Nishisato                                 1.62

    Table 6.6 : Least Squares Rotation of Row and Column Profile

    Co-ordinates in Figures 6.1 and 6.2

The largest least squares rotation for the column profile co-ordinates, of 26.27 degrees, occurs when the midrank and Nishisato scores are considered. With the relatively small correlation values along the first two principal axes of 0.88008 and 0.56740, this large rotation is reflected in the position of their profile co-ordinates in Figure 6.2. The smallest least squares rotation of 1.62 degrees occurs when Nishisato scores and singular vectors are used. This small value can be seen in Figure 6.2 by observing the approximately identical column profile co-ordinates. Note that the correlation values for these sets of scores along the first two principal axes are approximately 1: 0.99912 for the first principal axis and 0.99603 for the second principal axis.


    Chapter 7

    Multiple Ordinal Correspondence Analysis

    7.1 Introduction

    The correspondence analysis of multi-way contingency tables can be executed using one of the many different approaches discussed in Chapter 3. This chapter presents an alternative method of dealing with multi-way contingency tables using correspondence analysis, and considers mainly ordinal variables. Unlike the approaches discussed in Chapter 3, this method does not employ any generalisation of singular value decomposition. Instead, the method is an extension of the work presented in Chapter 5 and Beh (1997) where orthogonal polynomials are generated by using some scoring scheme which reflects the ordered structure of the variables.
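As a hedged illustration of how orthogonal polynomials for a given scoring scheme might be generated in practice, the following sketch uses a weighted Gram-Schmidt construction (this is an assumed implementation, not the recurrence of Chapter 4; the function name is illustrative). It produces polynomials satisfying the orthonormality constraint with the trivial solution equal to unity:

```python
import numpy as np

def orthogonal_polynomials(scores, p):
    """Polynomials a_0, ..., a_{I-1} evaluated at the category scores,
    orthonormal with respect to the marginal weights p:
    sum_i p[i] * a_u[i] * a_t[i] = 1 if u == t else 0, with a_0 = 1."""
    m = len(scores)
    V = np.vander(np.asarray(scores, float), m, increasing=True)  # 1, s, s^2, ...
    A = np.zeros((m, m))
    for u in range(m):
        a = V[:, u].copy()
        for t in range(u):                       # weighted Gram-Schmidt step
            a -= np.sum(p * a * A[:, t]) * A[:, t]
        A[:, u] = a / np.sqrt(np.sum(p * a * a))
    return A                                     # A[i, u] = a_u(i)

# natural scores on three ordered categories with marginal proportions p
p = np.array([0.2, 0.3, 0.5])
A = orthogonal_polynomials([1.0, 2.0, 3.0], p)
```

Any scoring scheme (natural, midrank, Nishisato or singular-vector based) can be passed as `scores`; the ordering of the categories enters only through these scores.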

    This chapter is divided into five further sections. Section 7.2 discusses the decomposition of the extended Pearson ratios for a completely ordered three-way contingency table. Section 7.3 discusses the approach for singly ordered three-way contingency tables while Section 7.4 considers the ordinal


    correspondence analysis of doubly ordered three-way contingency tables.

    Section 7.5 extends these ideas for any multi-way contingency table with at least one ordered set of categories and Section 7.6 applies the multiple ordinal correspondence analysis to the three-way contingency table of Table 3.4.

    7.2 Completely Ordered Three-way Contingency Tables

    7.2.1 Decomposing Pearson's Ratios

    Recall that for a simple ordinal correspondence analysis, the Pearson ratio of (2.3) was decomposed using (5.4). For a three-way contingency table with three ordered variables, the Pearson ratio of (3.13) is decomposed so that

\alpha_{ijk} = \sum_{u=0}^{I-1} \sum_{v=0}^{J-1} \sum_{w=0}^{K-1} a_u(i) b_v(j) c_w(k) \left( \frac{Z_{uvw}}{\sqrt{n}} \right)              (7.1)

where the row, column and tube orthogonal polynomials are subject to the constraints (4.1), (4.2) and (4.3) respectively. The value of Z_{uvw} is defined by (4.53).

As Chapter 4 showed, the orthogonal polynomials contain the solution of unity and Z_{000} = \sqrt{n}. Therefore, (7.1) becomes

\frac{p_{ijk}}{p_{i\bullet\bullet} p_{\bullet j\bullet} p_{\bullet\bullet k}} = 1 + \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} a_u(i) b_v(j) \left( \frac{Z_{uv0}}{\sqrt{n}} \right) + \sum_{u=1}^{I-1} \sum_{w=1}^{K-1} a_u(i) c_w(k) \left( \frac{Z_{u0w}}{\sqrt{n}} \right)
+ \sum_{v=1}^{J-1} \sum_{w=1}^{K-1} b_v(j) c_w(k) \left( \frac{Z_{0vw}}{\sqrt{n}} \right) + \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} \sum_{w=1}^{K-1} a_u(i) b_v(j) c_w(k) \left( \frac{Z_{uvw}}{\sqrt{n}} \right)              (7.2)

    7.2.2 Standard Co-ordinates

To graphically show between- and within-variable comparisons of row, column and tube categories, there are many different plotting systems available. Each of these systems highlights different characteristics of the association between the categories.

The simplest set of co-ordinates to use are just the orthogonal polynomials, as they specify the difference of categories within a particular variable. This is identical to the simplest systems described for simple correspondence analysis (Section 2.4.1) and simple ordinal correspondence analysis (Section 5.2.3). Therefore, \{a_m(i)\}, \{b_m(j)\} and \{c_m(k)\} can be used as the set of row, column and tube profile co-ordinates respectively along the m'th principal axis, and are termed the standard co-ordinates.

    7.2.3 Trivariate Profile Co-ordinates

    Consider the following row profile co-ordinates

f_{ivw}^{*} = \sum_{u=0}^{I-1} a_u(i) \frac{Z_{uvw}}{\sqrt{n}}              (7.3)

or

f_{ivw}^{*} = \sum_{j=1}^{J} \sum_{k=1}^{K} \frac{p_{ijk}}{p_{i\bullet\bullet}} b_v(j) c_w(k)              (7.4)

    which are similar to (5.11) and (5.13) respectively.

    Equation (7.4) can easily be derived by substituting (4.53) into (7.3) and using the orthogonality property (4.4) adjusted for a three-way contingency table.

For example, \{f_{i1w}^{*}\} is the set of row profile co-ordinates along the w'th principal axis when the difference in the ordered columns, in terms of the linear component, is of interest. Therefore the principal inertia along this axis is the w'th tube component. The same set of co-ordinates can also


    be interpreted as the set of row profile co-ordinates along the first principal axis when the ordered tube profiles are dominated by the w'th tube component. In this case, the principal inertia associated with this axis is the column location component.

Therefore, the row profile co-ordinates across at least one principal axis will be identical for two correspondence plots in the series. Consider the set of row profile co-ordinates \{f_{i22}^{*}\}. These are the row profile co-ordinates along the second principal axis when the columns are described by the dispersion component, and also the row profile co-ordinates along the second principal axis when the tubes are described by the dispersion component. Consider the relationship between the total inertia and the row profile co-ordinates defined by (7.3)

\frac{X^2}{n} = \sum_{u=0}^{I-1} \sum_{v=0}^{J-1} \sum_{w=0}^{K-1} \frac{Z_{uvw}^2}{n} - 1
= \sum_{i=1}^{I} \sum_{v=0}^{J-1} \sum_{w=0}^{K-1} p_{i\bullet\bullet} \left[ \sum_{u=0}^{I-1} a_u(i) \frac{Z_{uvw}}{\sqrt{n}} \right] \left[ \sum_{u'=0}^{I-1} a_{u'}(i) \frac{Z_{u'vw}}{\sqrt{n}} \right] - 1
= \sum_{i=1}^{I} \sum_{v=0}^{J-1} \sum_{w=0}^{K-1} p_{i\bullet\bullet} \left( f_{ivw}^{*} \right)^2 - 1

Therefore

\frac{X^2}{n} = \sum_{i=1}^{I} \sum_{v=0}^{J-1} \sum_{w=0}^{K-1} p_{i\bullet\bullet} \left( f_{ivw}^{*} \right)^2 - 1              (7.5)
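The definitions (7.3) and (7.4), and the inertia identity (7.5), can be checked numerically. The sketch below is illustrative only: it uses a random toy table and an assumed weighted Gram-Schmidt helper (`orthopoly`) for the polynomials, not the thesis's own code:

```python
import numpy as np

def orthopoly(scores, p):
    # Polynomials orthonormal w.r.t. weights p, with trivial solution a_0 = 1.
    m = len(scores)
    V = np.vander(np.asarray(scores, float), m, increasing=True)
    A = np.zeros((m, m))
    for u in range(m):
        a = V[:, u].copy()
        for t in range(u):
            a -= np.sum(p * a * A[:, t]) * A[:, t]
        A[:, u] = a / np.sqrt(np.sum(p * a * a))
    return A

rng = np.random.default_rng(2)
N = rng.integers(1, 20, size=(4, 3, 3)).astype(float)   # toy I x J x K table
n = N.sum()
P = N / n
pi, pj, pk = P.sum((1, 2)), P.sum((0, 2)), P.sum((0, 1))

A = orthopoly(np.arange(1.0, 5.0), pi)   # natural scores for the rows
B = orthopoly(np.arange(1.0, 4.0), pj)   # ... the columns
C = orthopoly(np.arange(1.0, 4.0), pk)   # ... the tubes

# Z_uvw = sqrt(n) * sum_ijk p_ijk a_u(i) b_v(j) c_w(k), cf. (4.53)
Z = np.sqrt(n) * np.einsum('ijk,iu,jv,kw->uvw', P, A, B, C)

F = np.einsum('iu,uvw->ivw', A, Z) / np.sqrt(n)                 # (7.3)
F2 = np.einsum('ijk,jv,kw->ivw', P / pi[:, None, None], B, C)   # (7.4)
assert np.allclose(F, F2)

# (7.5): total inertia X^2/n = sum_{i,v,w} p_i.. (f*_ivw)^2 - 1
E = np.einsum('i,j,k->ijk', pi, pj, pk)   # complete independence
X2_n = np.sum((P - E) ** 2 / E)
assert np.isclose(np.einsum('i,ivw->', pi, F ** 2) - 1.0, X2_n)
```

Both routes to the row profile co-ordinates agree, and their weighted sum of squares recovers the total inertia.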

Define the column profile co-ordinates as

g_{juw}^{*} = \sum_{v=0}^{J-1} b_v(j) \frac{Z_{uvw}}{\sqrt{n}}              (7.6)

while the tube profile co-ordinates are defined as

h_{kuv}^{*} = \sum_{w=0}^{K-1} c_w(k) \frac{Z_{uvw}}{\sqrt{n}}              (7.7)

It can be shown that

\frac{X^2}{n} = \sum_{u=0}^{I-1} \sum_{j=1}^{J} \sum_{w=0}^{K-1} p_{\bullet j\bullet} \left( g_{juw}^{*} \right)^2 - 1              (7.8)

and

\frac{X^2}{n} = \sum_{u=0}^{I-1} \sum_{v=0}^{J-1} \sum_{k=1}^{K} p_{\bullet\bullet k} \left( h_{kuv}^{*} \right)^2 - 1              (7.9)

If X_{IJK(u)}^2 = \sum_{v=1}^{J-1} \sum_{w=1}^{K-1} Z_{uvw}^2 is the u'th partial chi-squared value for the relationship between the columns and tubes, then

X_{IJK}^2 = \sum_{u=1}^{I-1} X_{IJK(u)}^2              (7.10)

where X_{IJK}^2 is the trivariate chi-squared term of equation (4.56). Similarly, the v'th partial chi-squared value between the row and tube categories is X_{IJK(v)}^2 = \sum_{u=1}^{I-1} \sum_{w=1}^{K-1} Z_{uvw}^2, so that

X_{IJK}^2 = \sum_{v=1}^{J-1} X_{IJK(v)}^2              (7.11)
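The partitions (7.10) and (7.11) amount to summing squared association values over different margins of the trivariate Z array. A small illustrative sketch (the Z array here is random, purely for demonstration; index 0 corresponds to the trivial polynomial):

```python
import numpy as np

# Hypothetical array of association values Z[u, v, w]; the trivariate
# chi-squared term of (4.56) uses only the non-trivial indices u, v, w >= 1.
rng = np.random.default_rng(1)
Z = rng.normal(size=(4, 3, 5))

X2_IJK = np.sum(Z[1:, 1:, 1:] ** 2)                  # trivariate chi-squared term
X2_IJK_u = np.sum(Z[1:, 1:, 1:] ** 2, axis=(1, 2))   # partials X2_IJK(u), (7.10)
X2_IJK_v = np.sum(Z[1:, 1:, 1:] ** 2, axis=(0, 2))   # partials X2_IJK(v), (7.11)
```

Summing either set of partial values recovers the trivariate term exactly, since each is just a regrouping of the same squared Z values.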

    Suppose, for example, the researcher is only interested in the relationship between the rows and columns in the trivariate term of (4.52).

    Then this can be alternatively written by

X^2 = X_{IJ}^2 + X_{IK}^2 + X_{JK}^2 + \sum_{w=1}^{K-1} X_{IJK(w)}^2              (7.12)


where X_{IJK(w)}^2 = \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} Z_{uvw}^2. So if only the relationship between the rows and columns is considered up to the tube dispersion component, then (7.12) becomes

X^2 \approx X_{IJ}^2 + X_{IK}^2 + X_{JK}^2 + \sum_{w=1}^{2} X_{IJK(w)}^2              (7.13)

which does not give the exact value of the chi-squared statistic, unless K=3.

To determine whether (7.13) is an adequate approximation to (7.12), it can be compared with the theoretical chi-squared value with 3(I-1)(J-1)+(I-1)(K-1)+(J-1)(K-1) degrees of freedom. However, the location component may not be the most important component. The second and fourth components may describe the tube categories at the trivariate level better than the first two. In this case, the chi-squared statistic can be approximated by

X^2 \approx X_{IJ}^2 + X_{IK}^2 + X_{JK}^2 + \left[ X_{IJK(w=2)}^2 + X_{IJK(w=4)}^2 \right]              (7.14)

Note that (7.13) and (7.14) consider the saturated bivariate chi-squared terms. That is, all components at the bivariate level are considered.

    When w=0, then (7.3) becomes

f_{iv0}^{*} = \sum_{u=0}^{I-1} a_u(i) \frac{Z_{uv0}}{\sqrt{n}} = \sum_{u=1}^{I-1} a_u(i) \frac{Z_{uv0}}{\sqrt{n}}              (7.15)

    where Zuv0 is defined by equation (4.58). Therefore, (7.15) is the co-ordinate of the i'th row profile along the v'th principal axis when displayed on the correspondence plot where the tubes are summed across for each row and column category. In this case, (7.6) simplifies to,


g_{ju0}^{*} = \sum_{v=1}^{J-1} b_v(j) \frac{Z_{uv0}}{\sqrt{n}}              (7.16)

which is the co-ordinate of the j'th column profile along the u'th principal axis. The profile co-ordinates (7.15) and (7.16) are mathematically equivalent to the row and column profile co-ordinates of (5.11) and (5.12) respectively when the tube categories are summed across. The profile co-ordinates of (7.15) and (7.16) also show that this three-way, or more generally, multi-way approach easily reduces to the analysis of two-way contingency tables. They

are then the co-ordinates for the correspondence plot which is described by the total inertia X_{IJ}^2/n.

    Therefore, the relationship between the row profile co-ordinates of

(7.15), the column profile co-ordinates of (7.16) and the total inertia for this term is:

\frac{X_{IJ}^2}{n} = \sum_{i=1}^{I} \sum_{v=1}^{J-1} p_{i\bullet\bullet} \left( f_{iv0}^{*} \right)^2              (7.17)

and

\frac{X_{IJ}^2}{n} = \sum_{j=1}^{J} \sum_{u=1}^{I-1} p_{\bullet j\bullet} \left( g_{ju0}^{*} \right)^2              (7.18)

Equation (7.17) shows that the contribution of the row profile co-ordinates to the m'th principal axis is \sum_{i=1}^{I} p_{i\bullet\bullet} ( f_{im0}^{*} )^2. For example, the location and dispersion inertias for the row categories can be calculated when m=1 and m=2 respectively. Similarly, the contribution the column profile co-ordinates have on the m'th principal axis is \sum_{j=1}^{J} p_{\bullet j\bullet} ( g_{jm0}^{*} )^2. For example, the location and dispersion inertias for the column categories can be calculated when m=1 and m=2 respectively.
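The reduction to a two-way analysis, and the identity (7.17), can be sketched numerically. As before, the toy table, the assumed `orthopoly` helper and all names are illustrative only:

```python
import numpy as np

def orthopoly(scores, p):
    # Polynomials orthonormal w.r.t. weights p, with trivial solution a_0 = 1.
    m = len(scores)
    V = np.vander(np.asarray(scores, float), m, increasing=True)
    A = np.zeros((m, m))
    for u in range(m):
        a = V[:, u].copy()
        for t in range(u):
            a -= np.sum(p * a * A[:, t]) * A[:, t]
        A[:, u] = a / np.sqrt(np.sum(p * a * a))
    return A

rng = np.random.default_rng(4)
N = rng.integers(1, 12, size=(4, 3, 3)).astype(float)   # toy I x J x K table
n = N.sum()
P = N / n
pi, pj = P.sum((1, 2)), P.sum((0, 2))
A = orthopoly(np.arange(1.0, 5.0), pi)
B = orthopoly(np.arange(1.0, 4.0), pj)

Pij = P.sum(2)                                   # tubes summed across: p_ij.
Zuv0 = np.sqrt(n) * np.einsum('ij,iu,jv->uv', Pij, A, B)
F0 = A @ Zuv0 / np.sqrt(n)                       # f*_iv0, eq. (7.15)

# (7.17): the inertia of the collapsed two-way table is recovered
X2_IJ_n = np.sum((Pij - np.outer(pi, pj)) ** 2 / np.outer(pi, pj))
assert np.isclose(np.einsum('i,iv->', pi, F0[:, 1:] ** 2), X2_IJ_n)
```

The trivial column (v = 0) of the co-ordinates is identically 1, and the remaining columns carry exactly the two-way inertia of the collapsed table, which is the sense in which the multi-way approach reduces to the two-way one.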

The interpretation of each principal axis for the correspondence plot when w=0 is the same as that made for a two-way doubly ordered contingency table, as noted in Chapter 5.


    The first principal axis describes the within-variable variation of the row and column categories in terms of the location component, while the second principal axis describes how this variation is explained in terms of the dispersion component.

    Similar comments can be made regarding the total inertia of the categories and principal inertias for the correspondence plots described by

the terms X_{IK}^2 and X_{JK}^2. For the sake of brevity, these comments will be omitted. When u=0, the column and tube profile co-ordinates of (7.6) and (7.7) simplify to

g_{j0w}^{*} = \sum_{v=1}^{J-1} b_v(j) \frac{Z_{0vw}}{\sqrt{n}}              (7.19)

and

h_{k0v}^{*} = \sum_{w=1}^{K-1} c_w(k) \frac{Z_{0vw}}{\sqrt{n}}              (7.20)

respectively. These profile co-ordinates can be simultaneously plotted and graphically represent the relationship between the j'th column and k'th tube profiles when the rows are summed across.

When v=0, the row and tube profile co-ordinates simplify to

f_{i0w}^{*} = \sum_{u=1}^{I-1} a_u(i) \frac{Z_{u0w}}{\sqrt{n}}              (7.21)

and

h_{ku0}^{*} = \sum_{w=1}^{K-1} c_w(k) \frac{Z_{u0w}}{\sqrt{n}}              (7.22)

respectively. Note that when v=0, (7.15) simplifies to f_{i00}^{*} = 1, while (7.20) simplifies to h_{k00}^{*} = 1. Similarly, it can be shown that g_{j00}^{*} = 1.

    The relationship between the row profile co-ordinates defined by (7.3), or (7.4), and the total inertia of the contingency table can be made by expanding equation (7.5). Such an expansion yields


\frac{X^2}{n} = \sum_{i=1}^{I} \sum_{v=1}^{J-1} p_{i\bullet\bullet} \left( f_{iv0}^{*} \right)^2 + \sum_{i=1}^{I} \sum_{w=1}^{K-1} p_{i\bullet\bullet} \left( f_{i0w}^{*} \right)^2 + \sum_{i=1}^{I} \sum_{v=1}^{J-1} \sum_{w=1}^{K-1} p_{i\bullet\bullet} \left( f_{ivw}^{*} \right)^2

Substituting (7.17) and a similar formula for X_{IK}^2 into (4.52) means that the trivariate chi-squared term can be written as

\frac{X_{IJK}^2}{n} = \sum_{i=1}^{I} \sum_{v=1}^{J-1} \sum_{w=1}^{K-1} p_{i\bullet\bullet} \left( f_{ivw}^{*} \right)^2 - \frac{X_{JK}^2}{n}

    The total inertia can also be expressed in terms of the column profile co-ordinates, and in terms of the tube profile co-ordinates. These expressions are

\frac{X^2}{n} = \sum_{j=1}^{J} \sum_{u=1}^{I-1} p_{\bullet j\bullet} \left( g_{ju0}^{*} \right)^2 + \sum_{j=1}^{J} \sum_{w=1}^{K-1} p_{\bullet j\bullet} \left( g_{j0w}^{*} \right)^2 + \sum_{j=1}^{J} \sum_{u=1}^{I-1} \sum_{w=1}^{K-1} p_{\bullet j\bullet} \left( g_{juw}^{*} \right)^2

while

\frac{X_{IJK}^2}{n} = \sum_{j=1}^{J} \sum_{u=1}^{I-1} \sum_{w=1}^{K-1} p_{\bullet j\bullet} \left( g_{juw}^{*} \right)^2 - \frac{X_{IK}^2}{n}

Similarly

\frac{X^2}{n} = \sum_{k=1}^{K} \sum_{u=1}^{I-1} p_{\bullet\bullet k} \left( h_{ku0}^{*} \right)^2 + \sum_{k=1}^{K} \sum_{v=1}^{J-1} p_{\bullet\bullet k} \left( h_{k0v}^{*} \right)^2 + \sum_{k=1}^{K} \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} p_{\bullet\bullet k} \left( h_{kuv}^{*} \right)^2

and

\frac{X_{IJK}^2}{n} = \sum_{k=1}^{K} \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} p_{\bullet\bullet k} \left( h_{kuv}^{*} \right)^2 - \frac{X_{IJ}^2}{n}

7.2.4 Bivariate Profile Co-ordinates

An alternative set of row and column profile co-ordinates, which condenses the amount of information presented using trivariate co-ordinates, can be defined such that

f_{iv\bullet}^{*} = \sum_{u=0}^{I-1} \sum_{w=0}^{K-1} a_u(i) \frac{Z_{uvw}}{\sqrt{n}}              (7.23)


and

g_{ju\bullet}^{*} = \sum_{v=0}^{J-1} \sum_{w=0}^{K-1} b_v(j) \frac{Z_{uvw}}{\sqrt{n}}              (7.24)

respectively.

    Equations (7.23) and (7.24) allow for a simultaneous representation of

    the row and column categories when the ordered tube categories are under

    consideration without specifying the pattern of the tube categories. This can

    be seen as the relationship between the trivariate profile co-ordinates

    defined in the previous sub-section and the bivariate profile co-ordinates of

(7.23) and (7.24):

f_{iv\bullet}^{*} = \sum_{w=0}^{K-1} f_{ivw}^{*} = f_{iv0}^{*} + \sum_{w=1}^{K-1} f_{ivw}^{*}              (7.25)

g_{ju\bullet}^{*} = \sum_{w=0}^{K-1} g_{juw}^{*} = g_{ju0}^{*} + \sum_{w=1}^{K-1} g_{juw}^{*}              (7.26)

This system of co-ordinates is defined as bivariate profile co-ordinates. The graphical interpretation of these co-ordinates is similar to the row and column profile co-ordinates (5.11) and (5.12) for a doubly ordered two-way contingency table. When only the row and tube profile co-ordinates are of interest, and the effect of the ordered columns is not neglected, then define these profiles as

f_{i\bullet w}^{*} = \sum_{u=0}^{I-1} \sum_{v=0}^{J-1} a_u(i) \frac{Z_{uvw}}{\sqrt{n}}              (7.27)

and

h_{ku\bullet}^{*} = \sum_{v=0}^{J-1} \sum_{w=0}^{K-1} c_w(k) \frac{Z_{uvw}}{\sqrt{n}}              (7.28)

When only the column and tube profile co-ordinates are of interest, and the ordered rows are taken into account, then define their co-ordinates as

g_{j\bullet w}^{*} = \sum_{u=0}^{I-1} \sum_{v=0}^{J-1} b_v(j) \frac{Z_{uvw}}{\sqrt{n}}              (7.29)

and

h_{k\bullet v}^{*} = \sum_{u=0}^{I-1} \sum_{w=0}^{K-1} c_w(k) \frac{Z_{uvw}}{\sqrt{n}}              (7.30)


    respectively.

    7.2.5 Bi-profile Co-ordinates

    The trivariate association Z values defined by equation (4.53) describe

    how the rows, columns and tubes of a three-way contingency table are

    related. These values can give further bivariate information to supplement

    the information found by considering the trivariate case.

    Consider equation (4.53). Then it can be re-written as :

Z_{uvw} = \sqrt{n} \sum_{i=1}^{I} \sum_{j=1}^{J} x_{ij}(w) a_u(i) b_v(j) p_{ij\bullet}              (7.31)

    where

x_{ij}(w) = \sum_{k=1}^{K} \frac{p_{ijk}}{p_{ij\bullet}} c_w(k)              (7.32)

Equation (7.31) can be interpreted as the weighted bivariate association between the rows and columns, where the tube weight is (7.32). The quantity p_{ijk}/p_{ij\bullet} is the joint profile of the i'th row and j'th column categories. So x_{ij}(w) is the co-ordinate of this joint profile, or bi-profile co-ordinate, along the w'th principal axis. If w=0 then, for each i and j, x_{ij}(0) = 1, which is the joint row-column trivial co-ordinate, and so (7.31) becomes:

Z_{uv0} = \sqrt{n} \sum_{i=1}^{I} \sum_{j=1}^{J} a_u(i) b_v(j) p_{ij\bullet}              (7.33)

which is just the (u, v)'th bivariate association between the rows and columns, summing over the tube categories as defined by (4.58). Therefore,


    (7.33) is also termed the unweighted bivariate association between the rows and columns.
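The weighted bivariate association (7.31), with tube weights (7.32), can be checked numerically. The sketch below uses a random toy table and an assumed `orthopoly` helper; all names are illustrative:

```python
import numpy as np

def orthopoly(scores, p):
    # Polynomials orthonormal w.r.t. weights p, with trivial solution a_0 = 1.
    m = len(scores)
    V = np.vander(np.asarray(scores, float), m, increasing=True)
    A = np.zeros((m, m))
    for u in range(m):
        a = V[:, u].copy()
        for t in range(u):
            a -= np.sum(p * a * A[:, t]) * A[:, t]
        A[:, u] = a / np.sqrt(np.sum(p * a * a))
    return A

rng = np.random.default_rng(5)
N = rng.integers(1, 15, size=(3, 4, 3)).astype(float)
n = N.sum()
P = N / n
pi, pj, pk = P.sum((1, 2)), P.sum((0, 2)), P.sum((0, 1))
A = orthopoly(np.arange(1.0, 4.0), pi)
B = orthopoly(np.arange(1.0, 5.0), pj)
C = orthopoly(np.arange(1.0, 4.0), pk)

Z = np.sqrt(n) * np.einsum('ijk,iu,jv,kw->uvw', P, A, B, C)

Pij = P.sum(2)                                        # p_ij.
x = np.einsum('ijk,kw->ijw', P / Pij[:, :, None], C)  # tube weights x_ij(w), (7.32)

# (7.31): Z_uvw recovered as a weighted row-column association
Z_check = np.sqrt(n) * np.einsum('ijw,iu,jv,ij->uvw', x, A, B, Pij)
assert np.allclose(Z, Z_check)
assert np.allclose(x[:, :, 0], 1.0)   # x_ij(0) = 1, the trivial joint co-ordinate
```

The trivial weight x_ij(0) = 1 confirms that the w = 0 case collapses (7.31) to the unweighted bivariate association (7.33).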

    The value Zuvw can also be viewed as the weighted bivariate association between the rows and tubes and columns and tubes. For the association between the rows and tubes, the column weight is :

y_{ik}(v) = \sum_{j=1}^{J} \frac{p_{ijk}}{p_{i\bullet k}} b_v(j)              (7.34)

so that

Z_{uvw} = \sqrt{n} \sum_{i=1}^{I} \sum_{k=1}^{K} y_{ik}(v) a_u(i) c_w(k) p_{i\bullet k}              (7.35)

    Equation (7.34) is the bi-profile co-ordinate of the i'th row and k'th tube along the v'th principal axis.

    Similarly, the row weight for the association between the columns and tubes is defined by :

z_{jk}(u) = \sum_{i=1}^{I} \frac{p_{ijk}}{p_{\bullet jk}} a_u(i)              (7.36)

so that

Z_{uvw} = \sqrt{n} \sum_{j=1}^{J} \sum_{k=1}^{K} z_{jk}(u) b_v(j) c_w(k) p_{\bullet jk}              (7.37)

    Equation (7.36) is the bi-profile co-ordinate of the j'th column category and k'th tube category.

When v=0, y_{ik}(0) = 1 and the trivariate association, Z_{uvw}, becomes Z_{u0w}, which is just the bivariate association between the row and tube categories when the columns are summed across. Note that y_{ik}(0) = 1 is just the trivial joint co-ordinate between the i'th row and k'th tube categories. Similarly, when u=0, z_{jk}(0) = 1 and the trivariate association, Z_{uvw}, becomes Z_{0vw}, which describes the association between the columns and tubes when


the rows are summed across. Note that z_{jk}(0) = 1 is the trivial joint co-ordinate between the j'th column and k'th tube categories. In both cases, the

    bivariate associations, Zu0w and Z0vw are unweighted. When w>0, the

    association between the rows and columns is still of interest, although the

    tubes are assumed to have a structure described by the w'th tube

    component. Similarly, when v>0, the association between the rows and

    tubes is investigated, assuming the columns have a structure described by

    the v'th column component. When u>0, the association between the

    columns and tubes is of interest, assuming the rows have a structure

    described by the u'th row component. In general, the association between the rows and columns, say, can be

    observed at different tube component levels. There will be a series of at

    most K correspondence plots which describe the association between the

rows and columns. The first plot, when w=0, visualises the row-column association when the effect of the tube categories is not of any interest to the researcher. The second plot, when w=1, visualises the relationship between the rows and columns when the linear component of the tube

    categories is considered. The third plot, when w=2, visualises the

    relationship between the rows and columns when the dispersion

    component of the tube categories is considered. More plots can be viewed by considering higher order tube orthogonal polynomials. The same interpretation of the correspondence plots can be made for the association between the rows and tubes, and columns and tubes. There will be at most J correspondence plots describing the association between the rows and tubes, while for the association between the columns and tubes, at most I plots will be required. Generally, only the first two components, the linear and quadratic (or location and dispersion components) for the rows, columns and tubes are considered, as so there will be nine correspondence plots describing different structures of the contingency table. Three of the plots are


for u=0, v=0 and w=0 (called null plots), three plots will be for u=1, v=1 and w=1 (called linear or location plots) and the remaining three are for u=2, v=2 and w=2 (called quadratic or dispersion plots). However, not all linear

    and dispersion components will necessarily be significant, so fewer than

    nine plots may be sufficient to graphically display the relationship between the rows, columns and tubes.

    The interpretation of each plot can then be easily made. From each plot, association structures can be visualised and important components can be identified.

    7.2.6 Transition Formulae

    In the analysis of multi-way contingency tables, Choulakian (1988) derived transition formulae using the classical approach and Goodman's

RC model. In this section, transition formulae are derived that allow

    one set of profile co-ordinates from a multiple correspondence analysis to be calculated from another set.

    When the ordered structure of the three variables is taken into consideration, transition formulae can be obtained so that the i'th row

    profile along the v'th, or w'th axis, can be obtained from the j'th column or

    the k'th tube category. Consider (7.3) and (7.6). By multiplying (7.3) by the

    v'th column orthogonal polynomials and summing over the v terms, then

\sum_{v=0}^{J-1} f_{ivw}^{*} b_v(j) = \sum_{u=0}^{I-1} g_{juw}^{*} a_u(i)              (7.38)

    Transition formula (7.38) is the relationship between the i'th row profile

and j'th column profile when the tubes are assumed to be described by the w'th component. It can be similarly shown that the transition formula between the j'th column profile and the k'th tube profile, when the row profiles are assumed to be described by the u'th component, is

\sum_{w=0}^{K-1} g_{juw}^{*} c_w(k) = \sum_{v=0}^{J-1} h_{kuv}^{*} b_v(j)              (7.39)

while

\sum_{w=0}^{K-1} f_{ivw}^{*} c_w(k) = \sum_{u=0}^{I-1} h_{kuv}^{*} a_u(i)              (7.40)

is the transition formula between the i'th row profile and the k'th tube profile co-ordinates when the column profiles are assumed to be described by the v'th component.

Transition formulae can also be derived for each of the bivariate correspondence plots. By substituting w=0 into (7.38), the bivariate transition formula between the i'th row profile and j'th column profile co-ordinate is

\sum_{v=0}^{J-1} f_{iv0}^{*} b_v(j) = \sum_{u=0}^{I-1} g_{ju0}^{*} a_u(i)

or

\sum_{v=1}^{J-1} f_{iv0}^{*} b_v(j) = \sum_{u=1}^{I-1} g_{ju0}^{*} a_u(i)              (7.41)

as f_{i00}^{*} = g_{j00}^{*} = 1. Note that (7.41) is mathematically similar to (5.35).

    Therefore, the transition formulae of two-way ordinal correspondence analysis can be generalised for a multiple ordinal correspondence analysis.

By substituting u=0 into (7.39), the transition formula between the j'th column profile and k'th tube profile, when the rows are summed across, is

\sum_{v=1}^{J-1} h_{k0v}^{*} b_v(j) = \sum_{w=1}^{K-1} g_{j0w}^{*} c_w(k)              (7.42)

    Finally, the transition formula between the i'th row profile and k'th tube profile when the columns are summed across can be shown to be


\sum_{w=1}^{K-1} f_{i0w}^{*} c_w(k) = \sum_{u=1}^{I-1} h_{ku0}^{*} a_u(i)              (7.43)

by substituting v=0 into (7.40).

As this method of correspondence analysis does not simultaneously represent the row, column and tube profile co-ordinates on the one correspondence plot, it is not necessary to calculate transition formulae involving all three sets of profile co-ordinates.
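The transition formula (7.38) can be verified numerically for every i, j and w. Again this is a sketch on a random toy table with an assumed `orthopoly` helper, not the thesis's own code:

```python
import numpy as np

def orthopoly(scores, p):
    # Polynomials orthonormal w.r.t. weights p, with trivial solution a_0 = 1.
    m = len(scores)
    V = np.vander(np.asarray(scores, float), m, increasing=True)
    A = np.zeros((m, m))
    for u in range(m):
        a = V[:, u].copy()
        for t in range(u):
            a -= np.sum(p * a * A[:, t]) * A[:, t]
        A[:, u] = a / np.sqrt(np.sum(p * a * a))
    return A

rng = np.random.default_rng(6)
N = rng.integers(1, 10, size=(4, 3, 3)).astype(float)
n = N.sum()
P = N / n
pi, pj, pk = P.sum((1, 2)), P.sum((0, 2)), P.sum((0, 1))
A = orthopoly(np.arange(1.0, 5.0), pi)
B = orthopoly(np.arange(1.0, 4.0), pj)
C = orthopoly(np.arange(1.0, 4.0), pk)
Z = np.sqrt(n) * np.einsum('ijk,iu,jv,kw->uvw', P, A, B, C)

F = np.einsum('iu,uvw->ivw', A, Z) / np.sqrt(n)   # row co-ordinates (7.3)
G = np.einsum('jv,uvw->juw', B, Z) / np.sqrt(n)   # column co-ordinates (7.6)

# (7.38): sum_v f*_ivw b_v(j) = sum_u g*_juw a_u(i), for every i, j and w
lhs = np.einsum('ivw,jv->ijw', F, B)
rhs = np.einsum('juw,iu->ijw', G, A)
assert np.allclose(lhs, rhs)
```

Both sides reduce to the same double sum over the row and column polynomials, which is why the formula links one set of profile co-ordinates to the other.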

    7.2.7 Modelling Completely Ordered Three-way Contingency Tables

    The analysis of multi-way contingency tables with ordinal categories has been conducted by Clogg (1982), Danaher (1991), Gilula & Haberman (1988) and Goodman (1970). This section proposes models of association for multi-way tables, based on the three-way chi-squared partitions of (4.52)-

    (4.53). Consider the partition of the three-way Pearson ratio given by (7.1).

    Then, using (7.2), a test for complete independence of the variables can be made by considering the saturated trivariate correlation model for a completely ordered three-way contingency table :

p_{ijk} = p_{i\bullet\bullet} p_{\bullet j\bullet} p_{\bullet\bullet k} \left[ 1 + \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} \left( \frac{Z_{uv0}}{\sqrt{n}} \right) a_u(i) b_v(j) + \sum_{u=1}^{I-1} \sum_{w=1}^{K-1} \left( \frac{Z_{u0w}}{\sqrt{n}} \right) a_u(i) c_w(k)
+ \sum_{v=1}^{J-1} \sum_{w=1}^{K-1} \left( \frac{Z_{0vw}}{\sqrt{n}} \right) b_v(j) c_w(k) + \sum_{u=1}^{I-1} \sum_{v=1}^{J-1} \sum_{w=1}^{K-1} \left( \frac{Z_{uvw}}{\sqrt{n}} \right) a_u(i) b_v(j) c_w(k) \right]              (7.44)

    Model (7.44) allows for the (i, j, k)'th cell entry in P to be completely reconstituted. The unsaturated model which can be used to approximate the

    (i, j, k)'th cell probability under the hypothesis of complete independence is :


\hat{p}_{ijk} = p_{i\bullet\bullet} p_{\bullet j\bullet} p_{\bullet\bullet k} \left[ 1 + \sum_{u=1}^{M_1} \sum_{v=1}^{M_2} \left( \frac{Z_{uv0}}{\sqrt{n}} \right) a_u(i) b_v(j) + \sum_{u=1}^{M_1} \sum_{w=1}^{M_3} \left( \frac{Z_{u0w}}{\sqrt{n}} \right) a_u(i) c_w(k)
+ \sum_{v=1}^{M_2} \sum_{w=1}^{M_3} \left( \frac{Z_{0vw}}{\sqrt{n}} \right) b_v(j) c_w(k) + \sum_{u=1}^{M_1} \sum_{v=1}^{M_2} \sum_{w=1}^{M_3} \left( \frac{Z_{uvw}}{\sqrt{n}} \right) a_u(i) b_v(j) c_w(k) \right]              (7.45)

where the first M_1 row components, M_2 column components, and/or M_3 tube components are selected, for M_1 ≤ I-1, M_2 ≤ J-1 and M_3 ≤ K-1. Usually M_1, M_2 and M_3 are chosen to be equal to two, so that N is analysed in terms of

    the location and dispersion components only. Unlike other modelling procedures, especially those where association values have an ordering

    constraint imposed, such as singular value decomposition for two-way tables,

    the most important component values can be chosen to best improve the

    model. These may in fact not include the first M1 row components, M2

    column components or M3 tube components, but instead include the most

    significant Ml row components, M2 column components and M3 tube

    components.

    Model (7.45) considers each combination of two-way associations and

    the three-way association terms and it can be seen that when the Z values are

    zero, then the rows, columns and tubes are completely independent. If there is not complete independence, then model (7.45) can be used to identify which association(s) is significant. Model (7.45) is an extension of the model of (5.20) and the model of Rayner & Best (1996) who analysed two-way contingency tables. Note that when the tubes are summed across, then model

    (7.45) is simplified to :

p_{ij\bullet} = p_{i\bullet\bullet} p_{\bullet j\bullet} \left[ 1 + \sum_{u=1}^{M_1} \sum_{v=1}^{M_2} \left( \frac{Z_{uv0}}{\sqrt{n}} \right) a_u(i) b_v(j) \right]

which is mathematically similar to the model of association (5.20) and is the model of association for the rows and columns of N when the tube categories are summed across. Similar models can be obtained for the association between the rows and tubes, and the columns and tubes. When only the location components are considered for each variable, and natural scores are used, then (7.45) becomes:

p_{ijk} = p_{i\bullet\bullet} p_{\bullet j\bullet} p_{\bullet\bullet k} \left[ 1 + \left( \frac{Z_{110}}{\sqrt{n}} \right) \frac{(i-\mu_I)(j-\mu_J)}{\sigma_I \sigma_J} + \left( \frac{Z_{101}}{\sqrt{n}} \right) \frac{(i-\mu_I)(k-\mu_K)}{\sigma_I \sigma_K}
+ \left( \frac{Z_{011}}{\sqrt{n}} \right) \frac{(j-\mu_J)(k-\mu_K)}{\sigma_J \sigma_K} + \left( \frac{Z_{111}}{\sqrt{n}} \right) \frac{(i-\mu_I)(j-\mu_J)(k-\mu_K)}{\sigma_I \sigma_J \sigma_K} \right]              (7.46)

Model (7.46) is similar to equation (5) in Danaher (1991). However, unlike model (7.46), Danaher's model did not include the trivariate Z term. An alternative modelling procedure, which can be used as an approximation to (7.46) even when the model is saturated, involves using the property e^x \approx 1 + x. Using this property, an alternative to (7.46) which approximates the (i, j, k)'th cell probability is :

p_{ijk} = p_{i\bullet\bullet} p_{\bullet j\bullet} p_{\bullet\bullet k} \exp \left[ \left( \frac{Z_{110}}{\sqrt{n}} \right) \frac{(i-\mu_I)(j-\mu_J)}{\sigma_I \sigma_J} + \left( \frac{Z_{101}}{\sqrt{n}} \right) \frac{(i-\mu_I)(k-\mu_K)}{\sigma_I \sigma_K}
+ \left( \frac{Z_{011}}{\sqrt{n}} \right) \frac{(j-\mu_J)(k-\mu_K)}{\sigma_J \sigma_K} + \left( \frac{Z_{111}}{\sqrt{n}} \right) \frac{(i-\mu_I)(j-\mu_J)(k-\mu_K)}{\sigma_I \sigma_J \sigma_K} \right]              (7.47)

Model (7.47) is akin to the association model of Goodman (1986) for a three-way contingency table. The advantage of this model is that negative values cannot arise when approximating a cell probability. Also, (7.47) can be generalised to consider quadratic and higher order components, as does (7.45).
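To make the contrast between models (7.46) and (7.47) concrete, the following sketch reconstitutes cell probabilities both ways for a small table. The marginal proportions and Z values are illustrative assumptions, not estimates from any table in this thesis; the point is only that the exponential form tracks the linear form for modest $Z/\sqrt{n}$ while guaranteeing non-negative cells.

```python
import numpy as np

# Hypothetical marginals and generalised correlations (illustrative only).
pi = np.array([0.2, 0.5, 0.3])      # p_{i..}
pj = np.array([0.25, 0.25, 0.5])    # p_{.j.}
pk = np.array([0.4, 0.6])           # p_{..k}
n = 500.0
Z110, Z101, Z011, Z111 = 3.0, -2.0, 1.5, 0.8

def std_scores(p):
    """Natural scores 1..L standardised by the marginal mean and s.d."""
    s = np.arange(1, len(p) + 1, dtype=float)
    mu = np.sum(p * s)
    sigma = np.sqrt(np.sum(p * (s - mu) ** 2))
    return (s - mu) / sigma

ri, rj, rk = std_scores(pi), std_scores(pj), std_scores(pk)
base = pi[:, None, None] * pj[None, :, None] * pk[None, None, :]
# The bracketed association term shared by (7.46) and (7.47)
assoc = (Z110 * ri[:, None, None] * rj[None, :, None]
         + Z101 * ri[:, None, None] * rk[None, None, :]
         + Z011 * rj[None, :, None] * rk[None, None, :]
         + Z111 * ri[:, None, None] * rj[None, :, None] * rk[None, None, :]) / np.sqrt(n)

p_linear = base * (1 + assoc)    # model (7.46)
p_exp = base * np.exp(assoc)     # model (7.47): always non-negative
print(p_linear.sum(), p_exp.min())
```

Because each association term contains at least one factor of the form $\sum_i p_{i\cdot\cdot}(i-\mu_I)/\sigma_I = 0$, the linear reconstitution sums exactly to one.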


7.2.8 Centring of the Profile Co-ordinates

Unlike simple and ordinal correspondence analysis of two-way contingency tables, the co-ordinates for a three-way contingency table are NOT centred about the origin. Instead, they are centred about a bivariate moment.

Consider the trivariate row profile co-ordinates of (7.3). Then, for all v=1, ..., J-1 and w=1, ..., K-1, they are centred as :

$$\sum_{i=1}^{I} p_{i\cdot\cdot}\,f^*_{ivw} = \frac{Z_{0vw}}{\sqrt n} \qquad (7.48)$$

To show this,

$$\sum_{i=1}^{I} p_{i\cdot\cdot}\,f^*_{ivw} = \sum_{i=1}^{I} p_{i\cdot\cdot}\sum_{u=0}^{I-1} a_u(i)\,\frac{Z_{uvw}}{\sqrt n} = \sum_{u=1}^{I-1}\frac{Z_{uvw}}{\sqrt n}\sum_{i=1}^{I} p_{i\cdot\cdot}\,a_u(i) + \frac{Z_{0vw}}{\sqrt n} = \frac{Z_{0vw}}{\sqrt n}$$

using the orthogonality property of $a_u(i)$ given by (4.1). This result can also be used to verify that

$$\frac{X^2_{IJK}}{n} + \frac{X^2_{JK}}{n} = \sum_{i=1}^{I}\sum_{v=1}^{J-1}\sum_{w=1}^{K-1} p_{i\cdot\cdot}\left(f^*_{ivw}\right)^2$$

by expanding (7.5) and substituting (7.48) into it. Consider now the bivariate row profile co-ordinates of (7.23). Then the centring of these values is

$$\sum_{i=1}^{I} p_{i\cdot\cdot}\,f^*_{iv} = \sum_{w=1}^{K-1}\frac{Z_{0vw}}{\sqrt n} \qquad (7.49)$$


    To show this:

$$\sum_{i=1}^{I} p_{i\cdot\cdot}\,f^*_{iv} = \sum_{i=1}^{I} p_{i\cdot\cdot}\sum_{u=0}^{I-1}\sum_{w=0}^{K-1} a_u(i)\,\frac{Z_{uvw}}{\sqrt n} = \frac{Z_{0v0}}{\sqrt n} + \sum_{w=1}^{K-1}\frac{Z_{0vw}}{\sqrt n} + \sum_{u=1}^{I-1}\sum_{w=0}^{K-1}\frac{Z_{uvw}}{\sqrt n}\sum_{i=1}^{I} p_{i\cdot\cdot}\,a_u(i) = \sum_{w=1}^{K-1}\frac{Z_{0vw}}{\sqrt n}$$

since $Z_{0v0}=0$ for $v \geq 1$ and, by (4.1), $\sum_i p_{i\cdot\cdot}\,a_u(i)=0$ for $u \geq 1$.

The centring of the trivariate and bivariate column and tube profile co-ordinates about a bivariate moment can be shown similarly.
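The centring result (7.48) can be checked numerically. Since the Z values require definitions from Chapter 4 that are not reproduced here, the sketch below treats them as an arbitrary array: once property (4.1) holds, (7.48) is an algebraic identity in Z. All names and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthonormal_polys(w):
    """Polynomials a_0,...,a_{I-1} on natural scores 1..I satisfying
    property (4.1): sum_i w[i] a_u(i) a_t(i) = 1 if u == t, else 0."""
    I = len(w)
    s = np.arange(1, I + 1, dtype=float)
    out = []
    for u in range(I):
        v = s ** u
        for b in out:                      # Gram-Schmidt under the weights w
            v = v - np.sum(w * v * b) * b
        out.append(v / np.sqrt(np.sum(w * v * v)))
    return np.array(out)                   # out[u, i] = a_u(i)

I, J, K, n = 4, 3, 3, 200.0
p_i = rng.random(I); p_i /= p_i.sum()      # hypothetical row marginals p_{i..}
a = orthonormal_polys(p_i)
Z = rng.normal(size=(I, J - 1, K - 1))     # arbitrary stand-in for the Z values

# Trivariate row co-ordinates f_{ivw} = sum_u a_u(i) Z_{uvw} / sqrt(n), and
# their centring (7.48): sum_i p_{i..} f_{ivw} = Z_{0vw} / sqrt(n)
f = np.einsum('ui,uvw->ivw', a, Z) / np.sqrt(n)
centre = np.einsum('i,ivw->vw', p_i, f)
print(np.allclose(centre, Z[0] / np.sqrt(n)))
```

The only ingredients the proof uses are $a_0(i)=1$ and the weighted orthogonality of the higher-order polynomials, both of which the Gram-Schmidt construction delivers.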

The bi-profile co-ordinates can be shown to be centred about the origin of the correspondence plot. Consider (7.32). Then

$$\sum_{i=1}^{I}\sum_{j=1}^{J} p_{ij\cdot}\,x_{ij(w)} = 0 \qquad (7.50)$$

To show this,

$$\sum_{i=1}^{I}\sum_{j=1}^{J} p_{ij\cdot}\,x_{ij(w)} = \sum_{i=1}^{I}\sum_{j=1}^{J} p_{ij\cdot}\sum_{k=1}^{K}\frac{p_{ijk}}{p_{ij\cdot}}\,c_w(k) = \sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} p_{ijk}\,c_w(k) = \sum_{k=1}^{K} p_{\cdot\cdot k}\,c_w(k) = 0$$
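The derivation above is short enough to verify directly. The sketch below builds a random three-way table of proportions (illustrative only), constructs tube polynomials $c_w(k)$ orthonormal with respect to the tube marginals, and confirms that the bi-profile co-ordinates satisfy (7.50) up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.random((3, 4, 3)); P /= P.sum()    # hypothetical 3x4x3 table of proportions
p_ij = P.sum(axis=2)                       # p_{ij.}
p_k = P.sum(axis=(0, 1))                   # p_{..k}

def orthonormal_polys(w):
    """Gram-Schmidt polynomials on scores 1..L, orthonormal w.r.t. weights w."""
    L = len(w); s = np.arange(1, L + 1, dtype=float); out = []
    for u in range(L):
        v = s ** u
        for b in out:
            v = v - np.sum(w * v * b) * b
        out.append(v / np.sqrt(np.sum(w * v * v)))
    return np.array(out)

c = orthonormal_polys(p_k)                 # tube polynomials c_w(k)

# x_{ij(w)} = sum_k (p_{ijk} / p_{ij.}) c_w(k): bi-profile co-ordinate of (7.32)
for w in (1, 2):
    x = np.einsum('ijk,k->ij', P / p_ij[:, :, None], c[w])
    print(w, round(float(np.sum(p_ij * x)), 12))   # (7.50): vanishes
```

The weighted sum telescopes to $\sum_k p_{\cdot\cdot k}\,c_w(k)$, which is zero by the orthogonality of $c_w(k)$ with the constant polynomial $c_0(k)=1$.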


    7.3 Singly Ordered Three-way Contingency Tables

A three-way contingency table with only one ordered variable can arise in three ways - ordered row categories only, ordered column categories only, or ordered tube categories only. In this section, a detailed account of the application of singly ordered three-way correspondence analysis is given for a contingency table with ordered rows. The results for the remaining singly ordered tables are similar.

    7.3.1 Decomposing Pearson's Ratios

    Suppose that the Pearson ratio of (3.13) is decomposed so that

$$\alpha_{ijk} = \sum_{u=0}^{I-1}\frac{a_u(i)}{\sqrt n}\,\frac{Y_{ujk}}{\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}} \qquad (7.51)$$

where $Y_{ujk}$ is defined by (4.44) and the set of row orthogonal polynomials is subject to the condition (4.1). Eliminating the trivial part of equation (7.51) yields

$$\frac{p_{ijk}}{p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}} = \frac{a_0(i)}{\sqrt n}\,\frac{Y_{0jk}}{\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}} + \sum_{u=1}^{I-1}\frac{a_u(i)}{\sqrt n}\,\frac{Y_{ujk}}{\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}}$$

which simplifies to

$$\frac{p_{ijk}}{p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}} = \frac{p_{\cdot jk}}{p_{\cdot j\cdot}\,p_{\cdot\cdot k}} + \sum_{u=1}^{I-1}\frac{a_u(i)}{\sqrt n}\,\frac{Y_{ujk}}{\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}} \qquad (7.52)$$

    or

$$\frac{p_{ijk} - p_{i\cdot\cdot}\,p_{\cdot jk}}{p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}} = \sum_{u=1}^{I-1}\frac{a_u(i)}{\sqrt n}\,\frac{Y_{ujk}}{\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}} \qquad (7.53)$$


where $p_{\cdot jk}/(p_{\cdot j\cdot}\,p_{\cdot\cdot k})$ is the Pearson ratio of the J x K two-way contingency table formed by summing across the row categories. Under the hypothesis of complete independence, (7.53) is expressed as :

$$\frac{p_{ijk} - p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}}{p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}} = \sum_{u=1}^{I-1}\frac{a_u(i)}{\sqrt n}\,\frac{Y_{ujk}}{\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}} \qquad (7.54)$$

7.3.2 Standard Profile Co-ordinates

The type of analysis described by (7.54) is a comparison of the ordered row categories, taking into consideration the non-ordered structure of the column and tube categories. Thus the column and tube variables can be interpreted as explanatory variables, while the row variable is the response variable. To graphically compare the ordered row categories, there are many plotting systems that can be used. The simplest system of row profile co-ordinates uses the row orthogonal polynomials directly. That is, the co-ordinate of the i'th row profile category along the m'th principal axis is $a_m(i)$. As mentioned in this chapter and in Chapters 2 and 5, these standard co-ordinates weight each of the principal axes equally, where the weight is unity. Other co-ordinate systems can be used which provide the researcher with more information about the structure of the row categories and the principal axes.

7.3.3 Bi-Conditional Profile Co-ordinates

Rescale the set of row orthogonal polynomials such that

$$f^*_{i(jk)m} = a_m(i)\,\frac{Y_{mjk}}{\sqrt n} \qquad (7.55)$$



Equation (7.55) is the co-ordinate of the i'th row profile along the m'th principal axis for the (j, k)'th pair of column and tube categories. That is, the researcher is able to visualise the effect of the i'th row category on a pair of column and tube categories. For example, $\{f^*_{i(11)1}\}$ is the set of row profile co-ordinates along the first principal axis, and visualises their effect on the first column and tube categories.

To show the relationship between the set of row profile co-ordinates of (7.55) and the trivariate term of (4.43), written here as $X^2_{I(JK)}$, note that

$$\frac{X^2_{I(JK)}}{n} = \sum_{u=1}^{I-1}\sum_{j=1}^{J}\sum_{k=1}^{K}\frac{Y^2_{ujk}}{n} = \sum_{u=1}^{I-1}\sum_{j=1}^{J}\sum_{k=1}^{K}\left[\sum_{i=1}^{I}p_{i\cdot\cdot}\,a_u(i)^2\right]\frac{Y^2_{ujk}}{n} = \sum_{u=1}^{I-1}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}p_{i\cdot\cdot}\left[a_u(i)\,\frac{Y_{ujk}}{\sqrt n}\right]^2$$

That is,

$$\frac{X^2_{I(JK)}}{n} = \sum_{m=1}^{I-1}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}p_{i\cdot\cdot}\left[f^*_{i(jk)m}\right]^2 \qquad (7.56)$$

As an optimal correspondence plot constructed to graphically compare the row profile co-ordinates consists of I-1 dimensions, the total inertia of the plot will be defined by (7.56). For an M-dimensional correspondence plot, the total inertia is defined as $\sum_{m=1}^{M}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}p_{i\cdot\cdot}\left[f^*_{i(jk)m}\right]^2$, where the m'th principal axis has an inertia value of $\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K}p_{i\cdot\cdot}\left[f^*_{i(jk)m}\right]^2$. The contribution of the i'th row profile to the m'th principal axis is quantified by $\sum_{j=1}^{J}\sum_{k=1}^{K}p_{i\cdot\cdot}\left[f^*_{i(jk)m}\right]^2$. The contribution of the i'th row category on a column or tube category can also be measured.


    7.3.4 Uni-Conditional Profile Co-ordinates

    Define the set of row profile co-ordinates by

$$f^*_{i(j)m} = \sum_{k=1}^{K} a_m(i)\,\frac{Y_{mjk}}{\sqrt n} \qquad (7.57)$$

    Equation (7.57) is the co-ordinate of the i'th row profile along the

    m'th principal axis of the correspondence plot for the j'th column category.

For example, $\{f^*_{i(1)1}\}$ is the set of row profile co-ordinates along the first principal axis and allows for a comparison of these profiles and their impact on the first column category. In a similar manner to (7.57), the i'th row profile co-ordinate along the m'th principal axis, given that the tube categories are of interest, is defined by

$$f^*_{i(k)m} = \sum_{j=1}^{J} a_m(i)\,\frac{Y_{mjk}}{\sqrt n} \qquad (7.58)$$

For example, $\{f^*_{i(2)1}\}$ is the set of row profile co-ordinates along the first principal axis and allows for a comparison of these profiles and their impact on the second tube category.

The relationship between (7.57) and the trivariate term of (4.43), denoted as $X^2_{I(J)K}$, is

$$\frac{X^2_{I(J)K}}{n} = \sum_{u=1}^{I-1}\sum_{j=1}^{J}\left[\sum_{k=1}^{K}\frac{Y_{ujk}}{\sqrt n}\right]^2 = \sum_{u=1}^{I-1}\sum_{i=1}^{I}\sum_{j=1}^{J}p_{i\cdot\cdot}\left[a_u(i)\sum_{k=1}^{K}\frac{Y_{ujk}}{\sqrt n}\right]^2$$

which simplifies to

$$\frac{X^2_{I(J)K}}{n} = \sum_{u=1}^{I-1}\sum_{i=1}^{I}\sum_{j=1}^{J}p_{i\cdot\cdot}\left[f^*_{i(j)u}\right]^2 \qquad (7.59)$$

The principal inertia of the m'th principal axis is thus $\sum_{i=1}^{I}\sum_{j=1}^{J}p_{i\cdot\cdot}\left[f^*_{i(j)m}\right]^2$, while the inertia contribution of the i'th row profile to this axis can be measured by $\sum_{j=1}^{J}p_{i\cdot\cdot}\left[f^*_{i(j)m}\right]^2$. In a similar manner to the derivation of (7.59), the relationship between the trivariate total inertia and (7.58) is

$$\frac{X^2_{IJ(K)}}{n} = \sum_{u=1}^{I-1}\sum_{i=1}^{I}\sum_{k=1}^{K}p_{i\cdot\cdot}\left[f^*_{i(k)u}\right]^2 \qquad (7.60)$$

and its interpretation is similar to that of (7.59). Column and tube profile co-ordinates can be defined in a similar manner to (7.55), (7.57) and (7.58). Their relationship with the trivariate term of the total inertia is similar to (7.56), (7.59) and (7.60).

    7.3.5 Unconditional Row Profile Co-ordinates

    Consider the following set of co-ordinates

$$f^*_{im} = \sum_{j=1}^{J}\sum_{k=1}^{K} a_m(i)\,\frac{Y_{mjk}}{\sqrt n} \qquad (7.61)$$

Equation (7.61) is more general than (7.55), (7.57) or (7.58), as it is the co-ordinate of the i'th row profile not conditioned upon the column or tube categories. This unconditional definition gives the co-ordinate of the i'th row profile along the m'th principal axis, and can be considered as the overall effect of the rows upon the column and tube categories, rather than the effect upon the columns, the tubes, or a particular pair of both.

    The relationship between the conditional and unconditional row profile co-ordinates is given by


$$f^*_{im} = \sum_{j=1}^{J}\sum_{k=1}^{K} f^*_{i(jk)m} = \sum_{j=1}^{J} f^*_{i(j)m} = \sum_{k=1}^{K} f^*_{i(k)m}$$

while

$$f^*_{i(j)m} = \sum_{k=1}^{K} f^*_{i(jk)m} \qquad\text{and}\qquad f^*_{i(k)m} = \sum_{j=1}^{J} f^*_{i(jk)m}$$

Thus the unconditional row profile co-ordinate is the sum of the J column-conditional co-ordinates, and also the sum of the K tube-conditional co-ordinates.
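These summation relations, together with the inertia decomposition (7.56), can be checked numerically. In the sketch below the table is randomly generated, and the $Y_{ujk}$ values are computed as $\sqrt{n}\sum_i a_u(i)\,p_{ijk}/\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}$ - a form assumed here for illustration, consistent with the reconstitution model (7.62) but not a reproduction of definition (4.44).

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.random((4, 3, 2)); P /= P.sum()     # hypothetical 4x3x2 table of proportions
n = 400.0
p_i = P.sum(axis=(1, 2)); p_j = P.sum(axis=(0, 2)); p_k = P.sum(axis=(0, 1))

def orthonormal_polys(w):
    """Gram-Schmidt polynomials on scores 1..L, orthonormal w.r.t. weights w."""
    L = len(w); s = np.arange(1, L + 1, dtype=float); out = []
    for u in range(L):
        v = s ** u
        for b in out:
            v = v - np.sum(w * v * b) * b
        out.append(v / np.sqrt(np.sum(w * v * v)))
    return np.array(out)

a = orthonormal_polys(p_i)                  # row polynomials a_u(i)
# Assumed form of Y_{ujk} (hypothetical stand-in for definition (4.44))
Y = np.sqrt(n) * np.einsum('ui,ijk->ujk', a, P) / np.sqrt(p_j[:, None] * p_k[None, :])

# (7.55) bi-conditional, (7.57)/(7.58) uni-conditional, (7.61) unconditional
f_bi = np.einsum('mi,mjk->imjk', a[1:], Y[1:]) / np.sqrt(n)
f_col = f_bi.sum(axis=3)                    # f*_{i(j)m}
f_tube = f_bi.sum(axis=2)                   # f*_{i(k)m}
f_unc = f_bi.sum(axis=(2, 3))               # f*_{im}

# (7.56): the trivariate inertia as a weighted sum of squared co-ordinates
lhs = float((Y[1:] ** 2).sum() / n)
rhs = float(np.einsum('i,imjk->', p_i, f_bi ** 2))
print(np.isclose(lhs, rhs),
      np.allclose(f_unc, f_col.sum(axis=2)),
      np.allclose(f_unc, f_tube.sum(axis=2)))
```

The equality of `lhs` and `rhs` rests only on $\sum_i p_{i\cdot\cdot}\,a_m(i)^2 = 1$, so it holds for any Y array of the right shape.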

7.3.6 Modelling Singly Ordered Three-way Contingency Tables

When N has ordered row categories only, then using equation (7.53) the (i, j, k)'th cell probability can be reconstituted exactly by :

$$p_{ijk} = p_{i\cdot\cdot}\left[p_{\cdot jk} + p_{\cdot j\cdot}\,p_{\cdot\cdot k}\sum_{u=1}^{I-1}\frac{Y_{ujk}}{\sqrt n}\,\frac{a_u(i)}{\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}}\right] \qquad (7.62)$$

Under the hypothesis of complete independence, model (7.62) simplifies to

$$p_{ijk} = p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}\left[1 + \sum_{u=1}^{I-1}\frac{Y_{ujk}}{\sqrt n}\,\frac{a_u(i)}{\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}}\right] \qquad (7.63)$$

Model (7.63) is the saturated model of association for N with ordered rows only. The cell probability can be reconstituted approximately by the unsaturated model :

$$p_{ijk} = p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}\left[1 + \sum_{u=1}^{M}\frac{Y_{ujk}}{\sqrt n}\,\frac{a_u(i)}{\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}}\right] \qquad (7.64)$$


Model (7.64) considers the first M row components, where M < I-1.

An alternative model, which approximates the cell probabilities even when it is saturated, is of the exponential form :

$$p_{ijk} = p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}\exp\left[\sum_{u=1}^{I-1}\frac{Y_{ujk}}{\sqrt n}\,\frac{a_u(i)}{\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}}\right] \qquad (7.65)$$

The advantage of an exponential model such as (7.65) is that negative probabilities cannot arise when reconstituting the cells. Model (7.65) is also the link between the chi-squared partition and log-linear models for three-way contingency tables with ordered rows, and is akin to a Goodman association model for such a table.

    A property of Yujk is

$$\sum_{j=1}^{J}\sum_{k=1}^{K}\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}\;Y_{ujk} = \sqrt n\sum_{i=1}^{I}p_{i\cdot\cdot}\,a_u(i)$$

    for all u=0, 1, . . . , (I-l). This can be proven by rearranging (4.44) and summing over the J columns and K tubes.
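As a numerical check on the exactness of (7.62), the sketch below builds a small random table of counts, computes the $Y_{ujk}$ values as $\sqrt{n}\sum_i a_u(i)\,p_{ijk}/\sqrt{p_{\cdot j\cdot}\,p_{\cdot\cdot k}}$ - a form assumed here to be consistent with (4.44) and its rearrangement, not a quotation of it - and reconstitutes every cell.

```python
import numpy as np

rng = np.random.default_rng(3)
N = rng.integers(1, 20, size=(4, 3, 2)).astype(float)  # hypothetical counts
n = N.sum(); P = N / n
p_i = P.sum(axis=(1, 2)); p_j = P.sum(axis=(0, 2)); p_k = P.sum(axis=(0, 1))
p_jk = P.sum(axis=0)                                   # p_{.jk}

def orthonormal_polys(w):
    """Gram-Schmidt polynomials on scores 1..L, orthonormal w.r.t. weights w."""
    L = len(w); s = np.arange(1, L + 1, dtype=float); out = []
    for u in range(L):
        v = s ** u
        for b in out:
            v = v - np.sum(w * v * b) * b
        out.append(v / np.sqrt(np.sum(w * v * v)))
    return np.array(out)

a = orthonormal_polys(p_i)
root = np.sqrt(p_j[:, None] * p_k[None, :])
# Assumed form of Y_{ujk} (hypothetical stand-in for definition (4.44))
Y = np.sqrt(n) * np.einsum('ui,ijk->ujk', a, P) / root

# Model (7.62): p_{ijk} = p_{i..} [ p_{.jk} + p_{.j.} p_{..k} * assoc_{ijk} ]
assoc = np.einsum('ui,ujk->ijk', a[1:], Y[1:]) / (np.sqrt(n) * root)
P_hat = p_i[:, None, None] * (p_jk + p_j[:, None] * p_k[None, :] * assoc)
print(np.allclose(P_hat, P))               # exact reconstitution
```

Exactness follows from the completeness relation $\sum_{u=0}^{I-1} a_u(i)\,a_u(i') = \delta_{ii'}/p_{i\cdot\cdot}$, which the orthonormal polynomial basis satisfies.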

    7.4 Doubly Ordered Three-way Contingency Tables

A three-way contingency table with two ordered variables can occur in three ways - ordered row and column categories, ordered row and tube categories, and ordered column and tube categories.


    In this section, only the analysis of a contingency table with ordered

    rows and columns will be discussed. The results for the remaining doubly ordered tables are similar.

    7.4.1 Decomposing Pearson's Ratios

Suppose that the Pearson ratio of equation (3.13) is decomposed so that :

$$\alpha_{ijk} = \sum_{u=0}^{I-1}\sum_{v=0}^{J-1} a_u(i)\,b_v(j)\,\frac{Y_{uvk}}{\sqrt n\,\sqrt{p_{\cdot\cdot k}}} \qquad (7.66)$$

    where Yuvk is defined by (4.49), and the row orthogonal polynomials, au(i),

    and the column orthogonal polynomials, bv(j), are subject to constraints

    (4.1) and (4.2) respectively.

    Eliminating the trivial portion of equation (7.66) yields

$$\frac{p_{ijk}}{p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}} = 1 + \sum_{u=1}^{I-1} a_u(i)\,\frac{Y_{u0k}}{\sqrt n\,\sqrt{p_{\cdot\cdot k}}} + \sum_{v=1}^{J-1} b_v(j)\,\frac{Y_{0vk}}{\sqrt n\,\sqrt{p_{\cdot\cdot k}}} + \sum_{u=1}^{I-1}\sum_{v=1}^{J-1} a_u(i)\,b_v(j)\,\frac{Y_{uvk}}{\sqrt n\,\sqrt{p_{\cdot\cdot k}}} \qquad (7.67)$$

or

$$\frac{p_{ijk} - p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}}{p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}} = \sum_{u=1}^{I-1} a_u(i)\,\frac{Y_{u0k}}{\sqrt n\,\sqrt{p_{\cdot\cdot k}}} + \sum_{v=1}^{J-1} b_v(j)\,\frac{Y_{0vk}}{\sqrt n\,\sqrt{p_{\cdot\cdot k}}} + \sum_{u=1}^{I-1}\sum_{v=1}^{J-1} a_u(i)\,b_v(j)\,\frac{Y_{uvk}}{\sqrt n\,\sqrt{p_{\cdot\cdot k}}}$$

    7.4.2 Profile Co-ordinates

    Define the row and column profile co-ordinates as

$$f^*_{im(k)} = a_m(i)\,\frac{Y_{m0k}}{\sqrt n} \qquad (7.68)$$

$$g^*_{jm(k)} = b_m(j)\,\frac{Y_{0mk}}{\sqrt n} \qquad (7.69)$$


    respectively.

Equation (7.68) is the co-ordinate of the i'th row profile along the m'th axis for the k'th tube category. Similarly, the column co-ordinate definition of (7.69) is the co-ordinate of the j'th column profile along the m'th axis for the k'th tube category. Using these definitions, the researcher may visualise the differences in the row and/or column profiles at each of the unordered tube categories.

Consider the total inertia between the row and tube categories only, so that the second term in (4.48), denoted as $X^2_{(I)K}$, is of interest. Then

$$\frac{X^2_{(I)K}}{n} = \sum_{m=1}^{I-1}\sum_{k=1}^{K}\frac{Y^2_{m0k}}{n} = \sum_{m=1}^{I-1}\sum_{k=1}^{K}\left[\sum_{i=1}^{I}p_{i\cdot\cdot}\,a_m(i)^2\right]\frac{Y^2_{m0k}}{n} = \sum_{i=1}^{I}\sum_{m=1}^{I-1}\sum_{k=1}^{K}p_{i\cdot\cdot}\left[a_m(i)\,\frac{Y_{m0k}}{\sqrt n}\right]^2$$

which simplifies to

$$\frac{X^2_{(I)K}}{n} = \sum_{i=1}^{I}\sum_{m=1}^{I-1}\sum_{k=1}^{K}p_{i\cdot\cdot}\left[f^*_{im(k)}\right]^2 \qquad (7.70)$$

Similarly, the total inertia between the column and tube categories, which is the third term in (4.48) and denoted as $X^2_{(J)K}$, is

$$\frac{X^2_{(J)K}}{n} = \sum_{j=1}^{J}\sum_{m=1}^{J-1}\sum_{k=1}^{K}p_{\cdot j\cdot}\left[g^*_{jm(k)}\right]^2 \qquad (7.71)$$

Therefore, (7.70) and (7.71) show that row and column profile co-ordinates, defined by (7.68) and (7.69) respectively, which lie close to the origin do not contribute to the variation present in the contingency table, while those far from the origin do make such a contribution.


To graphically compare the impact of the ordered rows and ordered columns at each of the non-ordered tube categories, the row and column profile co-ordinates are defined as

$$f^*_{iv(k)} = \sum_{u=1}^{I-1} a_u(i)\,\frac{Y_{uvk}}{\sqrt n} \qquad (7.72)$$

and

$$g^*_{ju(k)} = \sum_{v=1}^{J-1} b_v(j)\,\frac{Y_{uvk}}{\sqrt n} \qquad (7.73)$$

so that, when the trivariate term of (4.48) is denoted as $X^2_{(IJ)K}$, it can be shown that

$$\frac{X^2_{(IJ)K}}{n} = \sum_{i=1}^{I}\sum_{v=1}^{J-1}\sum_{k=1}^{K}p_{i\cdot\cdot}\left[f^*_{iv(k)}\right]^2 \qquad (7.74)$$

and

$$\frac{X^2_{(IJ)K}}{n} = \sum_{u=1}^{I-1}\sum_{j=1}^{J}\sum_{k=1}^{K}p_{\cdot j\cdot}\left[g^*_{ju(k)}\right]^2 \qquad (7.75)$$

7.4.3 Modelling Doubly Ordered Three-way Contingency Tables

In this subsection, models of association are given for N with ordered row and column categories. When N has ordered rows and columns, but unordered tubes, then using (7.67) the (i, j, k)'th cell probability can be reconstituted exactly by the saturated model of association :

$$p_{ijk} = p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}\left[1 + \sum_{u=1}^{I-1}\frac{Y_{u0k}}{\sqrt n}\,\frac{a_u(i)}{\sqrt{p_{\cdot\cdot k}}} + \sum_{v=1}^{J-1}\frac{Y_{0vk}}{\sqrt n}\,\frac{b_v(j)}{\sqrt{p_{\cdot\cdot k}}} + \sum_{u=1}^{I-1}\sum_{v=1}^{J-1}\frac{Y_{uvk}}{\sqrt n}\,\frac{a_u(i)\,b_v(j)}{\sqrt{p_{\cdot\cdot k}}}\right] \qquad (7.76)$$

    The unsaturated model form of (7.76) is :


$$p_{ijk} = p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}\left[1 + \sum_{u=1}^{M_1}\frac{Y_{u0k}}{\sqrt n}\,\frac{a_u(i)}{\sqrt{p_{\cdot\cdot k}}} + \sum_{v=1}^{M_2}\frac{Y_{0vk}}{\sqrt n}\,\frac{b_v(j)}{\sqrt{p_{\cdot\cdot k}}} + \sum_{u=1}^{M_1}\sum_{v=1}^{M_2}\frac{Y_{uvk}}{\sqrt n}\,\frac{a_u(i)\,b_v(j)}{\sqrt{p_{\cdot\cdot k}}}\right] \qquad (7.77)$$

where either the first $M_1$ row components, with $M_1 < I-1$, or the first $M_2$ column components, with $M_2 < J-1$, are included. Alternatively, instead of selecting the first $M_1$ row and $M_2$ column components, the most significant row and column components could be chosen to best reconstitute the (i, j, k)'th cell probability.

    An alternative model to (7.77) which is of an exponential form is

$$p_{ijk} = p_{i\cdot\cdot}\,p_{\cdot j\cdot}\,p_{\cdot\cdot k}\exp\left[\sum_{u=1}^{I-1}\frac{Y_{u0k}}{\sqrt n}\,\frac{a_u(i)}{\sqrt{p_{\cdot\cdot k}}} + \sum_{v=1}^{J-1}\frac{Y_{0vk}}{\sqrt n}\,\frac{b_v(j)}{\sqrt{p_{\cdot\cdot k}}} + \sum_{u=1}^{I-1}\sum_{v=1}^{J-1}\frac{Y_{uvk}}{\sqrt n}\,\frac{a_u(i)\,b_v(j)}{\sqrt{p_{\cdot\cdot k}}}\right] \qquad (7.78)$$

7.4.4 Centring of the Profile Co-ordinates

The row and column profile co-ordinates of (7.68), (7.69), (7.72) and (7.73) can all be shown to be centred about the origin of the correspondence plot. Consider the definitions of the row profile co-ordinates given by (7.68) and (7.72). Then

$$\sum_{i=1}^{I} p_{i\cdot\cdot}\,f^*_{im(k)} = 0 \qquad (7.79)$$

for m=1, 2, ..., I-1 and k=1, 2, ..., K, while

$$\sum_{i=1}^{I} p_{i\cdot\cdot}\,f^*_{iv(k)} = 0 \qquad (7.80)$$

for v=1, 2, ..., J-1.


Equation (7.79) is verified by substituting (7.68) into it and showing that the RHS is zero. Similarly, (7.80) can be verified by substituting (7.72) into it.

    7.4.5 Distances from the Origin

Suppose that the distance of the i'th row profile co-ordinate, defined using (7.68) or (7.72), from the origin needs to be measured. Then it can be shown that

$$d_I^2(i) = \sum_{v=1}^{J-1}\sum_{k=1}^{K}\left[f^*_{iv(k)}\right]^2$$

    Similar distance formulae can be calculated for the column profile co­ ordinates of (7.69) and (7.73).

    7.4.6 Within Variable Distances

Suppose the distance between row profiles i and i' needs to be determined. Then it can be shown, in a similar manner to the other within-variable distance formulae of this chapter, that

$$d_I^2(i, i') = \sum_{v=1}^{J-1}\sum_{k=1}^{K}\left[f^*_{iv(k)} - f^*_{i'v(k)}\right]^2$$
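These distance formulae translate directly into code. The co-ordinate array below is randomly generated purely to illustrate the computation; `F[i, v, k]` stands for a hypothetical co-ordinate $f^*_{iv(k)}$ of (7.72).

```python
import numpy as np

rng = np.random.default_rng(4)
F = rng.normal(size=(4, 3, 3))   # F[i, v, k] = f*_{iv(k)} for a 4-row table

def dist_origin_sq(F, i):
    """d_I^2(i) = sum_v sum_k [f*_{iv(k)}]^2 -- squared distance from the origin."""
    return float(np.sum(F[i] ** 2))

def within_dist_sq(F, i, i2):
    """d_I^2(i, i') = sum_v sum_k [f*_{iv(k)} - f*_{i'v(k)}]^2"""
    return float(np.sum((F[i] - F[i2]) ** 2))

# The two are linked by the usual inner-product expansion:
# d^2(i, i') = d^2(i) + d^2(i') - 2 sum_vk f*_{iv(k)} f*_{i'v(k)}
d01 = within_dist_sq(F, 0, 1)
print(round(d01, 6))
```

Profiles with a small within-variable distance are similarly distributed across the other two variables, which is how proximity in the correspondence plot is read.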

7.5 Analysis of Other Contingency Tables

The methods of correspondence analysis discussed above can easily be generalised to singly or doubly ordered three-way contingency tables while taking into consideration the non-ordered variable(s). The method is similar to the completely ordered analysis of Section 7.2, but instead of orthogonal polynomials alone being used, the intrinsic terms of the Tucker3 or PARAFAC/CANDECOMP models are used. The Z values using any of these models are those defined in Section 4.9.


The application of the method of correspondence analysis presented in this chapter can also be generalised to contingency tables constructed from more than three variables. Using this method, a series of correspondence plots is constructed to help visualise the structure and interpretation of the variables in terms of linear, quadratic and higher order components. For a completely ordered m-way table, the series will consist of m(m-1)/2 sets of null, linear, quadratic and higher order plots, which visualise each bivariate association of the categories with all other categories assumed independent. There are also m(m-1)(m-2)/6 such sets of plots, which visualise each combination of trivariate associations, and so on.

7.6 Example

Consider Table 3.4. Example 4.11.1 showed that Table 3.4 has a chi-squared value of 328.5674, while the row-column, row-tube and column-tube bivariate chi-squared terms are 235.2988, 41.1387 and 25.8215 respectively. These were all found to be significant, concluding that there is an association between a person's number of years of schooling, number of siblings and happiness at a bivariate level. However, with the trivariate chi-squared term of 26.3084, which has a P-value of 0.3377, there is no trivariate association.

The correspondence analysis technique described in Section 7.2 is applied to Table 3.4 and a series of correspondence plots is obtained, given by Figures 7.1(a)-(i). Figures 7.1(a), 7.1(d) and 7.1(g) graphically describe the bivariate relationship between the columns-tubes, rows-tubes and rows-columns respectively, and are the null correspondence plots.

Figures 7.1(b), 7.1(e) and 7.1(h) are the location plots and graphically describe the column-tube relationship when the rows are assumed to be linear, the row-tube relationship when the columns are assumed linear, and the row-column relationship when the tubes are considered linear, respectively.

    Figures 7.1(c), 7.1(f) and 7.1(i) are the dispersion correspondence plots.

    Figure 7.1(c) graphically describes the column-tube relationship when the

    rows are assumed to be described by the dispersion component, while Figure

    7.1(f) describes the association between the rows and tubes when the

    columns are assumed to be described by the dispersion component. Figure

7.1(i) graphically describes the row-column relationship when the tubes are assumed to be described by the dispersion component. Each of the 9 plots is constructed using the first two principal axes, which are associated with the linear and quadratic components.

The contribution to the chi-squared statistic of each of the correspondence plots is listed in Table 7.1. Note that the principal (or component) inertia of a correspondence plot is just its chi-squared contribution divided by n (here n=1517).

Relationship    Null        Location    Dispersion
Column-Tube     25.8215     4.2737      18.9645
Row-Tube        41.1387     4.0458      14.4921
Row-Column      235.2988    10.9158     15.3925

Table 7.1 : Contribution to $X^2$ of Null, Location & Dispersion Correspondence Plots
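Since each principal inertia is just the chi-squared contribution divided by n, the entries of Table 7.1 convert directly; the short sketch below (values copied from Table 7.1, n = 1517) performs the division used throughout the remainder of this example.

```python
# Principal (component) inertias implied by Table 7.1: each chi-squared
# contribution divided by the sample size n = 1517.
n = 1517
chi2 = {
    "Column-Tube": {"Null": 25.8215, "Location": 4.2737, "Dispersion": 18.9645},
    "Row-Tube":    {"Null": 41.1387, "Location": 4.0458, "Dispersion": 14.4921},
    "Row-Column":  {"Null": 235.2988, "Location": 10.9158, "Dispersion": 15.3925},
}
inertia = {rel: {plot: x2 / n for plot, x2 in plots.items()}
           for rel, plots in chi2.items()}
for rel, plots in inertia.items():
    print(rel, {plot: round(val, 6) for plot, val in plots.items()})
```

For instance, the column-tube location and dispersion contributions together give the 23.2382 quoted below as the share of the trivariate association captured by Figures 7.1(b) and 7.1(c).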

Table 7.1 shows that Figure 7.1(b) and Figure 7.1(c) contribute 23.2382, or 88.32%, of the trivariate association and graphically describe the effect of the row location and dispersion components on the structure of the "Number of Siblings" and "Happiness" variables. Similarly, the correspondence plots of Figure 7.1(e) and Figure 7.1(f), which describe the relationship between the "Years of Schooling" and "Number of Siblings" variables, contribute 16.5379, or 62.86%, of the trivariate association. The plots of Figure 7.1(h) and Figure 7.1(i), which describe the association between the rows and columns, account for 100% of the trivariate term.

The correspondence plots which graphically represent a significant part (at the 5% level of significance) of the relationship between the "Years of Schooling", "Number of Siblings" and "Happiness" variables are Figures 7.1(a), 7.1(d), 7.1(g), as the bivariate terms of the partition were shown in Example 4.11.1 to be significant, and 7.1(c). Therefore, these four correspondence plots graphically represent 97.8% of the total association in Table 3.4. When the level of significance is slightly greater than 5%, Figure 7.1(f) also makes a significant contribution to the graphical representation of the three variables. When this is the case, Figures 7.1(a), 7.1(d), 7.1(f) and 7.1(g) graphically represent 95.8% of the total variation in Table 3.4. In fact, the graphical representation of the association between the rows and columns (Figures 7.1(g), 7.1(h) and 7.1(i)) represents 100% of the total variation in Table 3.4.

Consider the null correspondence plots of Figures 7.1(a), 7.1(d) and 7.1(g). Figure 7.1(a) is the correspondence plot for the "Number of Siblings" and "Happiness" variables when the rows are summed across. This plot is the same as that obtained when a simple ordinal correspondence analysis is applied to Table 3.4 with the row categories summed across.

Figure 7.1(a) shows that the tube categories Not too Happy and Pretty Happy are situated close to one another, indicating that they have similar profiles when the "Years of Schooling" categories are not taken into consideration. However, there does appear to be a slight difference between these two happiness categories, due to the difference in their mean values.


Figure    First Principal (Location) Axis    Second Principal (Dispersion) Axis    Contribution to Inertia
(d)       0.01889139 (69.66%)                0.008227086 (30.34%)                  0.027118476 (100%)
(e)       0.001916993 (71.88%)               0.000749969 (28.12%)                  0.002666962 (100%)
(f)       0.002618553 (31.80%)               0.005616179 (68.20%)                  0.008234732 (100%)
(g)       0.1464887 (94.44%)                 0.005078041 (3.27%)                   0.151566741 (97.71%)
(h)       0.001916993 (26.64%)               0.002618553 (36.39%)                  0.004535546 (63.03%)
(i)       0.000749969 (0.74%)                0.005616179 (5.53%)                   0.006366148 (6.27%)

Table 7.2 : Row Inertia Values at each Correspondence Plot
(values in brackets are percentage contributions to the total inertia for that plot)

The mean and spread of the classifications for Very Happy across each "Siblings" category are different to those of the Not too Happy and Pretty Happy categories. This distinction can be clearly seen in Figure 7.1(a), which shows Very Happy on the far right hand side of the plot, and Not too Happy and Pretty Happy situated on the left hand side near the origin of the display.

The tube categories shown in Figure 7.1(a) are heavily dominated by the location inertia which, as Table 7.4 shows, has a value of 0.01192967 and accounts for 70.09% of the variation in the "Happiness" categories. Therefore, the difference in the mean values of the "Happiness" categories across the "Siblings" categories, when the "Years of Schooling" variable is not taken into account, is the major cause of this variation. The spread of the


    "Happiness" categories contributes very little (3.77%) to the overall variation of these categories. This can be seen graphically by observing the distribution of the "Happiness" categories along the second principal axis in

    Figure 7.1(a). Figure 7.1(a) graphically represents quite well (73.86%) the variation in the "Happiness" categories.

Figure    First Principal (Location) Axis    Second Principal (Dispersion) Axis    Contribution to Inertia
(a)       0.005773339 (33.92%)               0.0112481 (66.08%)                    0.017021439 (100%)
(b)       0.000313514 (11.13%)               0.002503706 (88.87%)                  0.00281722 (100%)
(c)       0.006273848 (50.19%)               0.006227494 (49.81%)                  0.012501342 (100%)
(g)       0.138423 (89.24%)                  0.01601469 (10.32%)                   0.15443769 (99.56%)
(h)       0.000313514 (4.36%)                0.006273848 (87.19%)                  0.006587362 (91.55%)
(i)       0.002503706 (24.68%)               0.006227494 (61.37%)                  0.0087312 (86.05%)

Table 7.3 : Column Inertia Values at each Correspondence Plot
(values in brackets are percentage contributions to the total inertia for that plot)

Now consider the column categories in Figure 7.1(a). When the "Years of Schooling" categories are not taken into consideration, there appears to be quite an even distribution in the mean differences of the "Number of Siblings" categories.


Figure    First Principal (Location) Axis    Second Principal (Dispersion) Axis    Contribution to Inertia
(a)       0.01192967 (70.09%)                0.000641035 (3.77%)                   0.012570705 (73.86%)
(b)       0.000134532 (4.78%)                0.000796195 (28.26%)                  0.000930727 (33.04%)
(c)       0.002483251 (19.86%)               0.007000203 (56.00%)                  0.009483454 (75.86%)
(d)       0.0208935 (77.05%)                 0.004151175 (15.31%)                  0.025044675 (92.36%)
(e)       0.000134532 (5.04%)                0.002483251 (93.11%)                  0.002617783 (98.15%)
(f)       0.000796195 (9.67%)                0.007000203 (85.01%)                  0.007796398 (94.68%)

Table 7.4 : Tube Inertia Values at each Correspondence Plot
(values in brackets are percentage contributions to the total inertia for that plot)

However, "8+" has quite a different spread compared to the other "Number of Siblings" categories. This particular category contributes 66.08% of the column dispersion inertia of 0.0112481 (Table 7.3). Figure 7.1(a) represents exactly the total variation in the "Number of Siblings" categories across each level of "Happiness" when the "Years of Schooling" variable is not considered.

Figure 7.1(d) graphically represents the sources of variation between the row and tube categories when the column categories are of no concern. This null correspondence plot represents very adequately the differences between the levels of the "Years of Schooling" categories and between the levels of the "Happiness" categories.

    For both variables the location component describes the major difference within the "Years of Schooling" variable and within the


    "Happiness" variable. Table 7.2 shows that the location inertia of 0.01889139 represents 69.66% of all the variation within the "Years of Schooling" variable when the "Siblings" categories are not taken into account.

    Similarly, Table 7.4 shows that the location inertia of 0.0208935 for the

    "Happiness" categories accounts for 77.05% of the total tube variation.

    Figure 7.1(d) represents very well (92.36%) the variation within the "Happiness" categories.

    Again, the "Happiness" categories Not too Happy and Pretty Happy

are similarly distributed when summing across the "Years of Schooling" categories. This is reflected by the similar positions of their profiles in Figure 7.1(d). However, the distribution of people for Very Happy is very different from the other two tube categories, as can be seen in Figure 7.1(d), and is reflected by its large distance from the profile positions of Not too Happy and Pretty Happy. The differences in the row profiles when summing across the column categories are exactly represented by the location and dispersion components which describe Figure 7.1(d).

In terms of the location and dispersion inertias, Figure 7.1(g) accounts for 71.61% of the total variation in Table 3.4 when the "Happiness" categories are not taken into consideration, and graphically represents the categories of the "Years of Schooling" variable and the "Number of

    Siblings" variable. This null correspondence plot shows that the difference within these two variables can be best described by the location component, or the difference in the mean values across each variable. This can be seen in Figure 7.1(g) which graphically shows the dominant nature of the first principal axis in describing these categories. Table 7.2 shows that the row location inertia is 0.1464887 and therefore 94.44% of the variation between the "Years of Completed Schooling" categories across each category of the

    "Number of Siblings" variable is due to the difference within their mean values. Similarly, the column location inertia of 0.138423 (Table 7.3) means


that 89.24% of the variation between the "Number of Siblings" categories is accounted for by the difference in their mean values across each of the "Years of Schooling" categories. Figure 7.1(g) explains 99.56% of the total inertia for the columns and 97.7% for the rows of Table 3.4.

The null correspondence plots have the same interpretation as the plots obtained by applying a two-way analysis to each of the bivariate marginal contingency tables. The interpretation of the non-null (location, dispersion, etc.) correspondence plots is very similar to that of the null plots.

Consider the location plot of Figure 7.1(b). It graphically shows the variation within the column and tube categories of Table 3.4, assuming that the variation in the rows is characterised by the difference in their mean values. Although this plot is not significant in describing the variation of the categories of Table 3.4, we can still show how the "Number of Siblings" and "Happiness" categories are structured when the "Years of Schooling" categories are considered linear.

    categories are considered linear. When we assume that the categories of the "Years of Schooling"

    variable are linearly structured, then the variation between the "Number of

    Siblings" categories and the variation between the "Happiness" categories is

    due to their spread, or dispersion. Table 7.3 shows that the column

    dispersion inertia is 0.002503706, or 88.87% of the total inertia of Figure

    7.1(b). Therefore, while Figure 7.1(b) exactly represents the variation

    between the "Number of Siblings" categories if the rows are assumed to be

    linear, the second principal (dispersion) axis dominates their structure.

    Inf act Figure 7.1(g) shows that the major source of this large dispersion inertia is due to the sibling level "6-7" which accounts for 73.26% of this inertia value.

Figure 7.1(b) also shows that the variation between the "Happiness" categories, if the rows are assumed to be linear, is due to their spread, or dispersion. While Table 7.4 shows that the tube location inertia is 0.000134532, or only 4.78% of the total inertia of Figure 7.1(b), the dispersion inertia accounts for far more (28.26%). Therefore, as Figure 7.1(b) graphically accounts for only 33.04% of the variation in the "Happiness" variable, it is a poor representation of this variable when the "Years of Schooling" variable is assumed to be linear. The third principal axis has a tube inertia of 0.00138 and therefore accounts for 48.94% of the variation in the "Happiness" categories. Thus, if Figure 7.1(b) consisted of the second and third principal axes, it would represent 77.20% of the total variation amongst the tubes.

Figure 7.1(c) is the only significant non-null correspondence plot for Table 3.4. This dispersion correspondence plot graphically describes how the column and tube profiles behave if the variation in the row categories is explained by the difference in their spread across the column and tube categories. Table 7.3 shows that Figure 7.1(c) accounts for 100% of the variation within the "Number of Siblings" categories. Table 7.4 shows that the "Happiness" categories are fairly well represented by Figure 7.1(c), with

    75.86% of the total inertia explained by this plot.

Consider the "Number of Siblings" categories. The difference in the mean values and the difference in their spread are equally affected by the spread of the "Years of Schooling" categories. From Table 7.3, the column location inertia is 0.006273848, or 50.19% of the total variation of the columns when the rows are assumed to be dominated by the dispersion component, compared with the column dispersion inertia of 0.006227494

    (49.81%). As each inertia value dominates the variation in the "Number of

    Siblings" categories equally, these column profiles are equally dominated by the first and second principal axes.

    Consider now the "Happiness" categories in Figure 7.1(c). Table 7.4 shows that the second principal axis accounts for 56.00% of the variation of


    the "Happiness" categories if the variation of the rows is due to the difference in their spread. Only 19.86% of the variation in the "Happiness" categories is due to the difference in their mean values.

Due to the definition of the trivariate row, column and tube profile co-ordinates discussed in Subsection 7.2.3, some profiles have identical co-ordinates in some correspondence plots. For example, the column profile co-ordinates in Figure 7.1(b) and Figure 7.1(h) are identical along the first principal axis. This is because g*_{j11} is the co-ordinate of the j'th column profile along the first principal axis when the difference in the rows is assumed to be linear (Figure 7.1(b)), and is also the co-ordinate of the j'th column profile along the first principal axis when the difference in the tubes is assumed to be linear (Figure 7.1(h)). As a result, the location inertias of these linear correspondence plots are identical; Table 7.3 shows this to be the case. Also, the column co-ordinates along the second principal axis in

Figure 7.1(c) and Figure 7.1(i) are identical due to the dual interpretation of g*_{j22}. Therefore, the dispersion inertias for these two dispersion correspondence plots are identical; see Table 7.3.

    Other column profile co-ordinates that are identical are those of

Figure 7.1(b) along the second principal axis and the co-ordinates along the first principal axis of Figure 7.1(i) (due to the dual interpretation of g*_{j12}), and those in Figure 7.1(c) along the first principal axis and the co-ordinates along the first principal axis in Figure 7.1(h) (g*_{j21}).

There are also identical row profile co-ordinates (and hence identical row inertia values) due to the dual interpretation of f*_{i11}, f*_{i22}, f*_{i21} and f*_{i12}, while there are identical tube profile co-ordinates (and tube inertia values) due to the dual interpretation of h_{k11}, h_{k22}, h_{k21} and h_{k12}.

As there is a lot of information concerning the relationship between the three variables, it can be condensed by plotting the bivariate profile co-ordinates for the row, column and tube categories. Figure 7.2 contains the

    three bivariate correspondence plots. Figure 7.2(a) gives the correspondence

    plot which graphically shows the association between the row and column

    variables, while 7.2(b) and 7.2(c) show the row-tube and column-tube associations respectively.

    Consider Figure 7.2(a). It can be seen that the row and column

bivariate profile co-ordinates are dominated by the second principal axis. Table 7.4 gives the component values for the row and column bivariate profile co-ordinates and shows that the dispersion component is more

dominant than the linear component for both variables. However, the third-order component describing skewness accounts for 42.18% of the variation in the row categories, while accounting for 34.26% of the variation in the column categories.

Variable   Component    Value      Perc. Cont.
Row        Location     0.00049     5.15
           Dispersion   0.00246    25.84
           Skewness     0.00402    42.18
           Kurtosis     0.00255    26.82
Column     Location     0.00187    19.58
           Dispersion   0.00439    46.16
           Skewness     0.00326    34.26
Total                   0.00952

Table 7.4 : Component Values and their % Contribution for Figure 7.2(a)
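The percentage contributions in tables such as Table 7.4 are simply each component value divided by the total inertia. A minimal sketch in Python (the small discrepancies from the quoted percentages are rounding in the published component values):

```python
# Recompute the "Perc. Cont." column of Table 7.4 from the row
# component values; the total inertia is 0.00952.
row = {"Location": 0.00049, "Dispersion": 0.00246,
       "Skewness": 0.00402, "Kurtosis": 0.00255}
total = 0.00952

pct = {name: round(100 * value / total, 2) for name, value in row.items()}
print(pct)
```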

    Figure 7.2(b) shows that those who are Pretty Happy have completed

    between 13 and 16 years of schooling, while people who are Very Happy

tend to be those who have less than 12 years of schooling. These conclusions are slightly different from those obtained in Figures 7.1(e) and


7.1(f). However, Figure 7.2(b) takes into account tube components higher than the dispersion, while Figure 7.1 does not. Figure 7.2(b) also shows that

    the "Years of Schooling" and the "Happiness" categories are dominated by

    the second principal axis, indicating a dominant dispersion component for

    both sets of categories. Table 7.5 shows that the dispersion component

    accounts for 82.19% of the total variation in the row categories and just over half of the variation in the tubes.

Variable   Component    Value      Perc. Cont.
Row        Location     0.00281    17.81
           Dispersion   0.01299    82.19
Tube       Location     0.00528    33.41
           Dispersion   0.00811    51.30
           Skewness     0.00241    15.29
Total                   0.01580    100

Table 7.5 : Component Values and their % Contribution for Figure 7.2(b)

    Figure 7.2(c) shows that those who are Not too Happy tend to be those

who have very few siblings (0-1), while those who are Pretty Happy and Very

    Happy are similar and tend to be those with a lot of siblings (8+) and those

    with a moderate amount (2-3). These conclusions differ from those obtained

in Figures 7.1(b) and 7.1(c), although 7.2(c) takes into consideration the row components higher than the dispersion. Figure 7.2(c) also shows that the "Number of Siblings" variable is dominated by the first principal axis and thus has an important location component. Table 7.6 shows that the location component accounts for nearly 80% of the variation in the column ("Number of Siblings") categories but only 16.60% of the variation in the tube ("Happiness") categories. Figure 7.2(c) also shows


that the tube categories are dominated by the second principal axis, which accounts for 41.21% of the variation in the "Happiness" variable.

Variable   Component    Value      Perc. Cont.
Column     Location     0.01161     79.50
           Dispersion   0.00299     20.50
Tube       Location     0.00242     16.60
           Dispersion   0.00602     41.21
           Skewness     0.00167     11.47
           Kurtosis     0.00448     30.70
Total                   0.01460    100

Table 7.6 : Component Values and their % Contribution for Figure 7.2(c)

[Figure 7.1 : Series of multiple ordinal correspondence plots of Table 3.4. Panels (a)-(c) plot the column and tube profile co-ordinates for u = 0, 1, 2; panels (d)-(f) plot the row and tube profile co-ordinates for v = 0, 1, 2; panels (g)-(i) plot the row and column profile co-ordinates for w = 0, 1, 2. Each panel displays the first (location) principal axis against the second principal axis.]

[Figure 7.2 : Series of Multiple Ordinal Correspondence Plots of the Bivariate Profile Co-ordinates: (a) row and column, (b) row and tube, (c) column and tube, each plotted against the first and second principal axes.]

    Chapter 8

    Further Applications

    8.1 Introduction

Chapters 5 and 7 presented a new and improved technique of correspondence analysis. However, the application of ordinal correspondence analysis need not be restricted to contingency tables. In this final chapter, it is shown that the technique described in

    Chapter 5 may also be applied to other types of data. For example, this chapter considers the generalisation of the technique used to analyse ranked

    data as carried out in Beh (1999a), while ordinal log-linear analysis can be

    generalised to consider moments higher than the linear-by-linear

    association, which Agresti (1994) and Fienberg (1977) considered.

    There are many other types of data that can be analysed using the orthogonal polynomials derived in Chapter 4. For example, refer to Rayner

    & Best (1995) who discussed several popular non-parametric techniques using the orthogonal polynomials.


    8.2 Ranked Type Data 8.2.1 Introduction

    In the past, several modifications have been made to correspondence analysis so that an analysis of ranked data can be made. Greenacre (1984), Nishisato (1980) and Weller & Romney (1990) each considered a way of analysing ranked data.

    The technique considered in this Subsection is very simple, and is an

    alternative approach for analysing ranked data. A modification of the classical Pearson chi-square statistic is made by considering the chi-square statistic which Anderson (1959) derived for analysing t treatments which are ranked by s consumers, or judges. Using the Anderson chi-squared statistic

    instead of the Pearson chi-squared statistic, a simple correspondence analysis can be applied to this type of data.

Subsection 8.2.2 applies simple correspondence analysis to ranked data by partitioning the Anderson statistic into singular values. Subsection 8.2.3 applies simple ordinal correspondence analysis to ranked data by partitioning the Anderson chi-squared statistic into bivariate moments, while Subsection 8.2.4 presents the models of goodness-of-fit for the simple correspondence analysis and ordinal approaches to ranked data.

    8.2.2 Simple Correspondence Analysis of Ranked Data

    Suppose that N is a table of ranked data, where s judges (or

consumers) rank, according to their preference, t treatments. The result is a t×t table of the type shown by Table 8.1.

For the ranked data of Table 8.1, n_ij is the number of times the i'th treatment has been given the ranking j, for i, j = 1, 2, ..., t. The row and column sums in N are therefore s for all rows and columns. Note that for the ranked data D_I = D_J = t^{-1} I, so that all marginal proportions are 1/t, and n = st.


                          Rank
Treatment      1       2      ...      t      Total
    1        n_11    n_12    ...     n_1t       s
    2        n_21    n_22    ...     n_2t       s
    .
    t        n_t1    n_t2    ...     n_tt       s
Total          s       s     ...       s        n

    Table 8.1 : t Treatments ranked according to preference by s Judges

Anderson (1959) showed that for such a data set, the Pearson chi-squared statistic is modified to form the Anderson chi-squared statistic, A², so that:

    A² = ((t-1)/t) X²                                      (8.1)

For the purposes of simple correspondence analysis of ranked data, (8.1) is modified so that:

    A² = n Σ_{m=1}^{M} λ*_m²                               (8.2)

where

    λ*_m = λ_m √((t-1)/t)                                  (8.3)

Such a modification does not alter the properties of simple correspondence analysis, except that, instead of the singular values used in (2.14), the modified singular values of (8.3) are used. The relative positions of the profiles in an M-dimensional correspondence plot are identical whether Pearson's or Anderson's chi-squared statistic is used; however, the co-ordinates and principal inertias are rescaled by a factor of √((t-1)/t). Note that A²/n is the total principal inertia for the ranked data.
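As a numerical sketch of (8.1), the following Python fragment computes Pearson's X² and Anderson's A² for a small ranked table invented purely for illustration (t = 3 treatments, s = 4 judges):

```python
import numpy as np

# Hypothetical ranked table: t = 3 treatments ranked by s = 4 judges,
# so every row and column sums to s (invented data, for illustration only).
N = np.array([[2, 1, 1],
              [1, 2, 1],
              [1, 1, 2]], dtype=float)
t = N.shape[0]
s = N[0].sum()
n = N.sum()                      # n = s * t

# Pearson chi-squared statistic: the expected count is s / t in every cell.
E = np.full_like(N, s / t)
X2 = ((N - E) ** 2 / E).sum()

# Anderson's statistic (8.1): A^2 = ((t - 1) / t) * X^2.
A2 = (t - 1) / t * X2

print(X2, A2, A2 / n)            # A2 / n is the total principal inertia
```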


    8.2.3 Ordinal Correspondence Analysis of Ranked Data

    The method of correspondence analysis discussed in Chapter 5 can be

    applied to rank type data structures, so that the total inertia is partitioned in

    terms of location, dispersion and higher order components rather than singular values. When calculating these components, orthogonal

    polynomials are defined which require some scoring scheme. For ranked data of the form described in Table 8.1, natural scores are applied to the

    orthogonal polynomials for the ordered ranks, while the non-ordered treatments are scored on the basis of their mean ranks. In the case where mean ranks are (close to) equivalent for each treatment, some other scoring

    scheme may be applied. When a simultaneous analysis of the ranks and treatments is considered (as is the case with correspondence analysis) the

    Pearson chi-square statistic for the ranked data can be modified to form the Anderson chi-square statistic so that:

    A² = n Σ_{u=1}^{t-1} Σ_{v=1}^{t-1} Z²_{uv}             (8.4)

where

    Z_{uv} = √((t-1)/t) Σ_{i=1}^{t} Σ_{j=1}^{t} a_u(i) b_v(j) p_{ij}    (8.5)

The set of values {a_u(i)} and {b_v(j)}, for i = 1, 2, ..., t and j = 1, 2, ..., t, are the

    orthogonal polynomials, as described in Chapter 4, based on the row and

    column marginals respectively. If a comparison of the treatments (or ranks)

    only is required, then the singly ordered correspondence analysis technique

    (VERSION 1) described in Subsection 5.3 is applicable. The singly ordered

    approach of Subsection 5.4 (VERSION 2) can also be applied, where the

    singular vectors are associated with the non-ordered treatments and the orthogonal polynomials are associated with the ordered ranks.
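The partition (8.4)-(8.5) can be sketched as follows. The orthogonal polynomials are generated here by a weighted Gram-Schmidt pass (a simple stand-in for the recurrence relation of Chapter 4), with natural scores on the ranks and mean-rank scores on the treatments; Z here omits the √((t-1)/t) factor of (8.5), with the Anderson rescaling applied once at the end. The 3×3 table is invented for illustration:

```python
import numpy as np

def orthonormal_polys(scores, weights):
    """Polynomials a_0(i), ..., a_{t-1}(i), orthonormal with respect to
    the weights: sum_i w_i a_u(i) a_v(i) = delta_uv.  A weighted
    Gram-Schmidt pass over 1, s, s^2, ... (a stand-in for the recurrence
    relation of Chapter 4)."""
    t = len(scores)
    V = np.vander(np.asarray(scores, dtype=float), t, increasing=True)
    polys = []
    for k in range(t):
        v = V[:, k].copy()
        for p in polys:                       # remove earlier components
            v -= (weights * v * p).sum() * p
        polys.append(v / np.sqrt((weights * v * v).sum()))
    return np.array(polys)                    # polys[u, i] = a_u(score_i)

# Invented ranked table: t = 3 treatments, s = 4 judges.
N = np.array([[2, 1, 1], [1, 2, 1], [1, 1, 2]], dtype=float)
t, s, n = 3, 4.0, 12.0
P = N / n
w = np.full(t, 1 / t)                         # all marginals are 1/t

col_scores = np.arange(1, t + 1)              # natural scores for the ranks
row_scores = N @ col_scores / s               # mean-rank scores for treatments

A = orthonormal_polys(row_scores, w)
B = orthonormal_polys(col_scores, w)
Z = A @ P @ B.T                               # Z[u, v] = sum_ij a_u(i) b_v(j) p_ij

# Dropping the trivial u = 0, v = 0 terms recovers X^2 / n, and the
# Anderson rescaling gives the total inertia A^2 / n of (8.4).
inertia = (t - 1) / t * (Z[1:, 1:] ** 2).sum()
print(inertia)
```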


    Using row and column scores which are the same will give identical

polynomial values, that is, a_u(i) = b_v(j) when i = j and u = v, for i, j = 1, 2, ..., t and u, v = 1, 2, ..., (t-1), as the row and column marginals of P are 1/t.

    Therefore the choice of which scoring scheme to use is important, although

    as rankings are ordered and treatments are not ordered, it is assumed that different scoring schemes will be considered.

Using Anderson's chi-squared statistic does not alter the properties of the method presented in Chapter 5. However, the location, dispersion and higher order components and co-ordinates in ordinal correspondence analysis are rescaled by a factor of √((t-1)/t). Correspondence analysis of ranked data allows one to simultaneously visualise the relationship between the treatments and rankings by observing the treatment and rank category positions in an M-dimensional correspondence plot (usually M = 2 or 3).

    Anderson (1959) only considered the difference in treatments without

    identifying how each treatment related with the ranking. Best (1993),

    although extending the work of Anderson (1959) by identifying higher order

components, also only considered how the treatments compare. The modification to correspondence analysis by considering Anderson's chi-squared statistic rather than Pearson's chi-squared statistic (using either the classical approach or the ordinal approach) allows for the treatments and ranks to be simultaneously analysed. Also, the correspondence plot allows for a graphical comparison of different ranks and/or treatments. If a simple correspondence analysis is used, then it is possible to determine which rankings are similar (based on the distance between same-variable profiles in the plot), which treatments are similar, and how treatments and rankings are related. Using ordinal correspondence analysis, the researcher can reach these conclusions, as well as determine why two treatments or rankings


    differ by visualising this difference in terms of location, dispersion, and higher order components.

    8.2.4 RC Correlation Models

    The RC correlation model for the simple correspondence analysis of the ranked data of the type described in Table 8.1 is :

    p_{ij} = (1/t²) [1 + Σ_{m=1}^{M} λ*_m a_{im} b_{jm}]    (8.6)

    which is the marginal-free or standardised correspondence model of

Goodman (1996) for ranked data. The values a_{im} and b_{jm} are the (i, m)'th and (j, m)'th elements of A and B respectively. When M = t-1, model (8.6) is saturated and reconstitutes the original cell probabilities exactly. When M < t-1, (8.6) is unsaturated and reconstitutes the cell probabilities from the data reduced to only the first M dimensions. Just like simple correspondence analysis, the original cell probabilities

    can be reconstituted using the ordinal correlation model of :

    p_{ij} = (1/t²) [1 + Σ_{u=1}^{M_1} Σ_{v=1}^{M_2} (Z_{uv}/√n) a_u(i) b_v(j)]    (8.7)

Model (8.7) is saturated when M_1 = M_2 = t-1 and gives the exact cell probabilities. When M_1 < t-1 and/or M_2 < t-1, the model is unsaturated.

    The cell probabilities can be better reconstituted using the M1 largest row components and the M2 largest column components instead of the first


M_1 and M_2. This can be done as the component values are not necessarily

    arranged in descending order.
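A sketch of the reconstitution model (8.7), again on an invented 3×3 ranked table. Here each Z_uv is computed directly as Σ a_u(i) b_v(j) p_ij, so the factor Z_uv/√n in (8.7) reduces to Z_uv; the saturated model (M_1 = M_2 = t-1) must return the cell probabilities exactly:

```python
import numpy as np

def orthonormal_polys(scores, weights):
    # Weighted Gram-Schmidt stand-in for the Chapter 4 recurrence.
    t = len(scores)
    V = np.vander(np.asarray(scores, dtype=float), t, increasing=True)
    polys = []
    for k in range(t):
        v = V[:, k].copy()
        for p in polys:
            v -= (weights * v * p).sum() * p
        polys.append(v / np.sqrt((weights * v * v).sum()))
    return np.array(polys)

N = np.array([[2, 1, 1], [1, 2, 1], [1, 1, 2]], dtype=float)   # invented data
t, s = 3, 4.0
P = N / N.sum()
w = np.full(t, 1 / t)

cols = np.arange(1, t + 1)
A = orthonormal_polys(N @ cols / s, w)        # mean-rank treatment scores
B = orthonormal_polys(cols, w)                # natural rank scores
Z = A @ P @ B.T

# Saturated model (M1 = M2 = t - 1): reconstitute every p_ij via
# p_ij = (1/t^2) * [1 + sum_{u,v >= 1} Z_uv a_u(i) b_v(j)].
recon = (1 + A[1:].T @ Z[1:, 1:] @ B[1:]) / t ** 2
print(np.allclose(recon, P))
```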

    8.2.5 Bean Variety Example

    Consider the bean data of Anderson (1959). A consumer study was

    conducted to determine which variety of snap bean was most preferred. A

    lot of each of three varieties of bean were displayed in retail stores, and 123

    consumers were asked to rank the beans according to their first, second and third choice. Table 8.2 lists the preferences of each variety of bean.

                Rank
Variety      1      2      3     Total
   1        42     64     17      123
   2        31     16     76      123
   3        50     43     30      123
Total      123    123    123      369

    Table 8.2: Consumer Rankings of Three Varieties of Snap Beans

    Best (1993) also analysed this data set by calculating the variety

    location and dispersion components. As there are three treatments, and therefore three rankings, the two-

    dimensional correspondence plot of Table 8.2 will be optimal. That is, the

    first two principal axes will explain all of the variation present in the table.

    Consider the simple correspondence analysis of Table 8.2. The two-

    dimensional simple correspondence plot is given by Figure 8.1.

    Using the Pearson chi-square statistic of (2.14), the total principal inertia for Table 8.2 is 0.21561. Using the Anderson chi-square statistic of equation (8.2), the total principal inertia is 0.14374 which is highly significant. So there is a difference in the rankings of the three bean varieties. Table 8.3 shows the principal inertia values associated with each


    principal axis of a simple correspondence plot and the percentage

contribution these inertias make to the total principal inertia. From this table, the first axis accounts for over 96% of the total inertia, which means that a one-dimensional correspondence plot is adequate to display the beans and their rankings.
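The principal inertias of Table 8.3 can be sketched numerically: the squared singular values of the matrix of standardised Pearson residuals of Table 8.2 are the Pearson principal inertias, and rescaling them by (t-1)/t, as in (8.3), gives the Anderson principal inertias. (This assumes the usual SVD formulation of simple correspondence analysis from Chapter 2.)

```python
import numpy as np

# Table 8.2: three bean varieties (rows) ranked by 123 judges.
N = np.array([[42, 64, 17],
              [31, 16, 76],
              [50, 43, 30]], dtype=float)
t = 3
n = N.sum()
P = N / n
r = P.sum(axis=1)                  # row masses (all 1/t for ranked data)
c = P.sum(axis=0)                  # column masses

# Standardised Pearson residuals; their squared singular values are the
# Pearson principal inertias, summing to X^2 / n.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
sv = np.linalg.svd(S, compute_uv=False)

# Anderson rescaling: multiply each principal inertia by (t - 1) / t.
inertias = (t - 1) / t * sv[:t - 1] ** 2
print(inertias, inertias.sum())    # the sum is A^2 / n
```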


    Figure 8.1 : Simple Correspondence Plot of Table 8.2

    Figure 8.1 shows that the third variety of bean receives more first

rankings than varieties 1 and 2, while the second variety is the least favoured

    of the three beans. However, the interpretations from Figure 8.1 which

identify the relationship between the rows and columns are just as easily

    made by "eyeballing" Table 8.2. Variety 3 has been ranked first by 50 of the

    123 subjects, more so than any other variety, while bean variety 2 has been ranked first only 31 times, the lowest of all the rankings. Note that variety 2


has been ranked last by well over half the judges, showing that it is definitely the least favoured variety of bean. However, the advantage of applying simple correspondence analysis to ranked data is that the researcher is able to identify within-treatment relationships. Figure 8.1 shows that bean varieties 1 and 3 are fairly similar, while both are very different, in terms of preference, from the second variety.

Principal Axis     Principal Inertia    % Contribution
First Axis             0.1384365             96.31
Second Axis            0.0053051              3.69
Total Inertia          0.1437416            100

    Table 8.3 : Principal Inertias and Percentage Contribution to the Total Inertia from a Simple Correspondence Plot of Table 8.2

    Ordinal correspondence analysis is applicable to Table 8.2 where the

    scores used for the ordered column (rank) categories are 1, 2 and 3. The

    scores applied to the non-ordered rows (bean varieties) are the mean rank

values of the rows of N and are 1.797, 2.366 and 1.837. The two-dimensional ordinal correspondence plot using these scoring schemes is given by Figure 8.2 and shows how the bean varieties and ranks compare in terms of the location and dispersion components.
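The mean-rank scores quoted above can be verified directly from Table 8.2; a minimal sketch:

```python
# Mean-rank scores for the non-ordered bean varieties of Table 8.2:
# score_i = (sum_j j * n_ij) / s, with natural scores 1, 2, 3 on the ranks.
N = [[42, 64, 17],     # variety 1
     [31, 16, 76],     # variety 2
     [50, 43, 30]]     # variety 3
s = 123                # judges ranking each variety

scores = [sum(rank * count for rank, count in enumerate(row, start=1)) / s
          for row in N]
print([round(x, 3) for x in scores])   # [1.797, 2.366, 1.837]
```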

Table 8.4 gives the values of the location and dispersion components for the ranks and bean varieties, and their contributions to the total inertia for the variety of bean and the rankings. For the bean varieties, the location component accounts for 46.75% of the total inertia, while the dispersion component accounts for 53.25%.

Therefore, the difference in the mean values and the difference in the spread of the bean varieties are equally important. Figure 8.2 shows that both axes dominate the positions of the varieties of beans. However, for the rankings, the location component accounts for 92.36% of all the variation present in the rankings, with the dispersion component accounting for very little (7.64%). Thus the difference in the variability of the bean rankings is due to the difference in their mean values. Figure 8.2 shows that the first principal axis alone dominates the position of the bean rankings. Note, however, that the first principal axis in Figure 8.2 presents the bean varieties in the same order as the scores associated with them. Along the first axis, or in terms of the location component, Varieties 1 and 3 share very similar positions, which coincides with their similar scores of 1.797 and 1.837 respectively, while Variety 2 is very different in terms of this component. This is in agreement with the conclusions reached from Figure 8.1.


    Figure 8.2 : Ordinal Correspondence Plot of Table 8.2


Variable    Component     Value      % Contr.
Bean        Location     0.06720      46.75
            Dispersion   0.07654      53.25
Rank        Location     0.13276      92.36
            Dispersion   0.01098       7.64

    Table 8.4 : Row and Column Location and Dispersion

    Components of Table 8.2

    Along the first principal axis, bean varieties 1 and 3 are similar in terms of the location component. However, the two varieties are different in terms of their dispersion, or spread, and are therefore not similarly ranked.

While Variety 1 and rank 2 share a relatively high cell entry of 64, their profiles are positioned at quite some distance from each other. A similar observation can be made concerning Variety 2 and rank 3, which share the high cell entry of 76. Such a configuration of profiles suggests that

    there is a non-zero non-diagonal association present as the ranks lie along

    the first principal axis. Therefore, following the rules given in Subsection

5.2.6, Z_21 ≈ 0. In fact, Z_21 = -0.0003 while Z_12 = 0.3136. Although it is not significant, the linear-by-quadratic association is a factor in the

    configuration, and thus the interaction of variety 2 and rank 3 and variety 1

and rank 2 make a contribution to this association. Variety 3 and rank 1, which share a cell entry of 50, differ very little in their positions in Figure 8.2, and so this row-column pair contributes very little to the linear-by-quadratic association.


    8.2.6 Australian Tomato Example

    Consider the 4x4 rank data of Best (1993) which is presented in Table 8.5.

The data in Table 8.5 were collected as part of a large CSIRO Food Research project designed to improve the flavour of Australian grown tomatoes. A total of 24 taste testers were asked to rank, in order of preference, four brands of tomatoes.

Using the Anderson chi-squared statistic, the total inertia of Table 8.5 is 0.1458333, which is not significant at 9 degrees of freedom. However, partitioning the Anderson statistic using (8.4) will isolate important

    bivariate associations and components of a linear, quadratic and higher order nature.

Consider the simple correspondence analysis of Table 8.5. The two-dimensional simple correspondence plot is given by Figure 8.3.

                   Rank
Variety        1     2     3     4    Total
Floradade      4     9     8     3     24
Momotaro       6     8     5     5     24
Summit         3     4     9     8     24
Rutgers       11     3     2     8     24
Total         24    24    24    24     96

    Table 8.5 : Rankings of Four Brands of Tomato According to Taste
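The quoted total inertia of Table 8.5 can be checked numerically with (8.1); every expected count is s/t = 24·24/96 = 6, so a short sketch is:

```python
import numpy as np

# Table 8.5: four tomato brands (rows) ranked by 24 taste testers.
N = np.array([[4, 9, 8, 3],
              [6, 8, 5, 5],
              [3, 4, 9, 8],
              [11, 3, 2, 8]], dtype=float)
t = 4
n = N.sum()                        # n = s * t = 96

# Pearson chi-squared: every expected count is s / t = 6.
X2 = ((N - 6) ** 2 / 6).sum()

# Anderson's statistic (8.1) and the total inertia quoted in the text.
A2 = (t - 1) / t * X2
print(A2 / n)                      # 0.1458333
```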

    Figure 8.3 shows that all brands of tomato are different in terms of

    their taste, which indicates that the rankings assigned to them are quite

different. This is also reflected in the spacing of the ranking categories in the plot. This correspondence plot explains 99.03% of the total variation in

    Table 8.5. The first principal axis has a principal inertia value of 0.10411, which accounts for 71.39% of the variation, while the second principal


    inertia is 0.04030 (27.64%). The inclusion of the third principal axis will not contribute to the overall display as it only accounts for 0.97% of the total inertia.

    According to the simple correspondence plot, Rutgers is the highest ranking brand of tomato according to taste, while the remaining order of

    preference is Momotaro, Floradade and the lowest ranked Summit. However, this observation could have been reached directly from Table 8.5. Rutgers has been assigned 11 rankings of 1, the highest of the four brands.

    Therefore, it seems as if it is the most preferred brand. The next most preferred brand is Momotaro with 6 rankings of 1 and 8 rankings of 2,

    Floradade with 4 rankings of 1 while Summit is the least preferred with only 3 rankings of 1 and 8 rankings of 4. However Rutgers, the most preferred on the basis of first ranks, also has been ranked last by 8 of the taste testers.

    In order to determine why each ranking is so different, and why each

    brand of tomato seems different, the Anderson statistic can be partitioned into location, dispersion and higher order components. So an ordinal

    correspondence analysis is applied to Table 8.5 where the scores used for the

    ordered (rank) columns are 1, 2, 3 and 4. The scores applied to the non-

    ordered row (tomato brands) categories are the mean rank values of 2.4167,

2.3750, 2.9167 and 2.2917.

    The two-dimensional correspondence plot using these scores is given

    by Figure 8.4 and shows how the brands of tomato compare and the

    rankings compare in terms of the location and dispersion components.

Table 8.6 gives the values of the location and dispersion components for the tomatoes and ranks and their contributions to the total inertia.

    Consider the component values for the brands of tomatoes. The location component accounts for 24.64% of the total variation of the tomatoes. However, at 3 degrees of freedom, this component is not


significant. Therefore, there is very little difference in the mean ranks, which can be verified by observing the similar row scores. However, the dispersion component accounts for 67.86% of the total variation in the tomatoes. Therefore, there is a significant difference in the spread of the brands of tomatoes. Thus the two-dimensional correspondence plot of Figure 8.4 graphically displays 92.50% of the total variation of the tomato brands.


    Figure 8.3 : Simple Correspondence Plot of Table 8.5

These component values are reflected in the ordinal correspondence plot of Figure 8.4. In terms of the first principal axis, which has an inertia value equal to the location component, the brands of tomatoes are very similar. The only different brand is Summit, which has a mean rank close to 3, while the remaining three brands have mean ranks between 2.29 and 2.42. The second principal axis dominates the configuration of tomato


    brands. This is reflected by the significantly large dispersion component.

Figure 8.4 graphically shows that Rutgers is more spread out, that is, has a larger positive dispersion, than the other brands, which is what Best (1993) numerically concluded.

Variable    Component     Value      % Contr.
Tomato      Location     0.03594      24.64
            Dispersion   0.09896      67.86
            Error        0.01094       7.50
Rank        Location     0.05333      36.57
            Dispersion   0.09055      62.09
            Error        0.00195       1.34

    Table 8.6 : Row and Column Location and Dispersion Components of Table 8.5

Consider the ranking categories. The location component of 0.05333 is

    not significant and accounts for 36.57% of the variation in the rankings.

    Therefore, there is very little difference in the mean values of the rankings.

    However, the dispersion component of 0.09055 is highly significant and

    accounts for over 62% of the variation in the ranks. Therefore, there is a

    significant difference in the spread of the allocation of the rankings. These

    component values are reflected in the position of the rankings in the

    ordinal correspondence plot of Figure 8.4. The plot shows that there are two

    distinct rank clusters. Rankings 1 and 4 appear to be similar in terms of their

    spread, while ranks 2 and 3 are also similar in terms of their spread. Figure

    8.4 graphically represents 98.66% of the total variation of the rankings.
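The significance statements above can be sketched numerically, assuming (as in Chapter 5) that n times each component is compared against a chi-squared variate on t - 1 = 3 degrees of freedom; the 5% critical value of 7.815 is hard-coded here:

```python
# Rough significance check for the rank components of Table 8.6.
# Each component times n is compared with the chi-squared critical
# value on t - 1 = 3 degrees of freedom at the 5% level (7.815).
n = 96
location, dispersion = 0.05333, 0.09055
crit = 7.815

print(n * location, n * location > crit)        # not significant
print(n * dispersion, n * dispersion > crit)    # significant
```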



    Figure 8.4 : Ordinal Correspondence Plot of Table 8.5

Unlike Example 8.2.5, the positions of the categories in the two-dimensional ordinal correspondence plot with relatively large cell entries are close to each other, while those with relatively small cell entries are at some distance. This indicates that there are no significant non-diagonal bivariate moments. Also, as the row and column categories are dominated by the second principal axis, the rules stated in Subsection 5.2.6 suggest that Z_12 ≈ 0 and Z_21 ≈ 0. In fact, Z_21 = -3.2×10^{-6} and Z_12 = -0.0983, which are not significant, confirming the results from the plot. The only significant bivariate moment is the dispersion-by-dispersion, with a value of 0.298378, as expected since the dispersion component is a dominant source of variation for the rankings and the variety of tomato.


    8.3 Ordinal Log-linear Analysis of Two-way Contingency Tables 8.3.1 Introduction

    Log-linear analysis has been a popular technique for analysing data

    for many decades, especially in the English speaking countries; see the comment of Gower (1989) in Section 1.2. The conventional method of estimating parameters from a log-linear model has been to use the

    maximum likelihood estimation technique. However, a problem with this

approach, as Fienberg (1977, pg 47) points out, is that while there are several authors who have proposed techniques for selecting the optimum log-linear model,

    "unfortunately, there is no all-purpose, best method of model selection"

    This is because the selection of an optimum log-linear model requires a trial

    and error approach of fitting and re-fitting a model until optimisation is reached.

    This problem of model selection can be overcome by considering the method of parameter estimation described in Beh (1999b), and Beh & Davy

(1998, 1999a) for ordinal two-way and three-way contingency tables. The parameter estimation technique described in these articles, and further explained in this section and Section 8.4, makes the determination of the optimum log-linear model a quicker and relatively accurate procedure.

    8.3.2 Singly Ordered Two-way Tables (VERSION 1)

The log-linear model applicable to a two-way contingency table, N, with non-ordinal categories is

ln n_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)}    (8.8)


where u_{1(i)} is the main effect of the i'th row category, u_{2(j)} is the main effect of the j'th column category, and u_{12(ij)} is the interaction between the i'th row and j'th column category. These parameter values are subject to the conditions

Σ_{i=1}^{I} u_{1(i)} = Σ_{j=1}^{J} u_{2(j)} = Σ_{i=1}^{I} u_{12(ij)} = Σ_{j=1}^{J} u_{12(ij)} = 0

    An alternative model to (8.8) described by van der Heijden, de Falguerolles & de Leeuw (1989) is

ln n_{ij} = u + u_{1(i)} + u_{2(j)} + Σ_{m=1}^{M*} a_{im} λ_m b_{jm}    (8.9)

where u_{1(i)} and u_{2(j)} have an identical interpretation to the parameters in

(8.8) and are subject to the same conditions. The values of a_{im} and b_{jm} are the generalised vectors which were described in Chapter 2, and are subject to equations (2.5) and (2.6) respectively. The value of λ_m is the m'th part of the association between the row and the column categories. Tsujitani (1992) discussed the least squares fitting of model (8.9) to two-way contingency tables.

    For a two-way contingency table with only one set of ordered categories, say the columns, the log-linear model of the data as seen in

    Agresti (1994) can be defined as

ln n_{ij} = u + u_{1(i)} + u_{2(j)} + τ_{J(i)}[s_J(j) − μ_J]    (8.10)

where {s_J(j)} is the set of column scores for j = 1, 2, ..., J. The values of τ_{J(i)}, which sum to zero and can be calculated using maximum likelihood estimation, measure the linearity of the ordered column categories on the i'th row category, and are related to Z_{i1} as defined by equation (4.30). To


show this, consider the model of association for this type of contingency table given by (5.72). Multiplying this model by n and taking the natural logarithm of both sides of the expression yields

ln n_{ij} = ln n + ln p_{i.} + ln p_{.j} + Σ_{v=1}^{J−1} (Z_{iv} / (√n p_{i.})) b_v(j)    (8.11)

    Consider the linear component of the ordered column categories.

Then, when natural scores are used, (8.11) is reduced to

ln n_{ij} = ln n + ln p_{i.} + ln p_{.j} + (Z_{i1} / (√n p_{i.} σ_J)) (j − μ_J)    (8.12)

Comparing model (8.12) with the classical log-linear model of (8.10), the parameter τ_{J(i)} can be approximated by

τ_{J(i)} ≈ Z_{i1} / (σ_J √n p_{i.})    (8.13)

    The advantage of model (8.11) is that parameters higher than the linear in the log-linear model of Agresti (1994) can be approximated without resorting to the maximum likelihood estimation technique.
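As an illustrative sketch of approximation (8.13), the following computes τ_{J(i)} for a hypothetical drug-by-rating table (the counts are invented; they are not those of Table 5.4), under the assumed form Z_{i1} = √n Σ_j p_{ij} b_1(j) with the first-order orthonormal polynomial built from natural scores.

```python
import numpy as np

# Hypothetical drug-by-rating table (invented counts, for illustration only).
N = np.array([[15, 10, 8, 5],
              [20, 12, 6, 3],
              [5, 9, 12, 14],
              [4, 8, 13, 16]], dtype=float)

n = N.sum()
P = N / n
r = P.sum(axis=1)                        # row masses p_i.
c = P.sum(axis=0)                        # column masses p_.j

J = N.shape[1]
s = np.arange(1, J + 1)                  # natural column scores
mu_J = (c * s).sum()
sigma_J = np.sqrt((c * (s - mu_J) ** 2).sum())

b1 = (s - mu_J) / sigma_J                # first-order orthonormal polynomial
Z1 = np.sqrt(n) * (P * b1).sum(axis=1)   # assumed form Z_i1 = sqrt(n) sum_j p_ij b_1(j)

tau = Z1 / (sigma_J * np.sqrt(n) * r)    # approximation (8.13)
```

A built-in sanity check is that the τ_{J(i)}, weighted by the row masses, sum to zero.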

    8.3.3 Singly Ordered Two-way Tables (VERSION 2)

    Consider again the log-linear model of Agresti (1994) given by (8.10).

Beh (1999b) presented an alternative technique for estimating τ_{J(i)}, this time taking into consideration the non-ordinal structure of the row categories.

    Consider the model of association (5.82). Then multiplying it by n and taking the natural logarithm gives


ln n_{ij} = ln n + ln p_{i.} + ln p_{.j} + Σ_{u=1}^{M*} Σ_{v=1}^{J−1} a_{iu} (Z_{uv} / √n) b_v(j)    (8.14)

    If only the linear column component is considered, and natural scores are used, then (8.14) becomes,

ln n_{ij} = ln n + ln p_{i.} + ln p_{.j} + Σ_{u=1}^{M*} a_{iu} (Z_{u1} / √n) ((j − μ_J) / σ_J)    (8.15)

Comparing model (8.10) with (8.15), for natural scores, the parameter τ_{J(i)} can alternatively be approximated by

τ_{J(i)} ≈ (1 / (σ_J √n)) Σ_{u=1}^{M*} a_{iu} Z_{u1}    (8.16)

Consider when N has two row categories which may, for example, consist of "Yes/No" or "Male/Female" responses. Then (8.16) simplifies to

τ_{J(i)} ≈ (a_{i1} / σ_J)(Z_{11} / √n)    (8.17)

which is the correlation of the non-ordered rows and ordered columns, as defined in Section 4.4, multiplied by a_{i1}/σ_J. The advantage of (8.14) is that it can be used to calculate parameter values higher than the linear.

8.3.4 Doubly Ordered Two-way Tables

Suppose that N is a doubly ordered two-way contingency table. Agresti (1994) states that the log-linear model applicable to such a data set is

ln n_{ij} = u + u_{1(i)} + u_{2(j)} + β[s_I(i) − μ_I][s_J(j) − μ_J]    (8.18)


where {s_I(i)} and {s_J(j)}, for i = 1, 2, ..., I and j = 1, 2, ..., J, are the sets of row and column scores. The values of μ_I and μ_J are the mean values of these row and column scores respectively, while β is the linear-by-linear association between the two sets of categories. Haberman (1974) proposed a method of estimating this parameter value using maximum likelihood estimation, which Agresti (1994) described. An alternative method of calculating the parameter β is to consider the exponential model (5.22). Multiplying this model by n, and taking the natural logarithm of both sides, yields

ln n_{ij} ≈ u + u_{1(i)} + u_{2(j)} + Σ_{u=1}^{I−1} Σ_{v=1}^{J−1} a_u(i) (Z_{uv} / √n) b_v(j)    (8.19)

where u = ln n, u_{1(i)} = ln p_{i.} and u_{2(j)} = ln p_{.j}, and is of a similar form to (8.9).

If only the linear components of the rows and columns are considered, that is, when u = v = 1, and natural scores are used, then (8.19) simplifies to

ln n_{ij} ≈ u + u_{1(i)} + u_{2(j)} + (Z_{11} / √n) ((i − μ_I) / σ_I) ((j − μ_J) / σ_J)    (8.20)

Comparing model (8.18) with (8.20) leads to the conclusion that the parameter β in (8.18) can be estimated by

β ≈ Z_{11} / (σ_I σ_J √n)    (8.21)

    In general, components other than the linear can be considered.

Therefore, while the work of Agresti (1994) and Fienberg (1977) was restricted to the linear situation, their models can be extended by considering higher components of model (8.19). Thus, ordinal log-linear


analysis of two-way contingency tables can be used to find quadratic-by-linear, linear-by-quadratic and higher bivariate moments.
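Approximation (8.21) with natural scores can be sketched as a short function; the table passed in is arbitrary, and the assumed form Z_{11} = √n Σ_{ij} p_{ij} a_1(i) b_1(j), with a_1 and b_1 the first-order orthonormal polynomials, follows the convention used above.

```python
import numpy as np

def beta_linear_by_linear(N):
    """Approximate beta of model (8.18) via (8.21), assuming natural scores
    and the form Z_11 = sqrt(n) * sum_ij p_ij a_1(i) b_1(j)."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n
    r, c = P.sum(axis=1), P.sum(axis=0)
    si = np.arange(1, N.shape[0] + 1)    # natural row scores
    sj = np.arange(1, N.shape[1] + 1)    # natural column scores
    mu_I, mu_J = (r * si).sum(), (c * sj).sum()
    sg_I = np.sqrt((r * (si - mu_I) ** 2).sum())
    sg_J = np.sqrt((c * (sj - mu_J) ** 2).sum())
    a1 = (si - mu_I) / sg_I              # first-order orthonormal row polynomial
    b1 = (sj - mu_J) / sg_J              # first-order orthonormal column polynomial
    Z11 = np.sqrt(n) * (a1[:, None] * P * b1[None, :]).sum()
    return Z11 / (sg_I * sg_J * np.sqrt(n))
```

With natural scores this reduces to the weighted covariance of the row and column scores divided by the product of their variances, so an independent table gives a value of zero.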

    8.4 Ordinal Log-linear Analysis of Three-way Contingency Tables

8.4.1 Singly Ordered Three-way Tables

Log-linear analysis of multi-way contingency tables can be applied by using the models suggested in Agresti (1994) or Fienberg (1977). A three-way contingency table with ordered rows, and unordered columns and tubes, can be analysed by the model :

ln n_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + τ_{IJ(j)}[s_I(i) − μ_I] + τ_{IK(k)}[s_I(i) − μ_I] + τ_{IJK(jk)}[s_I(i) − μ_I]    (8.22)

where Σ_{i=1}^{I} u_{1(i)} = Σ_{j=1}^{J} u_{2(j)} = Σ_{k=1}^{K} u_{3(k)} = 0. The value τ_{IJ(j)} is a measure of the linearity of the rows and its effect on the j'th column category. Similarly, τ_{IK(k)} is the measure of linearity of the rows and its effect on the k'th tube category, while τ_{IJK(jk)} is the measure of linearity of the rows and its effect on the j'th column and k'th tube pair. Therefore τ_{IJK(jk)} is related to Y_{1jk}. To show this, taking the natural logarithm of both sides of (7.65) after multiplying by n yields :

ln n_{ijk} = ln n + ln p_{i..} + ln p_{.j.} + ln p_{..k} + Σ_{u=1}^{I−1} a_u(i) Y_{ujk} / (√n p_{.j.} p_{..k})    (8.23)

When the linear component of the rows is considered, and natural scores are used, (8.23) becomes :

ln n_{ijk} = ln n + ln p_{i..} + ln p_{.j.} + ln p_{..k} + (Y_{1jk} / (σ_I √n p_{.j.} p_{..k})) (i − μ_I)    (8.24)


Comparing models (8.22) with (8.24), τ_{IJK(jk)} can be approximated by

τ_{IJK(jk)} ≈ Y_{1jk} / (σ_I √n p_{.j.} p_{..k})    (8.25)

while Σ_{j=1}^{J} τ_{IJ(j)} = Σ_{k=1}^{K} τ_{IK(k)} = 0.

    8.4.2 Doubly Ordered Three-way Tables

Assume that the rows and columns of N are ordered, while the tubes are not. Then, a log-linear model for the table, taking into consideration the ordered structure of the rows and columns, is :

ln n_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + τ_{IK(k)}[s_I(i) − μ_I] + τ_{JK(k)}[s_J(j) − μ_J] + β_{IJ(k)}[s_I(i) − μ_I][s_J(j) − μ_J]    (8.26)

In model (8.26), τ_{IK(k)} is the effect the linearity of the rows has on the k'th tube profile, while τ_{JK(k)} is the effect of the linearity of the columns on the k'th tube category. The values of τ_{IK(k)} and τ_{JK(k)} correspond to Y_{10k} and Y_{01k} of (4.49) respectively. The value of β_{IJ(k)} is related to the linear-by-linear association of the ordered rows and columns at each unordered tube level, and differs slightly from the model of Agresti, who considered the overall parameter β_{IJ}. To show this, consider the model of association (4.78). Taking the natural logarithm of this model, after multiplying by n, yields :

ln n_{ijk} = ln n + ln p_{i..} + ln p_{.j.} + ln p_{..k} + Σ_{u=0}^{I−1} Σ_{v=0}^{J−1}, (u,v) ≠ (0,0), a_u(i) (Y_{uvk} / (√n p_{..k})) b_v(j)    (8.27)


Consider the linear component of the rows and columns. Then, when natural scores are used, (8.27) becomes :

ln n_{ijk} = ln n + ln p_{i..} + ln p_{.j.} + ln p_{..k} + (Y_{10k} / (σ_I √n p_{..k})) (i − μ_I) + (Y_{01k} / (σ_J √n p_{..k})) (j − μ_J) + (Y_{11k} / (σ_I σ_J √n p_{..k})) (i − μ_I)(j − μ_J)    (8.28)

Therefore, comparing model (8.26) with (8.28), the parameters τ_{IK(k)}, τ_{JK(k)} and β_{IJ(k)} can be approximated by

τ_{IK(k)} ≈ Y_{10k} / (σ_I √n p_{..k})    (8.29)

τ_{JK(k)} ≈ Y_{01k} / (σ_J √n p_{..k})    (8.30)

β_{IJ(k)} ≈ Y_{11k} / (σ_I σ_J √n p_{..k})    (8.31)

So the overall linear-by-linear association between the rows and columns of N, using (8.31) and (4.51), is

Y_{11(.)} = σ_I σ_J √n Σ_{k=1}^{K} p_{..k} β_{IJ(k)}    (8.32)

while the parameter β_{IJ} of the model from Agresti (1994) and Fienberg (1977) can be approximated by

β_{IJ} ≈ Y_{11(.)} / (σ_I σ_J √n)    (8.33)

or alternatively as

β_{IJ} = Σ_{k=1}^{K} p_{..k} β_{IJ(k)}    (8.34)

using equation (8.32).


8.4.3 Completely Ordered Three-way Tables

For a three-way contingency table, where all three variables are ordered, Fienberg (1977) and Agresti (1994) offer the log-linear model :

ln n_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + β_{IJ}[s_I(i) − μ_I][s_J(j) − μ_J] + β_{IK}[s_I(i) − μ_I][s_K(k) − μ_K] + β_{JK}[s_J(j) − μ_J][s_K(k) − μ_K] + β_{IJK}[s_I(i) − μ_I][s_J(j) − μ_J][s_K(k) − μ_K]    (8.35)

where Σ_{i=1}^{I} u_{1(i)} = Σ_{j=1}^{J} u_{2(j)} = Σ_{k=1}^{K} u_{3(k)} = 0. The value of s_K(k) is the score associated with the k'th tube category. The values β_{IJ}, β_{IK} and β_{JK} describe the bivariate association term for each pair of variables and are calculated using maximum likelihood estimation. These values correspond to the linear-by-linear associations for each pair of variables, while β_{IJK} is the trivariate association term and corresponds to the linear-by-linear-by-linear association. To show this, consider the exponential form of model (7.44). Taking the natural logarithm of both sides after multiplying by n yields :

ln n_{ijk} = ln n + ln p_{i..} + ln p_{.j.} + ln p_{..k} + Σ_{u=1}^{I−1} Σ_{v=1}^{J−1} (Z_{uv.} / √n) a_u(i) b_v(j) + Σ_{u=1}^{I−1} Σ_{w=1}^{K−1} (Z_{u.w} / √n) a_u(i) c_w(k) + Σ_{v=1}^{J−1} Σ_{w=1}^{K−1} (Z_{.vw} / √n) b_v(j) c_w(k) + Σ_{u=1}^{I−1} Σ_{v=1}^{J−1} Σ_{w=1}^{K−1} (Z_{uvw} / √n) a_u(i) b_v(j) c_w(k)    (8.36)

    If only the linear components for each of the variables is considered and natural scores are used, then (8.36) simplifies to


ln n_{ijk} = ln n + ln p_{i..} + ln p_{.j.} + ln p_{..k} + (Z_{11.} / √n) ((i − μ_I)/σ_I) ((j − μ_J)/σ_J) + (Z_{1.1} / √n) ((i − μ_I)/σ_I) ((k − μ_K)/σ_K) + (Z_{.11} / √n) ((j − μ_J)/σ_J) ((k − μ_K)/σ_K) + (Z_{111} / √n) ((i − μ_I)/σ_I) ((j − μ_J)/σ_J) ((k − μ_K)/σ_K)    (8.37)

    Comparing models (8.35) and (8.37), for natural scores, the parameters calculated from the log-linear model of (8.35) can be approximated by

β_{IJ} ≈ Z_{11.} / (σ_I σ_J √n)    (8.38)

β_{IK} ≈ Z_{1.1} / (σ_I σ_K √n)    (8.39)

β_{JK} ≈ Z_{.11} / (σ_J σ_K √n)    (8.40)

β_{IJK} ≈ Z_{111} / (σ_I σ_J σ_K √n)    (8.41)

It can be seen that the models of Agresti (1994) and Fienberg (1977) consider the centring (about the mean) of the scores, while the analysis here standardises them.

    The advantage of model (8.36), say, is that it can be generalised to consider not only the linear component, but also the quadratic and higher order component values. The model

ln n_{ijk} = ln n + ln p_{i..} + ln p_{.j.} + ln p_{..k} + Σ_{u=1}^{M1} Σ_{v=1}^{M2} (Z_{uv.} / √n) a_u(i) b_v(j) + Σ_{u=1}^{M1} Σ_{w=1}^{M3} (Z_{u.w} / √n) a_u(i) c_w(k) + Σ_{v=1}^{M2} Σ_{w=1}^{M3} (Z_{.vw} / √n) b_v(j) c_w(k) + Σ_{u=1}^{M1} Σ_{v=1}^{M2} Σ_{w=1}^{M3} (Z_{uvw} / √n) a_u(i) b_v(j) c_w(k)    (8.42)


where M1, M2 and M3 are chosen similarly to model (7.45). Alternatively, the M1 most significant row components, M2 column components and/or M3 tube components can be chosen to improve the model.
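The four approximations (8.38)-(8.41) with natural scores can be sketched as a single function; the Z terms are computed under the assumed form Z_{uvw} = √n Σ_{ijk} p_{ijk} a_u(i) b_v(j) c_w(k) with a_0 = b_0 = c_0 = 1, which is consistent with (8.37) but not quoted from the thesis.

```python
import numpy as np

def beta_three_way(N):
    """Approximate the beta parameters of model (8.35) via (8.38)-(8.41),
    assuming natural scores and Z_uvw = sqrt(n)*sum p_ijk a_u(i) b_v(j) c_w(k)."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n
    polys, sigmas = [], []
    for axis, size in enumerate(N.shape):
        # marginal masses of this variable
        mass = P.sum(axis=tuple(ax for ax in range(3) if ax != axis))
        s = np.arange(1, size + 1)                 # natural scores
        mu = (mass * s).sum()
        sigma = np.sqrt((mass * (s - mu) ** 2).sum())
        polys.append((s - mu) / sigma)             # first-order polynomial
        sigmas.append(sigma)
    a1, b1, c1 = polys
    sg_I, sg_J, sg_K = sigmas
    rn = np.sqrt(n)
    Z11_ = rn * np.einsum('ijk,i,j->', P, a1, b1)  # Z_11.
    Z1_1 = rn * np.einsum('ijk,i,k->', P, a1, c1)  # Z_1.1
    Z_11 = rn * np.einsum('ijk,j,k->', P, b1, c1)  # Z_.11
    Z111 = rn * np.einsum('ijk,i,j,k->', P, a1, b1, c1)
    return {'beta_IJ': Z11_ / (sg_I * sg_J * rn),
            'beta_IK': Z1_1 / (sg_I * sg_K * rn),
            'beta_JK': Z_11 / (sg_J * sg_K * rn),
            'beta_IJK': Z111 / (sg_I * sg_J * sg_K * rn)}
```

For a completely independent table all four values vanish, which matches the interpretation of the Z terms as components of the partitioned chi-squared statistic.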

    The advantage of this type of analysis of a multi-way contingency

table is that the differences between categories of a variable, and the association and component values, can be graphically represented via correspondence analysis. Many authors have described the relationship between correspondence analysis of non-ordinal contingency tables and log-linear models, such as van der Heijden, de Falguerolles & de Leeuw (1989), van der Heijden & de Leeuw (1985), van der Heijden & Worsley (1988) and Goodman (1986).

    8.5 Ordinal Log-linear Analysis Examples 8.5.1 A Singly Ordered Two-way Contingency Table

    Consider Table 5.4 which classifies four analgesic drugs according to their rating of effectiveness. This data set was analysed in Chapter 5 by taking into account the ordered structure of the judgements.

As this is a singly ordered contingency table, a singly ordered log-linear analysis as described in Subsection 8.3.2 is applicable.

τ_{J(i)}        MLE         Approx
Drug A      0.286637    0.243146
Drug B      0.489972    0.412683
Drug C     -0.369791   -0.321393
Drug D     -0.406817   -0.349117

Table 8.7 : Log-linear MLE and Approximated Values of the Parameter τ_{J(i)} using (8.16) for Table 5.4

Table 8.7 gives the parameter estimates of τ_{J(i)}, for i = 1, 2, 3 and 4, using maximum likelihood estimation and the approximation formula of


    (8.16). When using natural scores, it can be seen that the approximated

    values compare well with the maximum likelihood estimates. While there

    is a 13-15% deviation between the two sets of values, the approximation equation offers an alternative to maximum likelihood estimation and gives very good parameter estimates.

    8.5.2 A Doubly Ordered Two-way Contingency Table

Consider the two-way contingency table of Table 2.1. As seen in Chapter 5, it can be considered as a doubly ordered data set. Therefore, the doubly ordered log-linear analysis of Subsection 8.3.4 can be applied. Haberman (1974) analysed Table 2.1 by estimating the linear-by-linear association of the data. Using row scores 3, 1, -1, and -3 and column scores 5, 3, 1, -1, and -3, Haberman (1974) concluded that the estimate of the linear-by-linear association is 0.848, suggesting that as the parental socio-economic status improves, so too does the patient's mental health; this is the same conclusion reached in Chapter 2. Using equation (8.21), and the same scoring schemes, the approximation of the linear-by-linear association is 0.9069, fairly close to Haberman's result.

Therefore, rather than estimating the association value using maximum likelihood estimation, similar results can be obtained from the use of orthogonal polynomials.

    8.5.3 A Doubly Ordered Three-way Contingency Table

Consider again the three-way contingency table of Table 3.4, and suppose that the set of happiness categories is considered a response variable while the sibling and schooling variables are the explanatory variables. Then a log-linear analysis can be conducted using the approach discussed in Subsection 8.4.2.


Table 8.8 lists the parameter values of β_{IJ(k)}, τ_{IK(k)} and τ_{JK(k)} for k = 1, 2, 3 using maximum likelihood estimation and the approximation equations of (8.29)-(8.31). Table 8.8 shows that the MLE and approximated values compare very well with each other.

At a tube category level, Table 8.8 shows that the β_{IJ(k)} for each tube

    category are similar and compare extremely well with the approximated values. However, there is a slightly stronger relationship at "Pretty Happy" which reflects the observation made from Table 3.4 concerning the interaction between all three variables.

While the maximum likelihood estimates of τ_{IK(1)} and τ_{IK(3)} compare quite well with their approximated values, there is some discrepancy between the values for τ_{IK(2)}. Similarly, the MLEs of τ_{JK(1)} and τ_{JK(2)} also compare well with their approximated values, while there is some discrepancy between the values for τ_{JK(3)}.

               Category         Approx       MLE
β_{IJ(k)}      Not too Happy   -0.29027   -0.32591
               Pretty Happy    -0.30628   -0.36499
               Very Happy      -0.28218   -0.28592
τ_{IK(k)}      Not too Happy    0.11468    0.22273
               Pretty Happy     0.01672    0.15466
               Very Happy      -0.40972   -0.37739
τ_{JK(k)}      Not too Happy   -0.01228   -0.02058
               Pretty Happy    -0.04051   -0.04658
               Very Happy       0.21648    0.06716

Table 8.8 : Log-linear MLE and Approximated Parameter Values of

Table 3.4 Assumed to be Doubly Ordered

    Therefore, while only two of the nine parameter values calculated using MLE and the approximation formulae are different, the method of


    approximation gives very good parameter values from an ordinal log-linear model.

    8.5.4 A Completely Ordered Three-way Contingency Table

    Consider the three-way contingency table of Table 3.4. As all three variables contain ordinal categories, the linear-by-linear association between each pair of variables is calculated by using maximum likelihood estimation and the parameter approximations of (8.38)-(8.41). A comparison of these values can be made by observing Table 8.9.

                    β_{IJ}     β_{IK}     β_{JK}
ML Estimates       -0.3468    -0.2188     0.0726
Approximation      -0.3298    -0.2140     0.0725

Table 8.9 : Log-linear MLE and Approximated Parameter Values of

Table 3.4 Assumed to be Completely Ordered

Note that the three-way association was not approximated, because it was found in Subsection 4.11.1 that there was no three-way association between the variables. The maximum likelihood estimates of the log-linear model (8.35) compare very well with the approximations of (8.38)-(8.41). It can be seen from Example 4.11.1 that, as X²_{IJ} is the most significant of the three bivariate chi-squared statistics, the linear-by-linear association term for the rows and columns is the largest. Similarly, it was shown that X²_{JK} was the smallest of the three significant terms; therefore, the linear-by-linear association for the column and tube categories is the smallest. Thus, instead of selecting a log-linear model by "trial-and-error", as is the situation when fitting, testing and refitting a model, the approximations offered by equations (8.38)-(8.41) offer an alternative and accurate method of


parameter estimation. Also, a far more informative model than that of Agresti (1994) or Fienberg (1977) can be obtained by selecting significant parameters which involve moments above the location, as model (8.42) does.

Even if the parameter approximations are not used as a substitute for ordinal log-linear modelling, they can be used to calculate very good initial parameter values. Once it is known which terms from the partition of the chi-squared statistic are significant, they can be used along with the parameter approximations to refine the log-linear modelling procedure without having to resort to the fitting and re-fitting of a model.

    -290- Appendix

    Appendix

    Al Matrix Approximation

    As Chapter 1 briefly discussed, correspondence analysis requires reducing two clouds of points to a common space which gives the best possible simultaneous representation of the rows and columns of a contingency table. In a sense, this means finding a low rank approximation of the original data set as seen by using the reconstitution formula of Section 2.7. The mechanism for performing this task lies in the mathematical tools :

    i) Spectral decomposition (SD) for square matrices, and ii) Singular value decomposition (SVD), for rectangular matrices

Both these decompositions are basically the same. It will be shown that SD is a special case of SVD. The reason for the inclusion of a section devoted to singular value decomposition is largely for completeness. The partition of the chi-squared statistic in Section 2 is investigated in detail, whereas its partition in Section 1 is not. This section of the Appendix aims at discussing more about singular value decomposition, which is not mentioned in Section 1.


    Al.l Spectral Decomposition

Consider an I×I symmetric matrix, N, that is, where the number of rows is equal to the number of columns. Applying a matrix approximation to this data set requires reducing an I-dimensional cloud of points down to an M-dimensional space, where ideally, M is either 1, 2 or 3 (so that it remains possible to visualise the profile points). For this particular data set, spectral decomposition is the preferred method of matrix approximation. This method requires decomposing the matrix, N, such that :

N = ADA^T    (A.1)

where A contains the eigenvectors of N, while D is a diagonal matrix containing the eigenvalues. The eigenvectors are subject to the constraint

A^T A = I

This decomposition is just an alternative definition of the commonly known eigenvalue/eigenvector problem.

A system of I equations with I unknowns,

Na = λa    (A.2)

will have non-trivial solutions for a for the values of λ which satisfy the characteristic equation

|N − λI| = 0    (A.3)

The values of λ are called eigenvalues, and the corresponding solutions of a are called eigenvectors.
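A minimal numerical illustration of (A.1)-(A.3); the 3×3 symmetric matrix is invented for the example, and numpy's `eigh` routine for symmetric eigenproblems is used.

```python
import numpy as np

# A small invented symmetric matrix to illustrate spectral decomposition.
N = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])

# np.linalg.eigh handles symmetric matrices, returning eigenvalues in
# ascending order together with the matrix of orthonormal eigenvectors.
eigvals, A = np.linalg.eigh(N)
D = np.diag(eigvals)

# Per (A.1): A contains the eigenvectors (A^T A = I) and N = A D A^T
reconstructed = A @ D @ A.T
```

The reconstruction check confirms that the eigenvectors satisfy the orthonormality constraint and that the decomposition recovers N exactly.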


    A1.2 Singular Value Decomposition

    The singular value decomposition is one of the most useful tools in matrix algebra, yet it is still not treated in many textbooks for statisticians. One of its largest fields of application, namely low rank matrix

    approximation, was reported, but not proven, by Eckart & Young (1936). As a

result, many, especially psychometricians, often refer to the technique as the Eckart-Young Decomposition. However, the decomposition was proven much later by Johnson (1963).

    Singular value decomposition can be considered a generalisation of spectral decomposition. Instead of considering a symmetric matrix, singular

    value decomposition finds the low rank approximation to any rectangular matrix. The advantage of such a method can be seen by noting that most data sets analysed are of a rectangular form.

    Not all fields of research know this method as singular value decomposition. Other synonyms used include basic structure, canonical form, singular decomposition and tensor reduction.

Consider an I×J rectangular two-way contingency table N. N can be transformed in one of two ways so that the resultant matrix is diagonal.

    This transformation requires finding major and minor product moments.

If I > J, then the matrix N has a major product moment R = NN^T, while its minor product moment is defined as S = N^T N.

Similarly, if I < J, then the matrix N has a major product moment R = N^T N, while its minor product moment is S = NN^T.

Suppose for now that I > J. The major product moment, R, can be decomposed so that :

R = ADA^T    (A.4)


where A^T A = I, while D contains the eigenvalues of R arranged in descending order. Similarly, the minor product moment, S, has a spectral decomposition

S = BDB^T    (A.5)

where B^T B = I and D contains the eigenvalues of S, arranged in descending order. Now, as

R = NN^T = ADA^T    (A.6)

S = N^T N = BDB^T    (A.7)

then the singular value decomposition of the rectangular matrix, N, is defined as :

N = AD^{1/2}B^T    (A.8)

The vectors of the matrices A and B are called the left and right generalised basic vectors respectively. The elements of D^{1/2} are referred to as generalised basic values, or singular values, of the contingency table. Note that equation (A.8) is valid as :

R = NN^T = [AD^{1/2}B^T][AD^{1/2}B^T]^T = AD^{1/2}B^T BD^{1/2}A^T = ADA^T

Similarly,

S = N^T N = [AD^{1/2}B^T]^T [AD^{1/2}B^T] = BD^{1/2}A^T AD^{1/2}B^T = BDB^T


Thus, instead of having to consider the decomposition of a non-diagonal matrix, N, by conducting two spectral decompositions (one for the rows and the other for the columns), only one decomposition, called singular value decomposition, need be made.

    The A matrix "summarises" the information in the rows of N. The rows of A correspond to the rows in N and the columns in A represent the underlying dimensions in the rows. Similarly, the B matrix "summarises" information in the columns of N. The rows of B correspond to the columns in N and the columns in B summarise the underlying dimensions in the columns of N. The number of rows in A equals the number of rows in N and the number of rows in B equals the number of columns in N.

    The D matrix is the diagonal matrix which contains the singular values corresponding to the columns of the A and B matrices. These singular values are arranged in descending order such that

λ_1 ≥ λ_2 ≥ ... ≥ λ_{min(I,J)} ≥ 0. These values can be thought of as "weights" indicating the relative "importance" of each dimension of A and B. Thus the first dimension will always be more important than the second, third and so on.
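The relationship between the product moments and the SVD can be checked numerically; this sketch uses the 4×6 table of the Example below and numpy's `svd`, verifying that the squared singular values are the eigenvalues of both product moments.

```python
import numpy as np

# The 4x6 matrix of the Example below, built from the cell entries of Table 2.1.
N = np.array([[64, 57, 57, 72, 36, 21],
              [94, 94, 105, 141, 97, 71],
              [58, 54, 65, 77, 54, 54],
              [46, 40, 60, 94, 78, 71]], dtype=float)

U, s, Vt = np.linalg.svd(N, full_matrices=False)

# Squared singular values = eigenvalues of N N^T, and also the nonzero
# eigenvalues of N^T N; the eigenvectors of the two product moments supply
# the left and right generalised basic vectors respectively.
eig_small = np.sort(np.linalg.eigvalsh(N @ N.T))[::-1]        # 4 eigenvalues
eig_large = np.sort(np.linalg.eigvalsh(N.T @ N))[::-1][:4]    # nonzero part of 6
```

This is why only a single decomposition is needed in place of two separate spectral decompositions.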

    Example

Consider this 4×6 rectangular matrix :

    | 64  57  57  72  36  21 |
    | 94  94 105 141  97  71 |
    | 58  54  65  77  54  54 |
    | 46  40  60  94  78  71 |

    which is constructed using the cell entries of Table 2.1.


Then applying a singular value decomposition, N = AD^{1/2}B^T, yields :

A =
    | -0.360  0.652  0.094  0.660 |
    | -0.702  0.150  0.376 -0.585 |
    | -0.416 -0.012 -0.905 -0.086 |
    | -0.452 -0.743  0.174  0.463 |

D^{1/2} = diag(357.3, 46.99, 12.16, 5.817)

B =
    | -0.375  0.447 -0.252  0.608 |
    | -0.356  0.446 -0.094 -0.605 |
    | -0.415  0.162 -0.286 -0.283 |
    | -0.558 -0.055  0.538  0.324 |
    | -0.388 -0.437  0.380 -0.268 |
    | -0.313 -0.617 -0.642  0.090 |

An application of singular value decomposition is that, if the user only requires an M-dimensional space, rather than a min(I, J)-dimensional space, where M < min(I, J), then only the first M left and right generalised basic vectors, as well as the first M singular values, need be considered. For example, consider the first 2 underlying dimensions of the data matrix. Then the approximated matrix is :

N(2) =
    | 61.95  59.43  58.42  70.14  36.59  21.39 |
    | 97.22  92.37 105.35 139.64  94.35  74.24 |
    | 55.53  52.66  61.71  83.09  58.04  46.98 |
    | 44.90  41.83  61.37  91.99  77.92  72.12 |

obtained by retaining only the first two left and right generalised basic vectors and the first two singular values, diag(357.3, 46.99).

    When comparing this with the original matrix, it can be seen that the lower rank of 2, instead of 4, approximates well the original data values. These


    approximated values can be used as expected values for a chi-squared test to determine the adequacy of reducing the data from min(I, J) dimensions to 2 dimensions.
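The rank-2 truncation discussed above can be reproduced numerically; this sketch also checks the Frobenius error of the truncation against the discarded singular values (the Eckart-Young property mentioned in Subsection A1.2).

```python
import numpy as np

# Same 4x6 matrix as in the Example above.
N = np.array([[64, 57, 57, 72, 36, 21],
              [94, 94, 105, 141, 97, 71],
              [58, 54, 65, 77, 54, 54],
              [46, 40, 60, 94, 78, 71]], dtype=float)

U, s, Vt = np.linalg.svd(N, full_matrices=False)

M = 2
N2 = U[:, :M] @ np.diag(s[:M]) @ Vt[:M, :]   # rank-2 approximation N(2)

# The Frobenius error of the truncation equals the root-sum-square of
# the discarded singular values.
err = np.linalg.norm(N - N2, 'fro')
```

The error identity gives a simple numerical measure of the adequacy of the 2-dimensional reduction.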

    Many textbooks discuss the numerical determination of eigenvalues and eigenvectors; see for example Williams (1972, Chapter 5). There is much work that has been done in the past to expand upon the knowledge of singular value decomposition.

Eaton & Tyler (1994) discussed the asymptotic properties of SVD using Wielandt's inequality (Wielandt, 1967). Meyer (1995) unifies the non-linear approach to SVD with several available iterative techniques. Wang & Nyquist (1991) discussed what happens to the eigenstructure of a data set when an observation is deleted from the data. Computationally, Golub & Reinsch (1971) give a program using Householder transformations to reduce a data set to bidiagonal form, and then apply a QR algorithm to calculate the singular values. There are many more articles that provide an insight into the eigenstructure of a data set. See, for example, Gilula (1979), Good (1969),

    Maris, de Boeck & van Mechelen (1996), Milan & Whittaker (1995) and

    Wermuth & Rusmann (1993).

    A2 Computational Aspects There are several computer packages and review journal articles relating to the computational aspects of correspondence analysis. Published review articles include those by Bond & Michailidis (1996), Gorman &

    Primavera (1993), Greenacre (1986), Hoffman (1991), Lebart & Morineau

    (1982), Thompson (1995), Tian, Sorooshian & Myers (1993).


    There are also many commercially available programs and computer programs published in journal form which conduct a classical

    correspondence analysis on a contingency table. For example, Everitt (1994) offers a very simple program for classical correspondence analysis using the

    S-PLUS language. S-PLUS is also the language used for the programs listed

in Section A.3, and is command orientated rather than menu driven.

    The correspondence plot using S-PLUS can be made to be colourful and

    contain all the relevant information, such as profile names, and partial inertia values. The S-PLUS programs included later and that of Everitt (1994) are relatively short when compared with programs provided by some authors. This is due to the functional nature of the language. For example,

the programs of Everitt (1994) and CORR (Subsection A.3.1) use the singular value decomposition function svd.

Tian et al (1993) give a program called MATCORS written using the language Matlab. Their program allows for supplementary row and column categories to be included, an issue which Greenacre (1984) feels is an important feature in correspondence analysis. The program of Tian and his

co-authors allows for the analysis of a contingency table with a large number

    of rows and columns. The output is thorough with graphs for relative and absolute contributions of profiles to each principal axis, error profiles and supplementary projections, including an optimal correspondence plot.

    There are also many commercially available packages. Hoffman (1991) and Thompson (1995) give a good review of, between them, eight packages.

    Hoffman (1991) discussed Dual3 (version 3.2), MAPWISE (version 2.01),

    PC-MDS (version 5.0) and SimCA (version 1.5). SimCA (version 1.0) was discussed by Greenacre (1986). Dual3 deals with the dual scaling approach to correspondence analysis of Nishisato (1984) and was written by Shizuhiko and Ira Nishisato. It is written in BASIC and so is command orientated rather than menu driven. Hoffman (1991) suggests that Dual3 is easy to use


only if the researcher has prior knowledge of dual scaling (or simple correspondence analysis). However, the program will only calculate solutions up to three dimensions. Hoffman also suggests that MAPWISE is not the best package to use. The documentation is fraught with "incorrect assertions and misleading statements" (Hoffman, 1991, pg 308). However,

    comparing MAPWISE with other correspondence analysis programs, it is menu orientated, easy to use and is colourful. MAPWISE can handle contingency tables containing up to 100 rows and 100 columns, and is largely written for industrial applications.

    SimCA is a computer package largely written for the correspondence analysis of two-way contingency tables, but can conduct a multiple correspondence analysis only if the data is presented in the form of an indicator matrix. It is written in MS BASIC and unlike Nishisato's program (Dual3) can calculate solutions up to the tenth dimension. As it is written to be able to analyse indicator matrices, SimCA can analyse a contingency table with up to 175 columns and virtually unlimited rows (however, the larger the data set, the slower the computational power), and thus is a very good program to use. Thompson (1995) reviewed four other commercially available packages; BMDP (version 7.0), NCSS (version 5.3), SAS (version 6.07) and

SPSS (version 6.0). As far as worked examples are concerned, Thompson feels that BMDP is a good start for those learning correspondence analysis, while SAS is also very good in this respect (but with fewer examples). However, the SAS documentation is more technical than that of the other three packages reviewed, while the SPSS documentation is good. All of the programs will conduct a simple correspondence analysis, and all except NCSS will calculate a multiple correspondence analysis. The three programs that perform this


    analysis do so via the Burt matrix. SAS, NCSS and BMDP will permit an analysis of supplementary categories, yet SPSS will not.

Lebart & Morineau (1982) reviewed their program, which performs a correspondence analysis of two-way and multi-way contingency tables. Their program, written in FORTRAN, also performs a principal component analysis and cluster analysis on the data, and can analyse contingency tables consisting of hundreds of rows and thousands of columns.

Bond & Michailidis (1996) have also written a program capable of performing correspondence analysis on two-way and multi-way contingency tables. Their program, called ANACOR, is written in Lisp-Stat and its performance is claimed (by the authors) to be as good as the commercially available products, and in some respects better. Their package has one advantage that many of the others do not have, and that is mouse-driven zooming capabilities for the correspondence plot. When analysing large contingency tables, the correspondence plot can often look very cluttered, especially toward the centroid, where many profile categories may be positioned. ANACOR allows the user to zoom in or out of the plot by drawing, with the mouse, a square around the region that is to be investigated. ANACOR is also menu driven and features colour graphics.

Gorman & Primavera (1993) described their program MCA.EXE, which performs a multiple correspondence analysis via the indicator or Burt matrices. The program is written in QUICKBASIC 4.5 and can be executed on an MS-DOS or PC-DOS machine. The program can handle any number of observations/people classified into a contingency table; however, the total number of categories must not exceed 70.

    A3 S-PLUS Programs

    The following programs are written in S-PLUS and can execute many different analyses relating to the material in Sections 1 and 2.


    Firstly, there are programs for the analysis of two-way and multi-way contingency tables. Programs for the multiple correspondence analysis of a two-way or a three-way contingency table make use of the two-way program corr by using the indicator matrix (indicator) and the Burt matrix (burt). In this way MCA using the approaches discussed in Subsections 3.2, 3.3 and

3.5 can be performed. For the use of the Gifi system, refer to Gifi (1990), where the program HOMALS can analyse a variety of variations of correspondence analysis. For joint correspondence analysis, the algorithm of Greenacre (1988,

1990, 1991) or Boik (1996) can be incorporated into these programs. There are also programs relating to the work discussed in Section 2. A program, called docorr, is offered for the simple correspondence analysis of doubly ordered contingency tables, where a scoring scheme is required as an input to reflect the ordinal structure of the data.

A computer program, called partition3, is offered which partitions the chi-squared statistic for a three-way contingency table into component values. A function called corrplot is also included which will construct any correspondence plot, complete with labelled profile positions and partial inertias, with the dimensions required by the user as input parameters.

    All programs are written in the language S-PLUS and are written in function mode. Minor modifications to the programs can be made if they are to be executed in BATCH mode.

A.3.1 Simple Correspondence Analysis - corr

The program corr conducts a simple correspondence analysis of a contingency table. The input parameters are the data set, N, and the number of dimensions required for the correspondence plot, M. The default number of dimensions for the plot is set at min(I, J)-1. The program tests to make sure that the data set input is a contingency table by testing that the cell


values are not negative and are integer values. A test of the number of dimensions required for the subspace is made, by only accepting values of M greater than zero and less than min(I, J)-1. If either of these tests fails, the program stops. The output of the program is the data set (so that a check of the correct data set can be made), the row and column profile co-ordinates, f and g respectively, the total inertia, tinertia, the principal inertias, pinertia, and the percentage contribution of each principal inertia to the total inertia, perca.

corr_function(N, M=min(nrow(N)-1, ncol(N)-1)){

    ######################################################################
    #                                                                    #
    #  This program executes a simple correspondence analysis on         #
    #  a two-way contingency table.                                      #
    #  There are only two input parameters - N, the contingency table    #
    #                                      - M, the number of dimensions #
    #                                        for the plot, usually 2     #
    #  The output consists of - row and column co-ordinates              #
    #                         - principal inertias                       #
    #                         - total inertia                            #
    #                         - the percentage contribution each         #
    #                           principal inertia makes to the total     #
    #                           inertia                                  #
    #                                                                    #
    ######################################################################
    if(any(N<0) || any(N!=round(N))){
        stop("Your data set is NOT a contingency table")
    }
    if(M>=min(nrow(N), ncol(N)) || M<1){
        stop("Incorrect number of dimensions")
    }

    Inames_dimnames(N)[1]
    Jnames_dimnames(N)[2]

    I_nrow(N)
    J_ncol(N)

    n_sum(N)
    p_N*(1/n)
    Imass_as.matrix(apply(p, 1, sum))
    Jmass_as.matrix(apply(p, 2, sum))

    ItC_Imass%*%t(Jmass)
    y_p-ItC


    dI_diag(Imass, nrow=I, ncol=I)
    dJ_diag(Jmass, nrow=J, ncol=J)

    Ih_Imass^-0.5
    Jh_Jmass^-0.5
    dIh_diag(Ih, nrow=I, ncol=I)
    dJh_diag(Jh, nrow=J, ncol=J)

    x_dIh%*%y%*%dJh
    #####################################
    sva_svd(x)   # Singular Value Decomposition      #
    #####################################
    A_sva$u[, 1:M]   # A = left generalised basis vectors  #
    B_sva$v[, 1:M]   # B = right generalised basis vectors #
    mu_sva$d         # mu = vector of singular values      #

    dmu_diag(mu[1:M], nrow=M, ncol=M)

    f_dIh%*%A%*%dmu
    g_dJh%*%B%*%dmu
    dimnames(f)_list(paste(Inames[[1]]), paste(1:M))
    dimnames(g)_list(paste(Jnames[[1]]), paste(1:M))

    pinertia_mu[mu>0]*mu[mu>0]
    #########################
    tinertia_sum(pinertia)   # total inertia #
    #########################
    perca_pinertia/tinertia

    list(N, f=round(f, digits=5), g=round(g, digits=5),
         totalinertia=round(tinertia, digits=5),
         pinertia=round(pinertia, digits=5),
         perca=round(perca*100, digits=4))
}

For example, suppose the contingency table given by Table 2.1 is defined as the S-PLUS object socec.dat. Then the program will present the output :

> corr(socec.dat)
$N:
          A  B   C   D  E  F
Well     64 57  57  72 36 21
Mild     94 94 105 141 97 71
Moderate 58 54  65  77 54 54
Impaired 46 40  60  94 78 71

$f:
                 1        2        3
Well      -0.25954 -0.01210  0.02259
Mild      -0.02959 -0.02365 -0.01982
Moderate   0.01421  0.06990 -0.00323
Impaired   0.23739 -0.01890  0.01585


$g:
         1        2        3
A -0.18093  0.01925  0.02753
B -0.18500  0.01163 -0.02739
C -0.05903  0.02220 -0.01057
D  0.00889 -0.04208  0.01102
E  0.16539 -0.04361 -0.01037
F  0.28769  0.06199  0.00482

$tinertia:
[1] 0.0277

$pinertia:
[1] 0.02602 0.00138 0.00030 0.00000

$perca:
[1] 93.9460  4.9785  1.0755  0.0000
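The calculation inside corr can be cross-checked in another language. The sketch below is an illustrative NumPy translation, not the thesis's S-PLUS code: the principal inertias are the squared singular values of the matrix of standardised residuals, and simple_ca is a hypothetical name introduced here.

```python
import numpy as np

def simple_ca(N, M=None):
    """Simple correspondence analysis of a two-way contingency table."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    p = N / n                               # correspondence matrix
    r = p.sum(axis=1)                       # row masses (Imass)
    c = p.sum(axis=0)                       # column masses (Jmass)
    if M is None:
        M = min(len(r), len(c)) - 1
    # standardised residuals: D_r^(-1/2) (p - r c') D_c^(-1/2)
    x = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, mu, Vt = np.linalg.svd(x, full_matrices=False)
    f = U[:, :M] * mu[:M] / np.sqrt(r)[:, None]     # row co-ordinates
    g = Vt.T[:, :M] * mu[:M] / np.sqrt(c)[:, None]  # column co-ordinates
    pinertia = mu ** 2                      # principal inertias
    return f, g, pinertia, pinertia.sum()   # total inertia = X^2 / n
```

Running this on Table 2.1 reproduces the total inertia of 0.0277 reported above; the signs of the co-ordinates are only determined up to a reflection of each axis.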

A.3.2 Indicator Matrix - indicator

The program indicator converts a two-way or three-way contingency table to its indicator matrix form.

indicator_function(N){
    n_sum(N)
    I_dim(N)[1]
    J_dim(N)[2]
    if(length(dim(N))==2){
        K_1
    }else{
        K_dim(N)[3]
    }
    Z1_matrix(0, nrow=n, ncol=I)
    Z2_matrix(0, nrow=n, ncol=J)
    Z3_matrix(0, nrow=n, ncol=K)
    indN_cbind(rep(c(1:I), c(apply(N, 1, sum))),
               rep(c(rep(1:J, I)), c(t(apply(N, c(1, 2), sum)))),
               rep(c(rep(1:K, (I*J))), c(t(N))))
    for(size in 1:n){
        Z1[size, indN[size, 1]]_1
        Z2[size, indN[size, 2]]_1
        Z3[size, indN[size, 3]]_1
    }
    if(K>1){
        Z_cbind(Z1, Z2, Z3)
    }else{
        Z_cbind(Z1, Z2)
    }
    return(Z)
}


For example, applying indicator to Table 3.1 yields

> indicator(table3.1)
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    1    0    0
 [2,]    1    0    0    1    0
 [3,]    1    0    0    1    0
 [4,]    1    0    0    0    1
 [5,]    1    0    0    0    1
 [6,]    0    1    1    0    0
 [7,]    0    1    1    0    0
 [8,]    0    1    0    1    0
 [9,]    0    1    0    1    0
[10,]    0    1    0    1    0
[11,]    0    1    0    0    1

which is Table 3.2.
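The same construction is easy to express in Python. The sketch below is an illustrative translation only (indicator_matrix is a hypothetical name): it builds one row of zeros and ones per classified individual.

```python
import numpy as np
from itertools import product

def indicator_matrix(N):
    """Indicator matrix of a two-way or three-way contingency table."""
    N = np.asarray(N, dtype=int)
    if N.ndim == 2:                  # treat a two-way table as having K = 1
        N = N[:, :, None]
    I, J, K = N.shape
    # one (i, j, k) triple per individual, in lexicographic cell order
    cells = [(i, j, k) for i, j, k in product(range(I), range(J), range(K))
             for _ in range(N[i, j, k])]
    n = len(cells)
    Z = np.zeros((n, I + J + (K if K > 1 else 0)), dtype=int)
    for row, (i, j, k) in enumerate(cells):
        Z[row, i] = 1                # row-category block
        Z[row, I + j] = 1            # column-category block
        if K > 1:
            Z[row, I + J + k] = 1    # tube-category block
    return Z
```

Each row of the resulting matrix sums to the number of variables, and each block of columns sums to the corresponding set of marginal frequencies.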

A.3.3 Burt Matrix - burt

To convert a contingency table to its Burt matrix, the following code can be used.

burt_function(N){
    Z_indicator(N)
    Burt_t(Z)%*%Z
    return(Burt)
}

For example, suppose that burt is executed with socec.dat as the data set. Then the output is :

> burt(socec.dat)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]  307    0    0    0   64   57   57   72   36    21
 [2,]    0  602    0    0   94   94  105  141   97    71
 [3,]    0    0  362    0   58   54   65   77   54    54
 [4,]    0    0    0  389   46   40   60   94   78    71
 [5,]   64   94   58   46  262    0    0    0    0     0
 [6,]   57   94   54   40    0  245    0    0    0     0
 [7,]   57  105   65   60    0    0  287    0    0     0
 [8,]   72  141   77   94    0    0    0  384    0     0
 [9,]   36   97   54   78    0    0    0    0  265     0
[10,]   21   71   54   71    0    0    0    0    0   217
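Since the Burt matrix is simply Z'Z, the block structure seen above (diagonal blocks holding the marginal frequencies, off-diagonal blocks holding the table itself) can be checked with a short NumPy sketch. This is illustrative only; burt_matrix is a hypothetical name.

```python
import numpy as np

def burt_matrix(N):
    """Burt matrix Z'Z of a two-way contingency table N."""
    N = np.asarray(N, dtype=int)
    I, J = N.shape
    n = N.sum()
    # category labels for each individual, in row-major cell order
    rows = np.repeat(np.arange(I), N.sum(axis=1))
    cols = np.concatenate([np.repeat(np.arange(J), N[i]) for i in range(I)])
    Z = np.zeros((n, I + J), dtype=int)
    Z[np.arange(n), rows] = 1          # row-category block
    Z[np.arange(n), I + cols] = 1      # column-category block
    return Z.T @ Z
```

On socec.dat this reproduces the matrix above: the top-left block is diag(307, 602, 362, 389) and the top-right block is the table itself.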


A.3.4 Orthogonal Polynomials - orthopoly

The program orthopoly calculates the orthogonal polynomials required for an ordinal correspondence analysis. The input parameter marginals contains the marginal proportions of the categories used and must add up to one, while scores is the scoring scheme applied to the categories. The default set of scores are natural scores. The program tests whether the number of scores, if a scoring scheme is given, is equal to the number of marginal probabilities specified in marginals. If this test fails, the program stops. The program's output is the matrix of orthogonal polynomials, ortho.

orthopoly_function(marginals, scores=c(1:length(marginals))){
    L_length(marginals)
    if(length(scores)!=L){
        stop("Sorry - Incorrect Number of Scores")
    }
    a_matrix(0, nrow=L, ncol=L)
    A_vector(mode="numeric", length=L)
    B_vector(mode="numeric", length=L)
    C_vector(mode="numeric", length=L)
    for(u in 1:2){
        A[u]_0
        B[u]_0
        C[u]_0
    }
    for(l in 1:L){
        a[1, l]_1
        a[2, l]_((marginals%*%(scores^2)-(marginals%*%scores)^2)^-0.5)*
                (scores[l]-marginals%*%scores)
    }
    for(u in 3:L){
        for(l in 1:L){
            B[u]_sum(marginals*scores*(a[u-1,]^2))
            C[u]_sum(marginals*scores*a[u-1,]*a[u-2,])
            A[u]_(sum(marginals*(scores^2)*(a[u-1,]^2))-(B[u]^2)-(C[u]^2))^-0.5
            a[u, l]_A[u]*((scores[l]-B[u])*a[u-1, l]-C[u]*a[u-2, l])
        }
    }
    ortho <- t(a)
    return(ortho)
}


For example, when the row marginal probabilities, called Imass, are used with natural scores (the default), the orthogonal polynomials generated by orthopoly are

> orthopoly(Imass)
     [,1]       [,2]       [,3]       [,4]
[1,]    1 -1.4394275  1.3961752 -0.6212159
[2,]    1 -0.4809640 -0.7892349  0.9503983
[3,]    1  0.4774996 -0.9271784 -1.5804967
[4,]    1  1.4359632  0.9823449  0.4902655
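The defining property of these polynomials is orthonormality in the metric of the marginals: the sum over categories of w_i a_u(i) a_v(i) equals 1 when u = v and 0 otherwise. The sketch below obtains the same polynomials by weighted Gram-Schmidt on the powers of the scores; it is an illustrative alternative to the three-term recurrence used in orthopoly, and wpoly is a hypothetical name.

```python
import numpy as np

def wpoly(w, scores=None):
    """Orthonormal polynomials with respect to the weights w."""
    w = np.asarray(w, dtype=float)
    L = len(w)
    x = np.arange(1, L + 1, dtype=float) if scores is None \
        else np.asarray(scores, dtype=float)
    basis = np.vander(x, L, increasing=True).T   # rows: x^0, x^1, ...
    out = []
    for v in basis:
        v = v.astype(float).copy()
        for a in out:                   # remove projections on earlier polys
            v -= (w * v * a).sum() * a
        v /= np.sqrt((w * v * v).sum())           # normalise in the w-metric
        out.append(v)
    return np.column_stack(out)
```

With w set to Imass and natural scores this reproduces the matrix above, for example the linear polynomial value -1.4394 for the first row category.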

A.3.5 Table of Component Values - compstable

The program compstable calculates component values for a two-way contingency table given the data's Z values as defined by equation (4.39).

compstable_function(Z){
    tZZ_t(Z)%*%Z
    ZtZ_Z%*%t(Z)
    factor_sum(diag(tZZ))
    comps_matrix(0, nrow=9, ncol=2)

    ###############################
    #         Category 1          #
    ###############################
    comps[2,1]_tZZ[1,1]                        # Location component for Category 1    #
    comps[2,2]_1-pchisq(comps[2,1], nrow(Z))   # P-value for the location component   #
    comps[3,1]_tZZ[2,2]                        # Dispersion component for Category 1  #
    comps[3,2]_1-pchisq(comps[3,1], nrow(Z))   # P-value for the dispersion component #
    comps[4,1]_factor-(comps[2,1]+comps[3,1])  # Error of components for Category 1   #
    if(ncol(Z)>2){
        comps[4,2]_1-pchisq(comps[4,1], nrow(Z)*(ncol(Z)-2))
    }else{
        comps[4,2]_0
    }
    ###############################
    #         Category 2          #
    ###############################
    comps[6,1]_ZtZ[1,1]                        # Location component for Category 2    #
    comps[6,2]_1-pchisq(comps[6,1], ncol(Z))   # P-value for the location component   #
    comps[7,1]_ZtZ[2,2]                        # Dispersion component for Category 2  #
    comps[7,2]_1-pchisq(comps[7,1], ncol(Z))   # P-value for the dispersion component #


    comps[8,1]_factor-(comps[6,1]+comps[7,1])  # Error of components for Category 2   #
    if(nrow(Z)>2){
        comps[8,2]_1-pchisq(comps[8,1], (nrow(Z)-2)*ncol(Z))
    }else{
        comps[8,2]_0
    }
    comps[9,1]_factor
    comps[9,2]_1-pchisq(comps[9,1], nrow(Z)*ncol(Z))
    return(comps)
}

For example, if the matrix of Z values is obtained for socec.dat, then the location, dispersion and error component values are those given in Table 5.1.
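The partition that compstable tabulates can be stated compactly: the total sum of the squared Z values splits, for each category variable, into a location term (squares involving the first polynomial), a dispersion term (second polynomial) and a residual error term. A hedged Python sketch of that bookkeeping follows (p-values omitted; components is a hypothetical name):

```python
import numpy as np

def components(Z):
    """Location/dispersion/error split of sum(Z^2) for both categories."""
    Z = np.asarray(Z, dtype=float)
    total = (Z ** 2).sum()                 # the full chi-squared value
    loc1 = (Z[:, 0] ** 2).sum()            # plays the role of tZZ[1,1]
    disp1 = (Z[:, 1] ** 2).sum()           # plays the role of tZZ[2,2]
    loc2 = (Z[0, :] ** 2).sum()            # plays the role of ZtZ[1,1]
    disp2 = (Z[1, :] ** 2).sum()           # plays the role of ZtZ[2,2]
    return {"total": total,
            "cat1": (loc1, disp1, total - loc1 - disp1),
            "cat2": (loc2, disp2, total - loc2 - disp2)}
```

By construction, the three components for either category sum exactly to the total chi-squared value, which is the defining property of the partition.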

A.3.6 Ordinal Correspondence Analysis - docorr

The program docorr conducts a doubly ordered correspondence analysis on a two-way contingency table. The program is easily adjustable to conduct a singly ordered analysis. The input variables are the data set, N, the number of dimensions in the correspondence plot, K, and the row and column scores, rowscores and columnscores. Natural scores are used as the default, while the default value of K is min(I, J)-1. Tests are made to make sure that, if row and/or column scores are specified, the number of row scores is the same as the number of rows of N, and the number of column scores is the same as the number of columns of N. The output of the program consists of the data set, N (so that a check of the correct data analysed can be made), the row and column profile co-ordinates, fs and gs, the total inertia, tinertia, the row and column location, dispersion and error components along with the percentage contribution of each, within comps, and the matrix of bivariate associations, Z.


docorr_function(N, K=min(nrow(N)-1, ncol(N)-1), rowscores=c(1:nrow(N)),
                columnscores=c(1:ncol(N))){
    if(any(N<0) || any(N!=round(N))){
        stop("Sorry - the data you input was NOT a contingency table")
    }
    if(K>=min(nrow(N), ncol(N)) || K<=0){
        stop("Sorry - incorrect number of dimensions")
    }
    if(length(rowscores)!=nrow(N) || length(columnscores)!=ncol(N)){
        stop("Sorry - Length of scores NOT equal to number of categories")
    }

    I_dim(N)[1]
    J_dim(N)[2]
    Inames_dimnames(N)[1]
    Jnames_dimnames(N)[2]
    n_sum(N)
    p_N*(1/n)

    Imass_apply(p, 1, sum)
    Jmass_apply(p, 2, sum)
    dI_diag(Imass, nrow=I, ncol=I)
    dJ_diag(Jmass, nrow=J, ncol=J)
    #################################################################
    #                                                               #
    #              ORDINAL CORRESPONDENCE ANALYSIS                  #
    #                                                               #
    #################################################################
    As_orthopoly(Imass, rowscores)
    Bs_orthopoly(Jmass, columnscores)

    Z_t(As[,-1])%*%p%*%Bs[,-1]*sqrt(n)
    comps_compstable(Z)
    fs_solve(dI)%*%p%*%Bs[,-1]
    gs_solve(dJ)%*%t(p)%*%As[,-1]
    fs_fs[, 1:K]
    gs_gs[, 1:K]

    Icompnames_c("** Row Components **", "Location", "Dispersion",
                 "Error", "** Columns Components **", "Location",
                 "Dispersion", "Error", "Chi-squared")
    Jcompnames_c("Component Value", "P-value")
    dimnames(Z)_list(paste("Poly", 1:(I-1)), paste("Poly", 1:(J-1)))
    dimnames(fs)_list(paste(Inames[[1]]), paste(1:K))
    dimnames(gs)_list(paste(Jnames[[1]]), paste(1:K))
    dimnames(comps)_list(paste(Icompnames), paste(Jcompnames))
    list(N, fs=round(fs, digits=5), gs=round(gs, digits=5),
         comps=round(comps, digits=5), Z=round(Z, digits=5))
}


For example, suppose that docorr is executed with Table 2.1 (socec.dat) used as the data set. Then the output is :

> docorr(socec.dat)
[[1]]:
          A  B   C   D  E  F
Well     64 57  57  72 36 21
Mild     94 94 105 141 97 71
Moderate 58 54  65  77 54 54
Impaired 46 40  60  94 78 71

$fs:
         [,1]     [,2]     [,3]
[1,] -0.25317 -0.05248 -0.00588
[2,] -0.02338 -0.02897 -0.00328
[3,]  0.00585  0.04399  0.05003
[4,]  0.23053  0.04531 -0.03683

$gs:
         1        2        3
A -0.16635  0.02511 -0.07457
B -0.17973 -0.02196 -0.04820
C -0.05350 -0.01608 -0.03113
D  0.00076  0.02654  0.03559
E  0.14837  0.00099  0.08573
F  0.29199 -0.03243  0.01795

[[4]]:
                         Component Value P-value
** Row Components **             0.00000 0.00000
Location                         0.02451 0.00000
Dispersion                       0.00172 0.41530
Error                            0.00147 0.98230
** Columns Components **         0.00000 0.00000
Location                         0.02429 0.00000
Dispersion                       0.00052 0.97330
Error                            0.00290 0.43980
Chi-squared                      0.02770 0.99999

[[5]]:
             Poly 1       Poly 2       Poly 3       Poly 4       Poly 5
Poly 1  0.149654168  0.038851839 -0.005048351  0.007453792 -0.017402205
Poly 2 -0.006796226 -0.003726334 -0.019172354  0.006365817  0.006907179
Poly 3  0.045496013 -0.013911889 -0.021929815 -0.012083630  0.002449750
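The ordinal co-ordinates printed above can be cross-checked independently: fs is the matrix of row profiles projected onto the non-trivial column polynomials, and gs the matrix of column profiles projected onto the non-trivial row polynomials. An illustrative Python sketch follows, using a weighted Gram-Schmidt construction of the polynomials rather than orthopoly itself; wpoly and ordinal_coords are hypothetical names.

```python
import numpy as np

def wpoly(w, scores):
    """Orthonormal polynomials with respect to the weights w."""
    L = len(w)
    basis = np.vander(np.asarray(scores, dtype=float), L, increasing=True).T
    out = []
    for v in basis:
        v = v.astype(float).copy()
        for a in out:
            v -= (w * v * a).sum() * a
        v /= np.sqrt((w * v * v).sum())
        out.append(v)
    return np.column_stack(out)

def ordinal_coords(N):
    """Row and column profile co-ordinates of a doubly ordered table."""
    N = np.asarray(N, dtype=float)
    p = N / N.sum()
    r, c = p.sum(axis=1), p.sum(axis=0)
    A = wpoly(r, np.arange(1, len(r) + 1))[:, 1:]   # drop trivial poly
    B = wpoly(c, np.arange(1, len(c) + 1))[:, 1:]
    fs = (p / r[:, None]) @ B        # row profiles onto column polys
    gs = (p.T / c[:, None]) @ A      # column profiles onto row polys
    return fs, gs
```

On socec.dat with natural scores this reproduces the leading co-ordinates printed above, for example -0.25317 for the first row category and -0.16635 for column A.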

A.3.7 Correspondence Plot - corrplot

The program corrplot provides a two-dimensional correspondence plot of the data with the specified axes drawn. This function will run with either high- or low-resolution graphics. In order for the plots to be made with high resolution, the graphics option must be set by typing motif() on, for example, Sun workstations. Low-resolution graphics can be achieved by using the printer function within S-PLUS, running the corrplot function, then typing show() to print the plot on the screen. The input parameters for this function include x and y, the two co-ordinate systems to be visualised. For a joint representation of the row and column profile co-ordinates from simple correspondence analysis, x and y would be f and g respectively. The two principal axes to be used in the plot are also specified as input parameters. The default axes used are the first and second principal axes.

corrplot_function(x, y, dim1=1, dim2=2){
    par(pty="s", pin=c(3, 3))
    xlabels_dimnames(x)[1]
    ylabels_dimnames(y)[1]
    plot(x[, dim1], x[, dim2], pch="*", xlim=range(x, y),
         ylim=range(x, y), xlab=c("Principal Axis", dim1),
         ylab=c("Principal Axis", dim2))
    par(new=T)
    plot(y[, dim1], y[, dim2], pch="#", xlim=range(x, y),
         ylim=range(x, y), xlab=" ", ylab=" ")
    text(x[, dim1], x[, dim2], labels=xlabels, adj=0)
    text(y[, dim1], y[, dim2], labels=ylabels, adj=1)
    abline(h=0, v=0)
}

When corrplot is run with the row and column profile co-ordinates obtained from executing corr on socec.dat, the output is just Figure 2.1.

A.3.8 Chi-squared Partition - partition3

The program partition3 partitions the three-way Pearson chi-squared statistic for a completely ordered three-way contingency table into a three-way and three two-way terms. Each term is then partitioned into location, dispersion and higher order components for the row, column and tube variables. The input parameters are the contingency table, N, and the


row, column and tube scores, rowscores, colscores and tubescores respectively. If any of the scoring schemes is not specified, natural scores are used by default. The output consists of the linear, dispersion and error components for each term, and includes the linear-by-linear, linear-by-quadratic, quadratic-by-linear and quadratic-by-quadratic components for each term. The program also makes use of the program orthopoly, described in A.3.4, to calculate the orthogonal polynomials of an ordered variable. This program can be easily adjusted to analyse a three-way contingency table with only one or two ordered variables.

partition3_function(N, rowscores=c(1:dim(N)[1]), colscores=c(1:dim(N)[2]),
                    tubescores=c(1:dim(N)[3])){
    #################################################################
    #                                                               #
    #  This program partitions the Pearson chi-squared statistic    #
    #  of a three-way contingency table.                            #
    #  Input variables are :                                        #
    #        - the data matrix                                      #
    #        - the row scores (default are natural)                 #
    #        - the column scores (default are natural)              #
    #        - the tube scores (default are natural)                #
    #                                                               #
    #################################################################
    if(any(N<0) || any(N!=round(N))){
        stop("Sorry - the data you input was NOT a contingency table")
    }
    if(length(rowscores)!=dim(N)[1] || length(colscores)!=dim(N)[2] ||
       length(tubescores)!=dim(N)[3]){
        stop("Sorry - Length of scores NOT equal to number of categories")
    }

    I_dim(N)[1]
    J_dim(N)[2]
    K_dim(N)[3]
    Inames_dimnames(N)[1]
    Jnames_dimnames(N)[2]
    Knames_dimnames(N)[3]

    #################################################################
    #                                                               #
    #                   Declaration of vectors                      #
    #                                                               #
    #################################################################


    dum1_vector(mode="numeric", length=((I*J*K)*(I-1)*(J-1)*(K-1)))
    dim(dum1)_c(I, J, K, I-1, J-1, K-1)
    dum2_vector(mode="numeric", length=((I*J*K)*(I-1)*(J-1)))
    dim(dum2)_c(I, J, K, I-1, J-1)
    dum3_vector(mode="numeric", length=((I*J*K)*(I-1)*(K-1)))
    dim(dum3)_c(I, J, K, I-1, K-1)
    dum4_vector(mode="numeric", length=((I*J*K)*(J-1)*(K-1)))
    dim(dum4)_c(I, J, K, J-1, K-1)
    Z1_vector(mode="numeric", length=((I-1)*(J-1)*(K-1)))
    dim(Z1)_c(I-1, J-1, K-1)
    Z2_vector(mode="numeric", length=((I-1)*(J-1)))
    dim(Z2)_c(I-1, J-1)
    Z3_vector(mode="numeric", length=((I-1)*(K-1)))
    dim(Z3)_c(I-1, K-1)
    Z4_vector(mode="numeric", length=((J-1)*(K-1)))
    dim(Z4)_c(J-1, K-1)

    #################################################################
    #                                                               #
    #  General stuff for the analyses, such as the probability      #
    #  matrix, p, and the row, column and tube probability          #
    #  marginals.                                                   #
    #                                                               #
    #################################################################
    n_sum(N)
    p_N/n
    Imass_apply(p, 1, sum)
    Jmass_apply(p, 2, sum)
    Kmass_apply(p, 3, sum)
    dI_diag(Imass, nrow=I, ncol=I)
    dJ_diag(Jmass, nrow=J, ncol=J)
    dK_diag(Kmass, nrow=K, ncol=K)
    #################################################################
    #                                                               #
    #          Calculation of the orthogonal polynomials            #
    #                                                               #
    #################################################################
    AI_orthopoly(Imass, rowscores)
    AJ_orthopoly(Jmass, colscores)
    AK_orthopoly(Kmass, tubescores)

    #################################################################
    #                                                               #
    #  Calculation of the Z association matrices from the           #
    #  orthogonal partition of the classical Pearson                #
    #  chi-squared statistic                                        #
    #     - Z1 is the matrix of threeway associations               #
    #     - Z2 is the twoway matrix of associations for             #
    #       the rows and columns                                    #
    #     - Z3 is the twoway matrix of associations for             #
    #       the rows and tubes                                      #
    #     - Z4 is the twoway matrix of associations for             #
    #       the columns and tubes                                   #
    #                                                               #
    #################################################################
    for(u in 1:(I-1)){
        for(v in 1:(J-1)){
            for(w in 1:(K-1)){


                for(i in 1:I){
                    for(j in 1:J){
                        for(k in 1:K){
                            dum1[i,j,k,u,v,w]_AI[i,u+1]*AJ[j,v+1]*AK[k,w+1]*p[i,j,k]
                            dum2[i,j,k,u,v]_AI[i,u+1]*AJ[j,v+1]*p[i,j,k]
                            dum3[i,j,k,u,w]_AI[i,u+1]*AK[k,w+1]*p[i,j,k]
                            dum4[i,j,k,v,w]_AJ[j,v+1]*AK[k,w+1]*p[i,j,k]
                        }
                    }
                }
                Z1[u,v,w]_sqrt(n)*sum(dum1[,,,u,v,w])
                Z2[u,v]_sqrt(n)*sum(dum2[,,,u,v])
                Z3[u,w]_sqrt(n)*sum(dum3[,,,u,w])
                Z4[v,w]_sqrt(n)*sum(dum4[,,,v,w])
            }
        }
    }

    dimnames(Z1)_list(paste("Poly", 1:(I-1), sep=""),
                      paste("Poly", 1:(J-1), sep=""),
                      paste("Poly", 1:(K-1), sep=""))
    dimnames(Z2)_list(paste("Poly", 1:(I-1), sep=""),
                      paste("Poly", 1:(J-1), sep=""))
    dimnames(Z3)_list(paste("Poly", 1:(I-1), sep=""),
                      paste("Poly", 1:(K-1), sep=""))
    dimnames(Z4)_list(paste("Poly", 1:(J-1), sep=""),
                      paste("Poly", 1:(K-1), sep=""))
    #################################################################
    #                                                               #
    #  Calculation of the chi-squared statistic for the             #
    #   - threeway interaction between the rows, columns and        #
    #     tubes (factor1);                                          #
    #   - twoway interaction of the rows and columns (factor2);     #
    #   - twoway interaction of the rows and tubes (factor3);       #
    #   - twoway interaction of the columns and tubes (factor4).    #
    #                                                               #
    #################################################################
    factor1_sum(Z1^2)
    factor2_sum(Z2^2)
    factor3_sum(Z3^2)
    factor4_sum(Z4^2)
    totalinertia_factor1+factor2+factor3+factor4

    #################################################################
    #                         Row/Column                            #
    #################################################################
    compsIJ_compstable(Z2)
    rcompnamesIJ_c("** Row Components **", "Location", "Dispersion",
                   "Error", "** Column Components **", "Location",
                   "Dispersion", "Error", "** Chi-squared **")
    ccompnamesIJ_c("Component Value", "P-value")
    dimnames(compsIJ)_list(paste(rcompnamesIJ), paste(ccompnamesIJ))


    #################################################################
    #                          Row/Tube                             #
    #################################################################
    compsIK_compstable(Z3)
    rcompnamesIK_c("** Row Components **", "Location", "Dispersion",
                   "Error", "** Tube Components **", "Location",
                   "Dispersion", "Error", "** Chi-squared **")
    ccompnamesIK_c("Component Value", "P-value")
    dimnames(compsIK)_list(paste(rcompnamesIK), paste(ccompnamesIK))
    #################################################################
    #                         Column/Tube                           #
    #################################################################
    compsJK_compstable(Z4)
    rcompnamesJK_c("** Column Components **", "Location",
                   "Dispersion", "Error", "** Tube Components **",
                   "Location", "Dispersion", "Error",
                   "** Chi-squared **")
    ccompnamesJK_c("Component Value", "P-value")
    dimnames(compsJK)_list(paste(rcompnamesJK), paste(ccompnamesJK))
    #################################################################
    #                      Row/Column/Tube                          #
    #################################################################
    compsIJK_matrix(0, nrow=13, ncol=2)
    compsIJK[2,1]_sum(Z1[,,1]^2)
    compsIJK[2,2]_1-pchisq(compsIJK[2,1], (I-1)*(J-1))
    compsIJK[3,1]_sum(Z1[,,2]^2)
    compsIJK[3,2]_1-pchisq(compsIJK[3,1], (I-1)*(J-1))
    compsIJK[4,1]_factor1-(compsIJK[2,1]+compsIJK[3,1])
    if(K==3){
        compsIJK[4,2]_0
    }else{
        compsIJK[4,2]_1-pchisq(compsIJK[4,1], (I-1)*(J-1)*(K-3))
    }
    compsIJK[6,1]_sum(Z1[,1,]^2)
    compsIJK[6,2]_1-pchisq(compsIJK[6,1], (I-1)*(K-1))
    compsIJK[7,1]_sum(Z1[,2,]^2)
    compsIJK[7,2]_1-pchisq(compsIJK[7,1], (I-1)*(K-1))
    compsIJK[8,1]_factor1-(compsIJK[6,1]+compsIJK[7,1])
    compsIJK[8,2]_1-pchisq(compsIJK[8,1], (I-1)*(J-3)*(K-1))
    compsIJK[10,1]_sum(Z1[1,,]^2)
    compsIJK[10,2]_1-pchisq(compsIJK[10,1], (J-1)*(K-1))
    compsIJK[11,1]_sum(Z1[2,,]^2)
    compsIJK[11,2]_1-pchisq(compsIJK[11,1], (J-1)*(K-1))
    compsIJK[12,1]_factor1-(compsIJK[10,1]+compsIJK[11,1])
    compsIJK[12,2]_1-pchisq(compsIJK[12,1], (I-3)*(J-1)*(K-1))
    compsIJK[13,1]_factor1
    compsIJK[13,2]_1-pchisq(compsIJK[13,1], (I-1)*(J-1)*(K-1))
    rcompnamesIJK_c("** Row-Column **", "Location", "Dispersion",
                    "Error", "** Row-Tube **", "Location", "Dispersion",
                    "Error", "** Column-Tube **", "Location",
                    "Dispersion", "Error", "** Chi-Squared **")
    ccompnamesIJK_c("Component Value", "P-value")
    dimnames(compsIJK)_list(paste(rcompnamesIJK), paste(ccompnamesIJK))


    list(N, ZIJK=round(Z1, digits=4), ZIJ=round(Z2, digits=4),
         ZIK=round(Z3, digits=4), ZJK=round(Z4, digits=4),
         compXIJ=round(compsIJ, digits=4), compXIK=round(compsIK, digits=4),
         compXJK=round(compsJK, digits=4), compXIJK=round(compsIJK, digits=4),
         chisquare=round(totalinertia, digits=5))
}

For example, when the contingency table given by Table 3.1 is written as the object happy.dat in S-PLUS, and partition3 is executed with natural row, column and tube scores, the output is :

> partition3(happy.dat)
[[1]]:
, , 1
     [,1] [,2] [,3] [,4] [,5]
[1,]   15   34   36   22   61
[2,]   31   60   46   25   26
[3,]   35   45   30   13    8
[4,]   18   14    3    3    4

, , 2
     [,1] [,2] [,3] [,4] [,5]
[1,]   17   53   70   67   79
[2,]   60   96   45   40   31
[3,]   63   74   39   24    7
[4,]   15   15    9    2    1

, , 3
     [,1] [,2] [,3] [,4] [,5]
[1,]    7   20   23   16   36
[2,]    5   12   11   12    7
[3,]    5   10    4    4    3
[4,]    2    1    2    0    1

$ZIJK:

, , Poly1
        Poly1   Poly2   Poly3   Poly4
Poly1 -0.0576 -0.2652  0.5850  0.2444
Poly2  1.6993 -1.8574  0.3732  1.7438
Poly3  0.1315 -0.6724 -0.4118  0.5328

, , Poly2
        Poly1   Poly2   Poly3   Poly4
Poly1  0.4481 -1.0665 -1.3226 -0.8430
Poly2 -0.9379  2.6776  1.1823 -0.0134
Poly3 -0.2394 -0.4614 -1.3569  0.1897

$ZIJ:
         Poly1  Poly2   Poly3   Poly4
Poly1 -14.4157 0.6273 -0.9767 -0.9101
Poly2   3.7427 2.6791  0.0423  1.7626
Poly3   -0.6351 0.3634  0.3483 -0.6000


$ZIK:
        Poly1   Poly2
Poly1 -4.9333 -2.7126
Poly2  1.2646  2.1675
Poly3 -1.6497  0.6515

$ZJK:
        Poly1  Poly2
Poly1  2.4619 3.4694
Poly2 -0.6957 0.6989
Poly3 -0.9682 1.4937
Poly4 -1.1296 1.5189

$compXIJ:
                         Component Value P-value
** Row Components **              0.0000  0.0000
Location                        222.2234  0.0000
Dispersion                        7.7034  0.0526
Error                             5.3720  0.4971
** Column Components **           0.0000  0.0000
Location                        209.9878  0.0000
Dispersion                       24.2943  0.0001
Error                             1.0168  0.9072
** Chi-squared **               235.2988  0.0000

$compXIK:
                         Component Value P-value
** Row Components **              0.0000  0.0000
Location                         28.6582  0.0000
Dispersion                       12.4805  0.0059
Error                             0.0000  0.0000
** Tube Components **             0.0000  0.0000
Location                         31.6954  0.0000
Dispersion                        6.2973  0.0429
Error                             3.1460  0.2074
** Chi-squared **                41.1387  0.0000

$compXJK:
                         Component Value P-value
** Column Components **           0.0000  0.0000
Location                          8.7582  0.0674
Dispersion                       17.0634  0.0019
Error                             0.0000  0.0000
** Tube Components **             0.0000  0.0000
Location                         18.0973  0.0001
Dispersion                        0.9724  0.6149
Error                             6.7518  0.1496
** Chi-squared **                25.8215  0.0011

$compXIJK:
                         Component Value P-value
** Row-Column **                  0.0000  0.0000
Location                         10.9158  0.5361
Dispersion                       15.3925  0.2207
Error                             0.0000  0.0000
** Row-Tube **                    0.0000  0.0000
Location                          4.0458  0.6705
Dispersion                       12.4921  0.0518
Error                             9.7705  0.6361
** Column-Tube **                 0.0000  0.0000
Location                          4.2737  0.8316
Dispersion                       18.9645  0.0151


Error                             3.0701  0.9299
** Chi-Squared **                26.3084  0.3377

$chisquare:
[1] 328.5675
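The four association matrices at the heart of partition3 are weighted sums over the table, and they can be written very compactly with einsum. The sketch below is illustrative only (Gram-Schmidt polynomials stand in for orthopoly, and wpoly and z_matrices are hypothetical names); a useful check is that the squares of Z2 sum to the Pearson chi-squared statistic of the row-column marginal table.

```python
import numpy as np

def wpoly(w, scores):
    """Orthonormal polynomials with respect to the weights w."""
    L = len(w)
    basis = np.vander(np.asarray(scores, dtype=float), L, increasing=True).T
    out = []
    for v in basis:
        v = v.astype(float).copy()
        for a in out:
            v -= (w * v * a).sum() * a
        v /= np.sqrt((w * v * v).sum())
        out.append(v)
    return np.column_stack(out)

def z_matrices(N):
    """Three-way and two-way Z association matrices of a 3-way table."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    p = N / n
    r, c, t = p.sum((1, 2)), p.sum((0, 2)), p.sum((0, 1))
    AI, AJ, AK = (wpoly(m, np.arange(1, len(m) + 1))[:, 1:]
                  for m in (r, c, t))       # non-trivial polynomials only
    Z1 = np.sqrt(n) * np.einsum('iu,jv,kw,ijk->uvw', AI, AJ, AK, p)
    Z2 = np.sqrt(n) * np.einsum('iu,jv,ijk->uv', AI, AJ, p)
    Z3 = np.sqrt(n) * np.einsum('iu,kw,ijk->uw', AI, AK, p)
    Z4 = np.sqrt(n) * np.einsum('jv,kw,ijk->vw', AJ, AK, p)
    return Z1, Z2, Z3, Z4
```

The identity sum(Z2^2) = X^2 of the row-column margin holds because the non-trivial polynomials are orthonormal to the trivial one in the weighted metric, so the trivial terms vanish.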

    A.3.9 P-values

Other S-PLUS computations were also required, for example, to calculate Monte Carlo P-values for the permutation tests. The following code was used to generate a random three-way contingency table with fixed marginals (those of the original data set) under the model of complete independence.

NN_cbind(rep(1:I, apply(N, 1, sum)),
         rep(1:J, apply(N, 2, sum)),
         rep(1:K, apply(N, 3, sum)))

where N is the contingency table, I is the number of rows, J is the number of columns and K is the number of tubes. The table NN details the categories into which each individual was classified. Within the simulation loop, where the values are calculated, a random three-way contingency table with the same marginals as the original table is reconstructed using the following code :

NNN_table(NN[sample(n), 1], NN[sample(n), 2], NN[sample(n), 3])

where n is the number of people classified into the table. The P-values found in the examples of Chapter 4 were calculated from 10000 simulations: for each simulation, a random contingency table was generated using NNN, each value of the chi-squared partition was calculated, and these values were compared with the results from the original contingency table, N.
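In outline, the resampling scheme unfolds the table to one record per individual, shuffles each variable's labels independently, and re-tabulates; the three one-way marginals are preserved exactly while any association is destroyed. A Python sketch of the same idea follows (illustrative; permuted_table is a hypothetical name):

```python
import numpy as np

def permuted_table(N, rng):
    """Random 3-way table with the marginals of N, under independence."""
    N = np.asarray(N, dtype=int)
    I, J, K = N.shape
    rows = np.repeat(np.arange(I), N.sum(axis=(1, 2)))   # one label per person
    cols = np.repeat(np.arange(J), N.sum(axis=(0, 2)))
    tubes = np.repeat(np.arange(K), N.sum(axis=(0, 1)))
    NNN = np.zeros((I, J, K), dtype=int)
    for i, j, k in zip(rng.permutation(rows), rng.permutation(cols),
                      rng.permutation(tubes)):
        NNN[i, j, k] += 1             # re-tabulate the shuffled records
    return NNN
```

Because each label vector is a permutation of the original one, every simulated table has exactly the marginals of N, which is what the permutation test requires.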

    -318- References Related to this Thesis

Beh, E. J., 1996a : Correspondence analysis of two-way singly-ordered contingency tables, Department of Applied Statistics, University of Wollongong, Preprint No. 1/96.

Beh, E. J., 1996b : Correspondence analysis of two-way doubly-ordered contingency tables, Department of Applied Statistics, University of Wollongong, Preprint No. 2/96.

Beh, E. J., 1997 : Simple correspondence analysis of ordinal cross-classifications using orthogonal polynomials, Biometrical Journal, 39 (5), 589-613.

Beh, E. J., 1998 : A comparative study of scores for correspondence analysis with ordered categories, Biometrical Journal, 40, 413-429.

Beh, E. J., 1999a : Correspondence analysis of ranked data, Communications in Statistics (Theory and Methods), 28 (7), (to appear).

Beh, E. J., 1999b : Partitioning Pearson's chi-squared statistic for singly ordered two-way contingency tables, The Australian and New Zealand Journal of Statistics, (in review).

Beh, E. J. & Davy, P. J., 1998a : Partitioning Pearson's chi-squared statistic for a completely ordered three-way contingency table, The Australian and New Zealand Journal of Statistics, 40, 465-477.

Beh, E. J. & Davy, P. J., 1998b : Multiple correspondence analysis of ordinal multi-way contingency tables using orthogonal polynomials, In Preparation.

Beh, E. J. & Davy, P. J., 1999 : Partitioning Pearson's chi-squared statistic for a partially ordered three-way contingency table, The Australian and New Zealand Journal of Statistics, (to appear).


    References

A

Agresti, A., 1982 : Ordinal data, In Encyclopedia of Statistical Sciences, Vol 6, 511-516.
Agresti, A., 1983 : A survey of strategies for modeling cross-classifications having ordinal variables, Journal of the American Statistical Association, 78, 184-198.
Agresti, A., 1994 : Analysis of Ordinal Categorical Data, John Wiley & Sons, New York.
Andersen, E. B., 1994 : The Statistical Analysis of Categorical Data, Springer-Verlag, Berlin.
Andersen, E. B., 1997 : Introduction to the Statistical Analysis of Categorical Data, Springer, Berlin.
Anderson, C. J., 1996 : The analysis of three-way contingency tables by three-mode association models, Psychometrika, 61, 465-483.
Anderson, E., 1960 : A semigraphical method for the analysis of complex problems, Technometrics, 2, 387-391.
Anderson, R. L., 1959 : Use of contingency tables in the analysis of consumer preference studies, Biometrics, 15, 582-590.
Anderson, T. W., 1958 : An Introduction to Multivariate Statistical Analysis, John Wiley & Sons, New York.
Andrews, D. F., 1972 : Plots of high dimensional data, Biometrics, 28, 125-136.
Andrews, D. F., 1982 : Andrews function plots, In Encyclopedia of Statistical Sciences, Vol 1, 85-87.
Apaiwongse, T. S., 1995 : Facial display of environmental policy uncertainty, Journal of Business Psychology, 10, 65-74.
Armitage, P., 1955 : Tests for linear trends in proportions and frequencies, Biometrics, 11, 375-386.

B

Baccini, A., Caussinus, H. & de Falguerolles, A., 1994 : Diabolic horseshoes, 9th International Workshop on Statistical Modelling, Exeter, England.
Barnett, V. & Lewis, T., 1994 : Outliers in Statistical Data, John Wiley & Sons, New York.
Becker, M. P., 1989 : Models for the analysis of association in multivariate contingency tables, Journal of the American Statistical Association, 84, 1014-1019.
Becker, M. P. & Clogg, C. C., 1989 : Analysis of sets of two-way contingency tables using association models, Journal of the American Statistical Association, 84, 142-151.
Beh, E. J., 1996a : Correspondence analysis of two-way singly-ordered contingency tables, Department of Applied Statistics, University of Wollongong, Preprint No. 1/96.
Beh, E. J., 1996b : Correspondence analysis of two-way doubly-ordered contingency tables, Department of Applied Statistics, University of Wollongong, Preprint No. 2/96.


Beh, E. J., 1997 : Simple correspondence analysis of ordinal cross-classifications using orthogonal polynomials, Biometrical Journal, 39 (5), 589-613.
Beh, E. J., 1998 : A comparative study of scores for correspondence analysis with ordered categories, Biometrical Journal, 40, 413-429.
Beh, E. J., 1999a : Correspondence analysis of ranked data, Communications in Statistics (Theory and Methods), 28 (7), (to appear).
Beh, E. J., 1999b : Partitioning Pearson's chi-squared statistic for singly ordered two-way contingency tables, School of Mathematics and Applied Statistics, Preprint 1/99, University of Wollongong.
Beh, E. J. & Davy, P. J., 1998a : Partitioning Pearson's chi-squared statistic for a completely ordered three-way contingency table, The Australian and New Zealand Journal of Statistics, 40, 465-477.
Beh, E. J. & Davy, P. J., 1998b : Multiple correspondence analysis of ordinal multi-way contingency tables using orthogonal polynomials, In Preparation.
Beh, E. J. & Davy, P. J., 1999 : Partitioning Pearson's chi-squared statistic for a partially ordered three-way contingency table, The Australian and New Zealand Journal of Statistics, (to appear).
Beltrami, E., 1873 : Sulle funzioni bilineari, Giornale di Matematiche (Battaglini), 11, 98-106.
Benasseni, J., 1993 : Perturbational aspects in correspondence analysis, Computational Statistics and Data Analysis, 15, 393-410.
Benzecri, J. P., 1969 : Statistical analysis as a tool to make patterns emerge from data, In Methodologies of Pattern Recognition, (ed Watanabe, S.), 35-74.
Benzecri, J. P., et al, 1973a : L'Analyse des Donnees : I. La Taxinomie, Dunod, Paris.
Benzecri, J. P., et al, 1973b : L'Analyse des Donnees : II. L'Analyse des Correspondances, Dunod, Paris.
Benzecri, J. P., 1992 : Correspondence Analysis Handbook, Marcel Dekker, New York.
Best, D. J., 1993 : Extended analysis for ranked data, The Australian Journal of Statistics, 35, 257-262.
Best, D. J., 1994a : Nonparametric comparison of two histograms, Biometrics, 50, 538-541.
Best, D. J., 1994b : Dispersion assessment of consumer taste test data, 2nd Sensometrics Meeting, Edinburgh.
Best, D. J., 1995 : Consumer data - statistical tests for differences in dispersion, Food Quality and Preference, 6, 221-225.
Best, D. J. & Rayner, J. C. W., 1987 : Goodness-of-fit for grouped data using components of Pearson's χ², Computational Statistics and Data Analysis, 5, 53-57.
Best, D. J. & Rayner, J. C. W., 1991 : Disease clustering in time, Biometrics, 47, 589-593.
Best, D. J. & Rayner, J. C. W., 1994 : Analysis of ordinal contingency tables via orthogonal polynomials, Department of Applied Statistics, University of Wollongong.
Best, D. J. & Rayner, J. C. W., 1996 : Nonparametric analysis for doubly ordered two-way contingency tables, Biometrics, 52, 1153-1156.


Best, D. J. & Rayner, J. C. W., 1997 : Product Maps for Ranked Preference Data, The Statistician, 46, 347-354.
Birks, H. J. B., Peglar, S. M. & Austin, H. A., 1994 : An Annotated Bibliography of Canonical Correspondence Analysis and Related Constrained Ordination Methods 1986-1993, Botanical Institute, University of Bergen, Allegaten, Bergen, Norway.
Bloxom, B., 1984 : Tucker's three-mode factor analysis model, In Research Methods for Multimode Data Analysis, (ed Law, H. G., Snyder Jr, C. W., Hattie, J. A. & McDonald, R. P.), 104-121.
Boik, R. J., 1996 : An efficient algorithm for joint correspondence analysis, Psychometrika, 61, 255-269.
Bond, J. & Michailidis, G., 1996 : Interactive correspondence analysis in a dynamic object-oriented environment, UCLA Statistical Series, # 215.
Bove, G., 1986 : A proposal for the solution of some problems involved in correspondence analysis, In Data Analysis and Informatics, IV, (ed E. Diday, Y. Escoufier, L. Lebart, J. Pages, Y. Schektman & R. Tomassone), 167-177.
Bradley, R. A., Katti, S. K. & Coons, I. J., 1962 : Optimal scaling for ordered categories, Psychometrika, 27, 355-374.
Bross, I. D. J., 1958 : How to use ridit analysis, Biometrics, 14, 18-38.
Broyden, C. G., 1975 : Basic Matrices, McMillan Press, London.
Burt, C., 1950 : The factorial analysis of qualitative data, Journal of Statistical Psychology, 3, 166-185.

C

Calimlin, J. F., Wardell, W. M., Cox, C., Lasagna, L. & Sriwatanakul, K., 1982 : Analgesic efficiency of orally administered Zomipirac Sodium, Clinical Pharmacology and Therapeutics, 31, 208.
Carlier, A. & Kroonenberg, P. M., 1996 : Decompositions and biplots in three-way correspondence analysis, Psychometrika, 61, 355-373.
Carroll, J. D. & Chang, J-J., 1970 : Analysis of individual differences in multi-dimensional scaling via an N-way generalisation of "Eckart-Young" decomposition, Psychometrika, 35, 283-319.
Carroll, J. D. & Green, P. E., 1988 : An INDSCAL-based approach to multiple correspondence analysis, Journal of Marketing Research, 25, 193-203.
Carroll, J. D., Green, P. E. & Schaffer, C. M., 1986 : Interpoint distance comparisons in correspondence analysis, Journal of Marketing Research, 23, 271-280.
Carroll, J. D., Green, P. E. & Schaffer, C. M., 1987 : Comparing interpoint distances in correspondence analysis, Journal of Marketing Research, 24, 445-450.
Carroll, J. D., Green, P. E. & Schaffer, C. M., 1989 : Reply to Greenacre's commentary on the Carroll-Green-Schaffer scaling of two-way correspondence analysis solutions, Journal of Marketing Research, 26, 366-368.


Chernoff, H., 1973 : The use of faces to represent points in k-dimensional space graphically, Journal of the American Statistical Association, 68, 361-368.
Choulakian, V., 1988 : Exploratory analysis of contingency tables by log-linear formulation and generalisations of correspondence analysis, Psychometrika, 53, 235-250.
Claringbold, P. J., 1961 : The use of orthogonal polynomials in the partition of chi-square, The Australian Journal of Statistics, 3, 48-63.
Clogg, C. C., 1982 : Some models for the analysis of association in multiway cross-classifications having ordered categories, Journal of the American Statistical Association, 77, 803-815.

D

Danaher, P. J., 1991 : Canonical expansion model for multivariate media exposure distributions : a generalisation of the 'Duplication of Viewing Law', Journal of Marketing Research, 28, 361-367.
Darroch, J. N., 1974 : Multiplicative and additive interaction in contingency tables, Biometrika, 61, 207-214.
David, M., Campiglio, C. & Darling, R., 1974 : Progress in R- and Q-mode analysis : correspondence analysis and its application to the study of geological processes, Canadian Journal of Earth Sciences, 11, 131-146.
Davis, J. A., 1977 : Codebook for the 1977 General Social Survey, National Opinion Research Centre, Chicago.
de Leeuw, J., 1983 : On the prehistory of correspondence analysis, Statistica Neerlandica, 37, 161-164.
de Leeuw, J., 1993 : Some generalisations of correspondence analysis, In Multivariate Analysis : Future Directions 2, (ed Cuadras, C. M. & Rao, C. R.), 359-375.
de Leeuw, J. & van der Heijden, P. G. M., 1988 : Correspondence analysis of incomplete contingency tables, Psychometrika, 53, 223-233.
de Leeuw, J. & van der Heijden, P. G. M., 1991 : Reduced rank models for contingency tables, Biometrika, 78, 229-232.
Denis, J. B. & Dhorne, T., 1988 : Orthogonal tensor decomposition of 3-way tables, In Multiway Data Analysis, (ed Coppi, R. & Bolasco, S.), 31-37.

E

Eaton, M. L. & Tyler, D., 1994 : The asymptotic distribution of singular values with applications to canonical correlations and correspondence analysis, Journal of Multivariate Analysis, 50, 238-264.
Eckart, C. & Young, G., 1936 : The approximation of one matrix by another of lower rank, Psychometrika, 1, 211-218.


Emerson, P. L., 1968 : Numerical construction of orthogonal polynomials from a general recurrence formula, Biometrics, 24, 696-701.
Escofier, B., 1983 : Analyse de la difference entre deux mesures sur le produit de deux memes ensembles, Les Cahiers de l'Analyse des Donnees, 8, 325-329.
Escofier, B., 1984 : Analyse factorielle en reference a un modele : application a l'analyse de tableaux d'echange, Revue de Statistique Appliquee, 32, 25-36.
Escoufier, Y., 1988 : Beyond correspondence analysis, In Classification and Related Methods of Data Analysis (ed H. H. Bock), North-Holland, Amsterdam, 505-514.
Escoufier, Y. & Junca, S., 1986 : Least-squares approximation of frequencies or their logarithms, International Statistical Review, 54, 279-283.
Everitt, B. S., 1994 : A Handbook of Statistical Analysis using S-Plus, Chapman & Hall.
Everitt, B. S., 1997 : Annotation : Correspondence analysis, Journal of Child Psychology and Psychiatry, 38, 737-745.

F

Ferris, G. E., 1982 : An efficient scale in monadic product testing, Australian Marketing Researcher, 6, 33-39.
Fienberg, S. E., 1977 : The Analysis of Cross-Classified Categorical Data, The MIT Press, Cambridge.
Fienberg, S. E., 1982 : Contingency tables, In Encyclopedia of Statistical Sciences, Vol 2, 161-171.
Fisher, R. A., 1939 : The sampling distribution of some statistics obtained from non-linear equations, Annals of Eugenics, 9, 238-249.
Fisher, R. A., 1940 : The precision of discriminant functions, Annals of Eugenics, 10, 422-429.
Flury, B. & Riedwyl, H., 1981 : Graphical representation of multivariate data by means of asymmetrical faces, Journal of the American Statistical Association, 76, 757-765.
Fox, R. J., 1988 : Perceptual mapping using the basic structure matrix decomposition, Journal of the Academy of Marketing Science, 16, 47-59.
Francis, I. & Lauro, N., 1982 : An analysis of developers and users ratings of statistical software using multiple correspondence analysis, COMPSTAT 1982, (ed H. Caussinus, P. Ettinger & R. Tomassone), 212-217.
Friedman, J. H., Farrell, E. S., Goldwyn, R. M., Miller, M. & Sigel, J. H., 1972 : A graphic way of describing changing multivariate patterns, Proceedings of the Sixth Interface Symposium on Computer Science and Statistics, Oct 16-17, 1972, 56-59, University of California, Berkeley.
Friendly, M., 1994 : Mosaic displays for multi-way contingency tables, Journal of the American Statistical Association, 89, 190-200.


    G

Gabriel, K. R. & Odoroff, C. L., 1990 : Biplots in biomedical research, Statistics in Medicine, 9, 469-485.
Gifi, A., 1990 : Non-linear Multivariate Analysis, John Wiley & Sons, Chichester.
Gilula, Z., 1979 : Singular value decomposition of probability matrices : Probabilistic aspects of latent dichotomous variables, Biometrika, 66, 339-344.
Gilula, Z., 1984 : On some similarities between canonical correlation models and latent class models for two-way contingency tables, Biometrika, 71, 523-529.
Gilula, Z., 1986 : Grouping of association in contingency tables : an exploratory canonical correlation approach, Journal of the American Statistical Association, 81, 773-779.
Gilula, Z. & Haberman, S. J., 1984 : Canonical analysis of contingency tables by maximum likelihood, Manuscript.
Gilula, Z. & Haberman, S. J., 1986 : Analysis of contingency tables by maximum likelihood, Journal of the American Statistical Association, 81, 780-788.
Gilula, Z. & Haberman, S. J., 1988 : The analysis of multivariate contingency tables by restricted canonical and restricted association models, Journal of the American Statistical Association, 83, 760-771.
Gilula, Z. & Krieger, A. M., 1989 : Collapsed two-way contingency tables and the chi-squared reduction principle, Journal of the Royal Statistical Society B, 51, 425-433.
Gilula, Z. & Ritov, Y., 1990 : Inferential ordinal correspondence analysis : motivation, derivation and limitations, International Statistical Review, 58, 101-108.
Gilula, Z., Krieger, A. M. & Ritov, Y., 1988 : Ordinal association in contingency tables : Some interpretative aspects, Journal of the American Statistical Association, 83, 540-545.
Girshick, M. A., 1939 : On the sampling theory of roots of determinantal equations, Annals of Mathematical Statistics, 10, 203-224.
Goldstein, H., 1987 : The choice of constraints in correspondence analysis, Psychometrika, 52, 207-215.
Golub, G. H. & Reinsch, C., 1971 : Singular value decomposition and least squares solution, In Handbook for Automatic Computation, (ed. J. H. Wilkinson & C. Reinsch), 134-151.
Good, I. J., 1969 : Some applications of singular value decomposition of a matrix, Technometrics, 11, 823-831.
Goodchild, N. A. & Vijayan, K., 1974 : Significance tests in plots of multi-dimensional data in two dimensions, Biometrics, 30, 209-210.
Goodman, L. A., 1969 : On partitioning χ² and detecting partial association in three-way contingency tables, Journal of the Royal Statistical Society (B), 31, 486-498.
Goodman, L. A., 1970 : The multivariate analysis of qualitative data : Interactions among multiple classifications, Journal of the American Statistical Association, 65, 226-256.


    Goodman, L. A., 1979 : Simple models for the analysis of association in cross-classifications having ordered categories, Journal of the American Statistical Association, 74, 537-552.

Goodman, L. A., 1981 : Association models and canonical correlation in the analysis of cross-classifications having ordered categories, Journal of the American Statistical Association, 76, 320-334.
Goodman, L. A., 1985a : The analysis of cross-classified data having ordered and/or unordered categories : association models, correlation models and asymmetry models for contingency tables with or without missing entries, The Annals of Statistics, 13, 10-69.

Goodman, L. A., 1985b : Correspondence analysis models, log-linear models and log-bilinear models for the analysis of contingency tables, Bulletin of the International Statistical Institute, Amsterdam.
Goodman, L. A., 1986 : Some useful extensions of the usual correspondence analysis approach and the usual log-linear models approach in the analysis of contingency tables, International Statistical Review, 54, 243-309.
Goodman, L. A., 1991 : Measures, models and graphical displays in the analysis of cross-classified data, Journal of the American Statistical Association, 86, 1085-1111.
Goodman, L. A., 1995 : Statistical methods, graphical displays, and Tukey's Ladder of re-expression in the analysis of nonindependence in contingency tables : correspondence analysis, association analysis, and the midway view of nonindependence, In Festschrift in Honor of John W. Tukey, (ed S. Morgenthaler).
Goodman, L. A., 1996 : A single general method for the analysis of cross-classified data : Reconciliation and synthesis of some methods of Pearson, Yule, and Fisher, and also some methods of correspondence analysis and association analysis, Journal of the American Statistical Association, 91, 408-428.
Gorman, B. S. & Primavera, L. H., 1993 : MCA - a simple program for multiple correspondence analysis, Educational and Psychological Measurement, 53, 685-687.
Gower, J. C., 1989 : Discussion of "van der Heijden, P. G. M., de Falguerolles, A. & de Leeuw, J., 1989 : A combined approach to contingency table analysis using correspondence analysis and log-linear analysis, Applied Statistics, 38, 249-292", Applied Statistics, 38, 273-276.
Graham, A., 1981 : Kronecker Products and Matrix Calculus : with Applications, Wiley, New York.
Grassi, M. & Visentin, S., 1994 : Correspondence analysis applied to grouped cohort data, Statistics in Medicine, 13, 2407-2425.
Graubard, B. I. & Korn, E. L., 1987 : Choice of column scores for testing independence in ordered 2×K contingency tables, Biometrics, 43, 471-476.


Greenacre, M. J., 1981 : Practical correspondence analysis, In Interpreting Multivariate Data, (ed V. Barnett), 119-146.
Greenacre, M. J., 1984 : Theory and Applications of Correspondence Analysis, Academic Press, London.
Greenacre, M. J., 1986 : SIMCA : A program to perform simple correspondence analysis, The American Statistician, 40, 230-231.
Greenacre, M. J., 1988 : Correspondence analysis of multivariate categorical data by weighted least-squares, Biometrika, 75, 457-467.
Greenacre, M. J., 1989 : The Carroll-Green-Schaffer scaling in correspondence analysis : a theoretical and empirical appraisal, Journal of Marketing Research, 26, 358-365.
Greenacre, M. J., 1990 : Some limitations of correspondence analysis, Computational Statistics Quarterly, 3, 249-256.
Greenacre, M. J., 1991 : Interpreting multiple correspondence analysis, Applied Stochastic Models and Data Analysis, 7, 195-210.
Greenacre, M. J., 1993a : Biplots in correspondence analysis, Journal of Applied Statistics, 20, 251-269.
Greenacre, M. J., 1993b : Correspondence Analysis in Practice, Academic Press, London.
Greenacre, M., 1994 : Multiple and joint correspondence analysis, In Correspondence Analysis in the Social Sciences : Recent Developments and Applications, (ed Greenacre, M. & Blasius, J.), Academic Press, London.
Greenacre, M. J., 1996 : An adaptation of correspondence analysis for square tables, Facultat de Ciencies Economiques i Empresarials, Universitat Pompeu Fabra, Barcelona, Spain.
Greenacre, M. J. & Blasius, J., 1994 : Correspondence Analysis in the Social Sciences : Recent Developments and Applications, Academic Press, London.
Greenacre, M. & Clavel, J. G., 1998 : Correspondence analysis of two transition tables, Facultat de Ciencies Economiques i Empresarials, Universitat Pompeu Fabra, Barcelona, Spain.
Greenacre, M. J. & Hastie, T., 1987 : The geometric interpretation of correspondence analysis, Journal of the American Statistical Association, 82, 437-447.
Guttman, L., 1941 : The quantification of a class of attributes : A theory and method of scale construction, In The Prediction of Personal Adjustment, (ed P. Horst), 319-348, Social Science Research Council, New York.
Guttman, L., 1953 : A note on Sir Cyril Burt's "Factorial analysis of qualitative data", The British Journal of Statistical Psychology, 6, 1-4.

H

Haberman, S. J., 1973 : The analysis of residuals in cross-classified tables, Biometrics, 29, 205-220.


Haberman, S. J., 1974 : Log-linear models for frequency tables with ordered classifications, Biometrics, 30, 589-600.
Haberman, S. J., 1981 : Tests for independence in two-way contingency tables based on canonical correlation and on linear-by-linear interaction, The Annals of Statistics, 9, 1178-1186.
Haberman, S. J., 1988 : A warning on the use of chi-squared statistics with frequency tables with small expected cell counts, Journal of the American Statistical Association, 83, 555-560.
Harshman, R. A., 1970 : Foundations of the PARAFAC procedure : models and conditions for an "explanatory" multi-modal factor analysis, UCLA Working Papers in Phonetics, 16, 1-84.
Harshman, R. A. & Lundy, M. E., 1984 : The PARAFAC model for three-way factor analysis and multidimensional scaling, In Research Methods for Multimode Data Analysis, (ed. Law, H. G., Snyder Jr, C. W., Hattie, J. A. & McDonald, R. P.), 122-215.
Harshman, R. A. & Lundy, M. E., 1994 : PARAFAC : Parallel factor analysis, Computational Statistics & Data Analysis, 18, 39-72.
Hartigan, J. A. & Kleiner, B., 1984 : A mosaic of television ratings, The American Statistician, 38, 32-35.
Heiser, W. J., 1981 : Unfolding Analysis of Proximity Data, Doctor of Social Sciences Thesis, Department of Data Theory, University of Leiden, The Netherlands.
Heiser, W. J., 1987 : Correspondence analysis with least absolute residuals, Computational Statistics & Data Analysis, 5, 337-356.
Higgs, N. T., 1990 : Practical and innovative uses of correspondence analysis, The Statistician, 40, 183-194.
Hill, M. O., 1973 : Reciprocal averaging : an eigenvector method of ordination, Journal of Ecology, 61, 237-251.
Hill, M. O., 1974 : Correspondence analysis : a neglected multivariate method, Applied Statistics, 23, 340-354.
Hill, M. O. & Gauch Jr, H. G., 1980 : Detrended correspondence analysis : an improved ordination technique, Vegetatio, 42, 47-58.
Hirotsu, C., 1978 : Ordered alternatives for interaction effects, Biometrika, 65, 561-570.
Hirotsu, C., 1982 : Use of cumulative efficient scores for testing ordered alternatives in discrete models, Biometrika, 69, 567-577.
Hirotsu, C., 1983 : Defining the pattern of association in two-way contingency tables, Biometrika, 70, 579-589.
Hirotsu, C., 1986 : Cumulative chi-square statistic as a tool for testing goodness of fit, Biometrika, 73, 165-173.
Hirschfeld, H. O., 1935 : A connection between correlation and contingency, Proceedings of the Cambridge Philosophical Society, 31, 520-524.


Hoffman, D. L., 1991 : Review of four correspondence analysis programs for the IBM PC, The American Statistician, 45, 305-311.
Hoffman, D. L., de Leeuw, J. & Arjunji, R. V., 1995 : Multiple correspondence analysis, In Advanced Methods of Marketing Research, (ed R. P. Bagozzi), 260-294, Blackwell Publishers, London.
Hoffman, D. L. & Franke, G. R., 1986 : Correspondence analysis : graphical representation of categorical data in marketing research, Journal of Marketing Research, 23, 213-227.
Horst, P., 1935 : Measuring complex attitudes, The Journal of Social Psychology, 6, 369-375.
Hsu, P. L., 1939 : On the distribution of the roots of certain determinantal equations, Annals of Eugenics, 9, 250-258.

I

Ihm, P. & Groenewoud, H., 1975 : A multivariate ordering of vegetation data based on Gaussian type gradient response curves, Journal of Ecology, 63, 767-778.
Ihm, P. & Groenewoud, H., 1984 : Correspondence analysis and Gaussian ordination, In Compstat Lectures 3, (ed J. M. Chambers et al), 5-60.
Irwin, J. O., 1949 : A note on the subdivision of χ² into components, Biometrika, 36, 130-134.
Israels, A. Z., 1987 : Eigenvalue Techniques for Qualitative Data, M & T Series, v 10, Leiden.

J

Jambu, M., 1987 : Rules of selection in correspondence cluster analysis, ASA - Proceedings of Statistical Computing, 345-348.
Javalgi, R. G., Joseph, W. B. & Gombleski, W. R., 1995 : Positioning your service to target key buying influences : the case of referring physicians and hospitals, Journal of Services Marketing, 9, 42-52.
Javalgi, R., Whipple, T., McManamon, M. & Edick, V., 1992 : Hospital image : a correspondence analysis approach, Journal of Health Care Marketing, 12, 34-41.
Jobson, J. D., 1992 : Applied Multivariate Data Analysis, Springer-Verlag, New York.
Johnson, R. M., 1963 : On a theorem stated by Eckart and Young, Psychometrika, 28, 259-263.
Jordan, C., 1874 : Memoire sur les formes bilineaires, J. Math. Pures Appl., 19, 35-54.

K

Kara, A., Kaynak, E. & Kucukemiroglu, O., 1996 : Positioning of fast-food outlets in two regions of North America : a comparative study using correspondence analysis, Journal of Professional Services Marketing, 14, 99-119.
Kateri, M., 1993 : Models in two-way contingency tables, Biometrics, 49, 950-951.
Kaynak, E., Kucukemiroglu, O. & Kara, A., 1994 : Consumers' perception of airlines : a correspondence analysis approach in a global airline industry, Management International Review, 34, 235-254.


Kendall, M. G., 1943 : The Advanced Theory of Statistics, J. B. Lippincott, London.
Kendall, M. G., 1972 : The history and future of statistics, In Statistical Papers in Honor of George W. Snedecor, (ed T. A. Bancroft), Iowa State University Press, Iowa.
Kendall, M. & Stuart, A., 1979 : The Advanced Theory of Statistics : Vol 2, Charles Griffin, London.
Kim, H., 1992 : Measures of influence in correspondence analysis, Journal of Statistical Computation & Simulation, 40, 201-217.
Kim, H., 1994 : Influence functions in multiple correspondence analysis, Korean Journal of Applied Statistics, 7, 69-74.
Kleiner, B. & Hartigan, J. A., 1981 : Representing points in many dimensions by trees and castles (with comments), Journal of the American Statistical Association, 76, 260-276.
Koster, J. T. A., 1989 : Mathematical Aspects of Multiple Correspondence Analysis for Ordinal Variables, M & T Series, v 13, Leiden.
Kroonenberg, P. M., 1989 : Singular value decomposition of interactions in three-way contingency tables, In Multiway Data Analysis, (ed R. Coppi & S. Bolasco), 169-184, North-Holland, Amsterdam.
Kroonenberg, P. M., 1994 : The TUCKALS line : A suite of programs for three-way data analysis, Computational Statistics and Data Analysis, 18, 73-96.
Kruskal, J. B., 1977 : Three-way arrays : Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra and its Applications, 18, 95-138.
Kruskal, J. B., 1989 : Rank, decomposition and uniqueness for 3-way and N-way arrays, In Multiway Data Analysis, (ed Coppi, R. & Bolasco, S.), 7-18.
Ku, H. H. & Kullback, S., 1974 : Loglinear models in contingency table analysis, The American Statistician, 28, 115-122.
Kumar, N. S. H. & Srinivasan, G., 1995 : Predicting the quality of the final product using multivariate statistical techniques - a case study, Computers and Industrial Engineering, 29, 37-41.

L

Lancaster, H. O., 1949 : The derivation and partition of χ² in certain discrete distributions, Biometrika, 36, 117-129.
Lancaster, H. O., 1951 : Complex contingency tables treated by the partition of χ², Journal of the Royal Statistical Society (B), 13, 242-249.
Lancaster, H. O., 1953 : A reconstitution of χ² considered from metrical and enumerative aspects, Sankhya, 13, 1-10.
Lancaster, H. O., 1957 : Some properties of the bivariate normal distribution considered in the form of a contingency table, Biometrika, 289-292.


Lancaster, H. O., 1960a : On tests of independence in several dimensions, Journal of the Australian Mathematical Society, 1, 241-254.
Lancaster, H. O., 1960b : On statistical independence and zero correlation in several dimensions, Journal of the Australian Mathematical Society, 1, 492-496.
Lancaster, H. O., 1963a : Canonical correlations and partitions of χ², Quarterly Journal of Mathematics, 14, 220-223.
Lancaster, H. O., 1963b : Correlation and complete independence of random variables, Annals of Mathematical Statistics, 34, 1315-1321.
Lancaster, H. O., 1969 : The Chi-squared Distribution, Wiley, New York.
Lancaster, H. O., 1980 : Orthogonal models for contingency tables, In Developments in Statistics, Vol 3, 99-157, (ed P. R. Krishnaiah), Academic Press, New York.
Lauritzen, S. L. & Wermuth, N., 1989 : Graphical models for associations between variables, some of which are qualitative and some quantitative, Annals of Statistics, 17, 31-57.
Lebart, L., 1976 : The significancy of eigenvalues issued from correspondence analysis of contingency tables, COMPSTAT 1976, (ed J. Gordesch & P. Naeve), 38-45.
Lebart, L., 1982 : Exploratory analysis of large matrices with applications to textual data, COMPSTAT 1982, (ed H. Caussinus, P. Ettinger & R. Tomassone), 67-75.
Lebart, L., 1983 : Exploratory analysis of some large sets of data : a critical balance, Bulletin of the International Statistical Institute, 44, 83-96.
Lebart, L. & Morineau, A., 1982 : SPAD - A system of FORTRAN programs for correspondence analysis, Journal of Marketing Research, 19, 608-609.
Lebart, L., Morineau, A. & Tabard, N., 1977 : Techniques de la Description Statistique : methodes et logiciels pour l'analyse des grands tableaux, Dunod, Paris.
Lebart, L., Morineau, A. & Warwick, K. M., 1984 : Multivariate Descriptive Statistical Analysis, John Wiley & Sons, New York.
Leurgans, S. E., Ross, R. T. & Abel, R. B., 1993 : A decomposition for three-way arrays, SIAM Journal on Matrix Analysis and Applications, 14, 1064-1083.
Lohtia, R., Brooks, C. M. & Krapfel, R. E., 1994 : What constitutes a transaction-specific asset?, Journal of Business Research, 30, 261-270.
Loslever, P., Laassel, E. M. & Angue, J. C., 1994 : Combined statistical study of joint angles and ground reaction forces using component and multiple correspondence analysis, IEEE Transactions on Biomedical Engineering, 41, 1160-1167.

M

Maignan, M. F., 1974 : Correspondence factorial analysis, COMPSTAT 1974, (ed Bruckmann, G., Ferschl, F. & Schmetterer, L.), 234-243.
Mantel, N., 1979 : Ridit analysis and related ranking procedures - use at your own risk, American Journal of Epidemiology, 109, 25-59.


Maris, E., de Boeck, P. & van Mechelen, I., 1996 : Probability matrix decomposition models, Psychometrika, 61, 7-29.
Maxwell, A. E., 1961 : Analysing Qualitative Data, Methuen, London.
McEwan, J. A. & Schlich, P., 1992 : Correspondence analysis in sensory evaluation, Food Quality and Preference, 3, 23-36.
Meyer, R., 1991 : Multidimensional scaling as a framework for correspondence analysis and its extensions, In Analysing and Modeling Data and Knowledge, (ed M. Schader), Springer, Berlin, 99-108.
Meyer, R., 1995 : Non-linear eigenvector algorithms for local optimization in multivariate analysis, Department Series 1995, Department of Statistics, University of Auckland, New Zealand.
Michailidis, G. & de Leeuw, J., 1996a : The Gifi system of nonlinear multivariate analysis, UCLA Statistical Series, # 204.
Michailidis, G. & de Leeuw, J., 1996b : Multilevel homogeneity analysis, UCLA Statistical Series, # 206.
Michailidis, G. & de Leeuw, J., 1996c : Constrained homogeneity analysis with applications to hierarchical data, UCLA Statistical Series, # 207.
Milan, L. & Whittaker, J., 1995 : Application of the parametric bootstrap to models that incorporate a singular value decomposition, Applied Statistics, 44, 31-49.
Minchin, P. R., 1987 : An evaluation of the relative robustness of techniques for ecological ordination, Vegetatio, 69, 89-107.
Mood, A. M., 1951 : On the distribution of the characteristic roots of normal second moment matrices, Annals of Mathematical Statistics, 22, 266-273.
Mooijaart, A., 1992 : Three-factor interaction models by log-trilinear terms in three-way contingency tables, Statistica Applicata, 4, 669-677.
Mutombo, F. K., 1973 : Traitement des donnees manquantes et rationalisation d'un reseau de stations de mesures, Doctoral Thesis, Universite Pierre et Marie Curie, Paris.

N

Nair, V. N., 1986 : Testing industrial experiments with ordered categorical data, Technometrics, 28, 283-311.
Nishisato, S., 1980 : Analysis of Categorical Data : Dual Scaling and its Applications, University of Toronto Press, Toronto.
Nishisato, S., 1994 : Elements of Dual Scaling : An Introduction to Practical Data Analysis, L. Erlbaum Associates, Hillsdale NJ.
Nishisato, S. & Arri, P. S., 1975 : Non-linear programming approach to optimal scaling of partially ordered categories, Psychometrika, 40, 525-547.
Nora, C., 1975 : Une methode de reconstitution et d'analyse de donnees incompletes, Doctoral Thesis, Universite Pierre et Marie Curie, Paris.


O

Okland, R. H. & Eilertsen, O., 1994 : Canonical correspondence analysis with variation partitioning - some comments and an application, Journal of Vegetation Science, 5, 117-126.
Oksanen, J., 1987 : Problems of joint display of species and site scores in correspondence analysis, Vegetatio, 72, 51-57.
Oksanen, J., 1988 : A note on the occasional instability of detrending in correspondence analysis, Vegetatio, 74, 29-32.
Oksanen, J. & Minchin, P. R., 1997 : Instability of ordination results under changes in input data order : explanations and remedies, Journal of Vegetation Science, 8, 447-454.

P
Pack, P. & Jolliffe, I. T., 1992 : Influence in correspondence analysis, Applied Statistics, 41, 365-380.
Palmer, M. W., 1993 : Putting things in even better order : the advantages of canonical correspondence analysis, Ecology, 74, 2215-2230.
Parsa, A. R. & Smith, W. B., 1993 : Scoring under ordered constraints in contingency tables, Communications in Statistics (Theory & Methods), 22, 3537-3551.
Pearson, K., 1900 : On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, Philosophical Magazine (Series 5), 50, 157-175.
Pearson, K., 1901 : On lines and planes of closest fit to a system of points in space, Philosophical Magazine (Series 6), 2, 559-572.
Pearson, K., 1904 : On the theory of contingency and its relation to association and normal correlation, Drapers' Company Research Memoirs (Biometric Series), No. 1.
Pearson, K., 1906 : On certain points connected with scale order in the case of the correlation of two characters which for some arrangement give a linear regression line, Biometrika, 5, 176-178.
Pearson, K., 1913 : On the probable error of a correlation coefficient as found from a fourfold table, Biometrika, 9, 22-27.
Peet, R. K., Knox, R. G., Case, J. S. & Allen, R. B., 1988 : Putting things in order : the advantages of detrended correspondence analysis, American Naturalist, 131, 924-934.

R
Rayner, J. C. W. & Best, D. J., 1989 : Smooth Tests of Goodness of Fit, Oxford University Press, New York.
Rayner, J. C. W. & Best, D. J., 1995 : Extensions to some important nonparametric tests, Proceedings of the A. C. Aitken Centenary Conference, 257-266.


Rayner, J. C. W. & Best, D. J., 1996 : Smooth extensions of Pearson's product moment correlation and Spearman's rho, Statistics and Probability Letters, 30, 171-177.
Rayner, J. C. W. & Best, D. J., 1997 : A smooth analysis of singly ordered two-way contingency tables, Submitted to Journal of Applied Mathematics & Decision Sciences.
Rayner, J. C. W., Best, D. J. & Mathews, K. L., 1997 : Interpreting the skewness coefficient, Communications in Statistics (Theory & Methods), 24, 593-600.
Rayner, J. C. W. & Best, D. J., 1998b : Contingency tables, ordered, To appear in Encyclopedia of Statistical Sciences.
Read, T. R. C., 1984 : Small sample comparisons for the power divergence goodness-of-fit statistics, Journal of the American Statistical Association, 79, 929-935.
Ries, P., 1974 : Joint space analysis of contingency data, Unpublished paper presented at Atlanta DSI meeting, October 1974.
Richardson, M. & Kuder, G. F., 1933 : Making a rating scale that measures, Personnel Journal, 12, 36-40.
Ritov, Y. & Gilula, Z., 1991 : The order restricted RC model for ordered contingency tables : estimation and testing for fit, Annals of Statistics, 19, 2090-2101.
Ritov, Y. & Gilula, Z., 1993 : Analysis of contingency tables by correspondence models subject to ordered constraints, Journal of the American Statistical Association, 88, 1380-1387.
Roberts, J. M., 1996 : Alternative approaches to correspondence analysis of sociomatrices, Journal of Mathematical Sociology, 21, 359-368.
Rom, D. & Sarkar, S. K., 1992 : A generalised model for the analysis of association in ordinal contingency tables, Journal of Statistical Planning and Inference, 33, 205-212.
Rovan, J., 1994 : Visualising in more than two dimensions, In Correspondence Analysis in the Social Sciences : Recent Developments and Applications, (ed Greenacre, M. & Blasius, J.), Academic Press, London.
Roy, S. N., 1939 : p-Statistics or some generalisations of analysis of variance appropriate to multivariate problems, Sankhya, 4, 381-396.

S
Sands, R. & Young, F. W., 1980 : Component models for three-way data : An alternating least squares algorithm with optimal scaling features, Psychometrika, 45, 39-67.
Schriever, B. F., 1983 : Scaling of order dependent categorical variables with correspondence analysis, International Statistical Review, 51, 225-237.
Selvin, S., 1977 : A further note on the interpretation of ridit analysis, American Journal of Epidemiology, 105, 16-20.
Simon, G., 1974 : Alternative analysis for the singly ordered contingency table, Journal of the American Statistical Association, 69, 971-976.
Simonoff, J. S., 1988 : Detecting outlying cells in two-way contingency tables via backwards-stepping, Technometrics, 30, 339-345.


Smith, W. F. & Cornell, J. A., 1993 : Biplot displays for looking at multiple response data in mixture experiments, Technometrics, 35, 337-350.
Snee, R. D., 1974 : Graphical display of two-way contingency tables, The American Statistician, 28, 9-12.
Snell, E. J., 1964 : A scaling procedure for ordered categorical data, Biometrics, 20, 592-607.
Srole, L., Langner, T. S., Michael, S. T., Opler, M. K. & Rennie, T. A. C., 1962 : Mental Health in the Metropolis : The Midtown Manhattan Study, McGraw-Hill, New York.
Sylvester, J. J., 1889 : Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique, Comptes Rendus de l'Académie des Sciences (Paris), 108, 651-653.
Szegö, G., 1939 : Orthogonal Polynomials, American Mathematical Society, Rhode Island.

T
Tango, T., 1984 : The detection of disease clustering in time, Biometrics, 40, 15-26.
Teil, H., 1975 : Correspondence factor analysis : An outline of its method, Mathematical Geology, 7, 3-12.
Teil, H. & Cheminee, J. L., 1975 : Application of correspondence factor analysis to the study of major and trace elements in the Erta Ale Chain (Afar, Ethiopia), Mathematical Geology, 7, 13-30.
ten Berge, J. M. F. & Kiers, H. A. L., 1996 : Some uniqueness results for PARAFAC2, Psychometrika, 61, 123-132.
ter Braak, C. J. F., 1986 : Canonical correspondence analysis : a new eigenvector technique for multivariate direct gradient analysis, Ecology, 67, 1167-1179.
ter Braak, C. J. F., 1987 : Ordination, In Data Analysis in Community and Landscape Ecology, (ed R. H. Jongman, C. J. F. ter Braak & O. F. R. van Tongeren), 91-173, Centre for Agricultural Publishing & Documentation, Wageningen.
ter Braak, C. J. F., 1988 : Partial canonical correspondence analysis, In Classification and Related Methods of Data Analysis, (ed H. H. Bock), 551-558.
Thompson, P. A., 1995 : Correspondence analysis in statistical package programs, The American Statistician, 49, 310-316.
Tian, D. Q., Sorooshian, S. & Myers, D. E., 1993 : Correspondence analysis with Matlab, Computers & Geosciences, 19, 1007-1022.
Tomassone, R., Jouy-en-Josas, F., Lebart, L. & Paris, F., 1980 : Survey analysis : statistical management of large data sets, In COMPSTAT 1980, (ed M. M. Barritt & J. Wishart), 32-43.
Tsujitani, M., 1992 : A note on the additive and multiplicative models in two-way contingency tables, Biometrics, 48, 267-269.
Tucker, L. R., 1963 : Implications of factor analysis of three-way matrices for measurement of change, In Problems in Measuring Change, (ed C. W. Harris), 122-137.


    Tucker, L. R., 1964 : The extension of factor analysis to three-dimensional matrices, In Contributions to Mathematical Psychology, (ed N. Frederiksen & H. Gulliksen), 109-127.

Tucker, L. R., 1966 : Some mathematical notes on three-mode factor analysis, Psychometrika, 31, 279-311.
Tukey, P. A. & Tukey, J. W., 1981a : Preparation; prechosen sequences of views, In Interpreting Multivariate Data, (ed V. Barnett), Chapter 10, John Wiley & Sons, Chichester.
Tukey, P. A. & Tukey, J. W., 1981b : Summarisation; smoothing; supplemented views, In Interpreting Multivariate Data, (ed V. Barnett), Chapter 12, John Wiley & Sons, Chichester.

V
van der Heijden, P. G. M., 1987 : Correspondence Analysis of Longitudinal Categorical Data, DSWO Press, M & T Series, v 8, Leiden, (originally presented as a doctoral thesis, University of Leiden, The Netherlands).
van der Heijden, P. G. M., de Falguerolles, A. & de Leeuw, J., 1989 : A combined approach to contingency table analysis using correspondence analysis and log-linear analysis, Applied Statistics, 38, 249-292.
van der Heijden, P. G. M. & de Leeuw, J., 1985 : Correspondence analysis used complementary to log-linear analysis, Psychometrika, 50, 429-447.
van der Heijden, P. G. M. & Meijerink, F., 1989 : Generalised correspondence analysis of multi-way contingency tables and multi-way (super-) indicator matrices, In Multiway Data Analysis, (ed R. Coppi & S. Bolasco).
van der Heijden, P. G. M., Mooijaart, A. & Takane, Y., 1994 : Correspondence analysis and contingency table models, In Correspondence Analysis in the Social Sciences : Recent Developments and Applications, (ed Greenacre, M. & Blasius, J.), Academic Press, London.
van der Heijden, P. G. M. & Worsley, K. J., 1988 : Comment on 'Correspondence analysis used complementary to log-linear analysis', Psychometrika, 53, 287-291.
van Heel, M. & Frank, J., 1980 : Classification of particles in noisy electron micrographs using correspondence analysis, In Pattern Recognition in Practice, (ed E. S. Gelsema & L. N. Kanal).
van Rijckevorsel, J. L. A., 1987 : The Application of Fuzzy Coding and Horseshoes in Multiple Correspondence Analysis, DSWO Press, Leiden.
van Rijckevorsel, J. L. A. & de Leeuw, J., 1988 : Component and Correspondence Analysis : Dimension Reduction by Functional Approximation, Wiley, Chichester, England.


W
Wang, S-G. & Nyquist, H., 1991 : Effects on the eigenstructure of a data matrix when deleting an observation, Computational Statistics & Data Analysis, 11, 179-188.
Wartenberg, D., Ferson, S. & Rohlf, F. J., 1987 : Putting things in order : a critique of detrended correspondence analysis, American Naturalist, 129, 434-448.
Weller, S. C. & Romney, A. K., 1990 : Metric Scaling : Correspondence Analysis, Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-075, Newbury Park, CA : Sage.
Wermuth, N. & Russmann, H., 1993 : Eigenanalysis of symmetrizable matrix products - a result with statistical applications, Scandinavian Journal of Statistics, 20, 361-367.
Wielandt, H., 1967 : Topics in the Analytic Theory of Matrices, (Lecture notes prepared by R. R. Meyer), University of Wisconsin Press, Madison.
Williams, E. J., 1952 : Use of scores for the analysis of association in contingency tables, Biometrika, 39, 274-289.
Williams, I. P., 1972 : Matrices for Scientists, Hutchinson & Co., London.
Williamson, G. D. & Haber, M., 1994 : Models for three-dimensional contingency tables with completely and partially cross-classified data, Biometrics, 49, 194-203.
Wing, J. K., 1962 : Institutionalism in mental hospitals, British Journal of Social and Clinical Psychology, 1, 38-51.

Y
Yanai, H., 1986 : Some generalisations of correspondence analysis in terms of projection operators, In Data Analysis and Informatics IV, (ed E. Diday, Y. Escoufier, L. Lebart, J. Pages, Y. Schektman & R. Tomassone), 193-207.