JOURNAL OF APPLIED MATHEMATICS AND DECISION SCIENCES, 5(1), 35–45 Copyright © 2001, Lawrence Erlbaum Associates, Inc.
CONFIDENCE CIRCLES FOR CORRESPONDENCE ANALYSIS USING ORTHOGONAL POLYNOMIALS
ERIC J. BEH† School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, 2522, Australia
Abstract. An alternative approach to classical correspondence analysis was developed in [3] and involves decomposing the matrix of Pearson contingencies of a contingency table using orthogonal polynomials rather than via singular value decomposition. It is especially useful in analysing contingency tables which are of an ordinal nature. This short paper demonstrates that the confidence circles of Lebart, Morineau and Warwick (1984) for the classical approach can be applied to ordinal correspondence analysis. The advantage of the circles in analysing a contingency table is that the researcher can graphically identify the row and column categories that contribute or not to the hypothesis of independence.
1. Introduction
The correspondence analysis technique of [3] was shown to be mathematically similar to the classical correspondence analysis approach discussed by several authors, including Lebart, Morineau and Warwick (1984), [8] and [9]. However, there is a major difference between the approaches, which concerns the method of decomposing the Pearson chi-squared statistic. The classical approach decomposes the statistic into singular values by partitioning the matrix of Pearson contingencies using singular value decomposition. The approach of [3] decomposes the Pearson chi-squared statistic into bivariate moments, such as linear–by–linear, linear–by–quadratic, etc., by partitioning the matrix of Pearson contingencies using the orthogonal polynomials defined in [4].
Therefore the interpretation of the correspondence plots is very different. The ordinal correspondence plots of [3] graphically show how categories within a variable are similar or not by their proximity to each other along the first (linear), second (dispersion) and higher axes. The interpretation of the correspondence plots from the classical correspondence analysis technique is unclear. Points significantly far from the origin indicate that they contribute to the dependency between the row and column variables, while points close to the origin indicate they do not make such a contribution. While the interpretation of the ordinal plots allows us to reach the same conclusions, the classical correspondence plot will not explain how two points far from each other are different; the classical approach will only conclude that they are different.
With ordinal correspondence plots, we can determine which row and column categories, if any, contribute to the dependency between the two variables using confidence circles. Lebart et al. (1984) defined such circles for classical correspondence analysis. This paper shows that similar confidence circles can be calculated for each row and column profile co-ordinate by using ordinal correspondence analysis. The derivations presented here are for the row categories, while those for the column categories can be made in a similar manner.
Section 2 defines the notation used in this presentation and the radius of the confidence circle for the i th row profile co-ordinate in a plot using classical correspondence analysis. Section 3 shows that for the correspondence analysis approach of [3] the radius of the confidence circles can be derived in exactly the same way as in classical correspondence analysis. Section 4 shows the relationship between the marginal frequencies of a set of categories and the radii of the confidence circles. Section 5 consists of two examples which show the application of the confidence circle using doubly ordered correspondence analysis.
† Requests for reprints should be sent to E. J. Beh, School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, 2522, Australia.
2. Confidence Circles for Classical Correspondence Analysis
Consider an $I \times J$ two-way contingency table, $N$, where the $(i, j)$th cell entry is denoted as $n_{ij}$ for $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$. Let the grand total of $N$ be $n$ and the probability matrix be $P$, so that the $(i, j)$th cell entry is $p_{ij} = n_{ij}/n$, for which $\sum_{i=1}^{I} \sum_{j=1}^{J} p_{ij} = 1$. Define the $i$th row marginal proportion as $p_{i\bullet} = \sum_{j=1}^{J} p_{ij}$ and the $j$th column marginal proportion as $p_{\bullet j} = \sum_{i=1}^{I} p_{ij}$, so that $\sum_{i=1}^{I} p_{i\bullet} = \sum_{j=1}^{J} p_{\bullet j} = 1$.
The confidence circle of Lebart et al. (1984) is a method of observing the importance of a profile's position in a correspondence plot. Generally, if the origin lies outside the confidence circle for a particular category, then that category contributes to the dependency between the row and column categories of the contingency table. If the origin lies within the circle for a particular category, then that category does not contribute to the dependency between the variables.
Lebart et al. (1984) showed that for classical correspondence analysis the radius of the confidence circle for the $i$th row profile co-ordinate can be calculated by
$$r_i = \sqrt{\frac{\chi^2_{(J-1)}}{n p_{i\bullet}}} \qquad (1)$$
where $\chi^2_{(J-1)}$ is the theoretical chi-squared value with $J - 1$ degrees of freedom at the $\alpha$ level of significance.
Generally a correspondence plot consists of only two dimensions, but can include three or more. However, visually representing multiple dimensions is conceptually difficult; [2], [7], [11] and [12] presented some novel approaches to visualising multiple dimensions. If a correspondence plot consists of two dimensions, then with 2 degrees of freedom and at the 5% level of significance, $\chi^2_{(2)} = 5.99$. Therefore, the radius of the confidence circle for the $i$th row profile co-ordinate can be approximated by
$$r_i = \sqrt{\frac{5.99}{n p_{i\bullet}}} \qquad (2)$$
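As a rough illustration of the two-dimensional case, the sketch below computes a confidence-circle radius for each row of a table. It assumes the radius is the square root of the chi-squared quantile divided by n p_i•, and it uses the fact that with 2 degrees of freedom the quantile equals -2 ln(α), which gives 5.99 at the 5% level. The function name `confidence_radii` and the 3 × 4 table are hypothetical and made up for illustration; every row total is assumed to be positive.

```python
import math

def confidence_radii(table, alpha=0.05):
    """Radii of the row confidence circles in a two-dimensional plot.

    Assumption: 2 degrees of freedom, so the chi-squared quantile at
    level alpha is -2 * ln(alpha) (about 5.99 when alpha = 0.05).
    """
    n = sum(sum(row) for row in table)        # grand total n
    chi2_quantile = -2.0 * math.log(alpha)    # chi-squared value, 2 d.f.
    radii = []
    for row in table:
        p_i = sum(row) / n                    # row marginal proportion p_i.
        radii.append(math.sqrt(chi2_quantile / (n * p_i)))
    return radii

# Hypothetical 3 x 4 contingency table (illustrative data only).
example = [[20, 15, 10, 5],
           [10, 20, 20, 10],
           [5, 10, 25, 50]]
radii = confidence_radii(example)
```

Because n p_i• is simply the i th row total, rows with larger totals receive smaller circles under this formula.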
3. Confidence Circles for Ordinal Correspondence Analysis
The radius of the confidence circle for the $i$th row profile using the correspondence analysis of [3] is mathematically identical to the radius using classical correspondence analysis. [6] calculated confidence circles for their analysis using the same orthogonal polynomial definitions as we do here, but the plotting system they considered is different.
Suppose that a doubly ordered correspondence analysis is applied to a two-way contingency table. Then denote the row profile co-ordinate of the $i$th row category along the $k$th axis as $f^*_{ik}$, for $k = 1, 2, \ldots, J - 1$, which is defined by
$$f^*_{ik} = \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\bullet}} \, b_k(j)$$
This row profile co-ordinate is the weighted sum of the column orthogonal polynomials of order $k$, $\{b_k(j)\}$, where the weights used are from the profile of the $i$th row category, $\{p_{ij}/p_{i\bullet}\}$; see [3] for a derivation of $f^*_{ik}$.
By using equations (3.1.10) and (3.1.11) of [3], the relationship between the chi-squared statistic and the row profile co-ordinates is
$$X^2 = n \sum_{k=1}^{J-1} \sum_{i=1}^{I} p_{i\bullet} \left(f^*_{ik}\right)^2 \qquad (3)$$
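Equation (3) can be checked numerically on a small table. The sketch below is only an illustration under stated assumptions: the column polynomials are built by Gram-Schmidt orthonormalisation with respect to the column marginal proportions, standing in for the recurrence of [4], which is not reproduced here; the natural scores 1, 2, ..., J are assumed for the column categories; and the table and function names are hypothetical. The resulting polynomials may differ in sign from those of [4], which does not affect the squared co-ordinates.

```python
def orthonormal_polynomials(weights, scores):
    """Return b_0, ..., b_{J-1}, orthonormal with respect to `weights`.

    Gram-Schmidt on the powers of `scores`; an assumed stand-in for
    the orthogonal polynomial recurrence of [4].
    """
    basis = []
    for degree in range(len(scores)):
        v = [s ** degree for s in scores]
        for b in basis:
            # Remove the component of v along each earlier polynomial.
            proj = sum(w * x * y for w, x, y in zip(weights, v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = sum(w * x * x for w, x in zip(weights, v)) ** 0.5
        basis.append([x / norm for x in v])
    return basis

def row_profile_coordinates(table):
    """Row co-ordinates f[i][k-1] = sum_j (p_ij / p_i.) * b_k(j)."""
    n = sum(sum(row) for row in table)
    P = [[x / n for x in row] for row in table]
    p_row = [sum(row) for row in P]            # p_i.
    p_col = [sum(col) for col in zip(*P)]      # p_.j
    J = len(p_col)
    scores = list(range(1, J + 1))             # assumed natural scores
    b = orthonormal_polynomials(p_col, scores)
    f = [[sum(P[i][j] / p_row[i] * b[k][j] for j in range(J))
          for k in range(1, J)]                # skip the constant b_0
         for i in range(len(P))]
    return n, P, p_row, p_col, f

# Hypothetical 3 x 4 contingency table (illustrative data only).
table = [[20, 15, 10, 5],
         [10, 20, 20, 10],
         [5, 10, 25, 50]]
n, P, p_row, p_col, f = row_profile_coordinates(table)

# Pearson chi-squared statistic computed directly from the table.
pearson = n * sum((P[i][j] - p_row[i] * p_col[j]) ** 2 / (p_row[i] * p_col[j])
                  for i in range(len(P)) for j in range(len(p_col)))

# Right-hand side of equation (3): n * sum_k sum_i p_i. * (f_ik)^2.
from_coordinates = n * sum(p_row[i] * f_ik ** 2
                           for i, row in enumerate(f) for f_ik in row)
```

The two quantities `pearson` and `from_coordinates` should agree up to floating-point error, provided all J - 1 non-constant polynomials are used and every marginal proportion is positive.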