Confidence Circles for Correspondence Analysis Using Orthogonal Polynomials

JOURNAL OF APPLIED MATHEMATICS AND DECISION SCIENCES, 5(1), 35–45
Copyright © 2001, Lawrence Erlbaum Associates, Inc.

ERIC J. BEH†
School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, 2522, Australia

Abstract. An alternative approach to classical correspondence analysis was developed in [3] and involves decomposing the matrix of Pearson contingencies of a contingency table using orthogonal polynomials rather than via singular value decomposition. It is especially useful in analysing contingency tables which are of an ordinal nature. This short paper demonstrates that the confidence circles of Lebart, Morineau and Warwick (1984) for the classical approach can be applied to ordinal correspondence analysis. The advantage of the circles in analysing a contingency table is that the researcher can graphically identify the row and column categories that contribute or not to the hypothesis of independence.

1. Introduction

The correspondence analysis technique of [3] was shown to be mathematically similar to the classical correspondence analysis approach discussed by several authors, including Lebart, Morineau and Warwick (1984), [8] and [9]. However, there is a major difference between the approaches, and this is concerned with the method of decomposing the Pearson chi-squared statistic. The classical approach decomposes the statistic into singular values by partitioning the matrix of Pearson contingencies using singular value decomposition. The approach of [3] decomposes the Pearson chi-squared statistic into bivariate moments, such as linear–by–linear, linear–by–quadratic, etc., by partitioning the matrix of Pearson contingencies using the orthogonal polynomials defined in [4]. Therefore, the interpretation of the correspondence plots is very different.
The ordinal correspondence plots of [3] graphically show how categories within a variable are similar or not by their proximity to each other along the first (linear), second (dispersion) and higher axes. The interpretation of the correspondence plots from the classical correspondence analysis technique is unclear. Points significantly far from the origin indicate that they contribute to the dependency between the row and column variables, while points close to the origin indicate they do not make such a contribution. While the interpretation of the ordinal plots allows us to reach the same conclusions, the classical correspondence plot will not explain how two points far from each other are different; the classical approach will only conclude that they are different.

† Requests for reprints should be sent to E. J. Beh, School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, 2522, Australia.

With ordinal correspondence plots, we can determine which row and column categories, if any, contribute to the dependency between the two variables using confidence circles. Lebart et al. (1984) defined such circles for classical correspondence analysis. This paper shows that similar confidence circles can be calculated for each row and column profile co-ordinate by using ordinal correspondence analysis. The derivations presented here are for the row categories; those for the column categories can be made in a similar manner.

Section 2 defines the notation used in this presentation and the radius of the confidence circle for the ith row profile co-ordinate in a plot using classical correspondence analysis. Section 3 shows that for the correspondence analysis approach of [3] the radius of the confidence circles can be derived in exactly the same way as for classical correspondence analysis.
Section 4 shows the relationship between the marginal frequencies of a set of categories and the radius of the confidence circles. Section 5 consists of two examples which show the application of the confidence circle using doubly ordered correspondence analysis.

2. Confidence Circles for Classical Correspondence Analysis

Consider an $I \times J$ two-way contingency table, $N$, where the $(i, j)$th cell entry is denoted as $n_{ij}$ for $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$. Let the grand total of $N$ be $n$ and the probability matrix be $P$, so that the $(i, j)$th cell entry is $p_{ij} = n_{ij}/n$, for which $\sum_{i=1}^{I} \sum_{j=1}^{J} p_{ij} = 1$. Define the $i$th row marginal proportion as $p_{i\bullet} = \sum_{j=1}^{J} p_{ij}$ and the $j$th column marginal proportion as $p_{\bullet j} = \sum_{i=1}^{I} p_{ij}$, so that $\sum_{i=1}^{I} p_{i\bullet} = \sum_{j=1}^{J} p_{\bullet j} = 1$.

The confidence circle of Lebart et al. (1984) is a method of observing the importance of a profile's position in a correspondence plot. Generally, if the origin lies outside the confidence circle for a particular category, then that category contributes to the dependency between the row and column categories of the contingency table. If the origin lies within the circle for a particular category, then that category does not contribute to the dependency between the variables.

Lebart et al. (1984) showed that for classical correspondence analysis the radius of the confidence circle for the $i$th row profile co-ordinate can be calculated by

$$ r_i = \sqrt{\frac{\chi^2_{(J-1)}}{n p_{i\bullet}}} \tag{1} $$

where $\chi^2_{(J-1)}$ is the theoretical chi-squared value with $J - 1$ degrees of freedom at the $\alpha$ level of significance.

Generally a correspondence plot consists of only two dimensions, but can include three or more. However, visually representing multiple dimensions is conceptually difficult; [2], [7] and [11]([11], [12]) presented some novel approaches to visualising multiple dimensions. If a correspondence plot consists of two dimensions, then with 2 degrees of freedom and at the 5% level of significance, $\chi^2_{(2)} = 5.99$.
Therefore, the radius of the confidence circle for the $i$th row profile co-ordinate can be approximated by

$$ r_i = \sqrt{\frac{5.99}{n p_{i\bullet}}} \tag{2} $$

3. Confidence Circles for Ordinal Correspondence Analysis

The radius of the confidence circle for the $i$th row profile using the correspondence analysis of [3] is mathematically identical to the radius using classical correspondence analysis. [6] calculated confidence circles for their analysis using the same orthogonal polynomial definitions as we do here, but the plotting system they considered is different.

Suppose that a doubly ordered correspondence analysis is applied to a two-way contingency table. Then denote the row profile co-ordinate of the $i$th row category along the $k$th axis as $f^*_{ik}$ for $k = 1, 2, \ldots, J - 1$, which is defined by

$$ f^*_{ik} = \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\bullet}} \, b_k(j) $$

This row profile co-ordinate is the weighted sum of the column orthogonal polynomials of order $k$, $\{b_k(j)\}$, where the weights used are from the profile of the $i$th row category, $\{p_{ij}/p_{i\bullet}\}$; see [3] for a derivation of $f^*_{ik}$.

By using equations (3.1.10) and (3.1.11) of [3], the relationship between the chi-squared statistic and the row profile co-ordinates is

$$ X^2 = n \sum_{k=1}^{J-1} \sum_{i=1}^{I} p_{i\bullet} \left( f^*_{ik} \right)^2 \tag{3} $$

For the $i$th row profile co-ordinate, the contribution to $X^2$ is $X_i^2$, where

$$ X_i^2 = n p_{i\bullet} \sum_{k=1}^{J-1} \left( f^*_{ik} \right)^2 \tag{4} $$

for all $i = 1, 2, \ldots, I$ and where $X_i^2$ has a chi-squared distribution with $J - 1$ degrees of freedom, $\chi^2_{(J-1)}$. From (4),

$$ \sum_{k=1}^{J-1} \left( f^*_{ik} \right)^2 = \frac{X_i^2}{n p_{i\bullet}} \tag{5} $$

By comparison with (2), the radius of the confidence circle of the $i$th row profile co-ordinate can be taken to be the square root of the right hand side of (5) with $X_i^2$ replaced by the $100(1 - \alpha)\%$ point of its approximate distribution, $\chi^2_{(J-1)}$. When the ordinal correspondence plot consists of two dimensions, the square root of (5) with this replacement is identical to (2).

Confidence circles can also be calculated with the centre at the origin.
Points lying outside such a circle contribute to the dependency between the row and column variables that form the table, while points lying within it do not make such a contribution. [6] considered confidence circles with the centre at the origin, as well as circles with the centre at the position of the profile co-ordinate. However, Lebart et al. (1984, p183) state that "In practice, instead of drawing concentric circles around the origin, it is clearer and easier to draw them around each point concerned, and look at the position of the origin." The disadvantage of drawing a circle with the centre at the origin is that it assumes that points close to the origin will never significantly contribute to the dependency of the row and column variables, while those far from the origin will always make such a contribution. While this may occur in many situations, it will not always occur.

4. Relationship Between a Marginal Frequency and its Radius Length

From (2), the radius depends on the proportion of observations classified into a category of the contingency table. A category containing a large proportion of the observations will have a relatively small radius, while a category containing a small proportion will have a relatively large radius. This can be seen in the application of confidence circles in Lebart et al. (1984, p51, Table 5). The radius defined by (2) also shows that a variable with equi-probable responses will have radii of equal length for each of its responses. Therefore, when conducting an ordinal correspondence analysis on ranked data, as has been done by [5], the radius of the confidence circle for each of the row and column profile co-ordinates will be identical.
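The quantities in Sections 2 to 4 can be illustrated numerically. The following Python sketch is not taken from [3] or [4]: it assumes natural scores 1, 2, ..., J for the column categories and builds polynomials orthonormal with respect to the column marginal proportions by a weighted QR (Gram-Schmidt) step, as a stand-in for the orthogonal polynomials of [4], which are not spelled out here. The function names and the 3 × 3 table `N_demo` are hypothetical and chosen only for illustration.

```python
import numpy as np

def orthonormal_polynomials(weights, scores=None):
    """Return B with B[j, k] = b_k(j), orthonormal w.r.t. `weights`.

    Assumption: natural scores 1..J unless `scores` is given. This is a
    generic weighted Gram-Schmidt construction, not the recurrence of [4].
    """
    w = np.asarray(weights, dtype=float)
    J = w.size
    s = np.arange(1, J + 1, dtype=float) if scores is None else np.asarray(scores, dtype=float)
    V = np.vander(s, J, increasing=True)           # columns 1, s, s^2, ...
    # QR of the sqrt(w)-scaled monomials: columns of Q are orthonormal,
    # so columns of Q / sqrt(w) are orthonormal in the w-weighted sense.
    Q, R = np.linalg.qr(np.sqrt(w)[:, None] * V)
    B = Q / np.sqrt(w)[:, None]
    return B * np.sign(np.diag(R))                 # make b_0 = +1

def row_confidence_circles(N, chi2_crit=5.99, n_axes=2):
    """Row co-ordinates f*_ik, radii from (2), and the origin check."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n
    p_row = P.sum(axis=1)                          # p_i.
    p_col = P.sum(axis=0)                          # p_.j
    B = orthonormal_polynomials(p_col)
    F = (P / p_row[:, None]) @ B[:, 1:]            # f*_ik for k = 1..J-1
    radii = np.sqrt(chi2_crit / (n * p_row))       # equation (2)
    dist = np.sqrt((F[:, :n_axes] ** 2).sum(axis=1))
    origin_outside = dist > radii                  # True: row contributes
    return F, radii, origin_outside

# Hypothetical 3 x 3 table (row totals 60, 35, 55), for illustration only.
N_demo = np.array([[30, 20, 10],
                   [10, 15, 10],
                   [ 5, 10, 40]])
F, radii, origin_outside = row_confidence_circles(N_demo)

# Check equation (3): the co-ordinates reproduce Pearson's X^2.
n = N_demo.sum()
P = N_demo / n
p_row, p_col = P.sum(axis=1), P.sum(axis=0)
expected = np.outer(p_row, p_col)
x2_pearson = n * ((P - expected) ** 2 / expected).sum()
x2_from_coords = n * (p_row[:, None] * F ** 2).sum()
B_demo = orthonormal_polynomials(p_col)
```

In this hypothetical table the second row has the smallest marginal total and therefore the largest radius, in line with Section 4, and it is the only row whose circle contains the origin. The default `chi2_crit=5.99` is the two-dimensional, 5% value quoted in Section 2; a different plot dimension or significance level would need a different critical value.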
