A TEST OF INDEPENDENCE IN TWO-WAY CONTINGENCY TABLES BASED ON MAXIMAL CORRELATION

Deniz C. Yenigün

A Dissertation

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

August 2007

Committee:

Gábor Székely, Advisor

Maria L. Rizzo, Co-Advisor

Louisa Ha, Graduate Faculty Representative

James Albert

Craig L. Zirbel

ABSTRACT

Gábor Székely, Advisor

Maximal correlation has several desirable properties as a measure of dependence, including the fact that it vanishes if and only if the variables are independent. Except for a few special cases, it is hard to evaluate maximal correlation explicitly. In this dissertation, we focus on two-dimensional contingency tables and discuss a procedure for estimating maximal correlation, which we use for constructing a test of independence. For large samples, we present the asymptotic null distribution of the test statistic. For small samples or tables with sparseness, we use exact inferential methods, where we employ maximal correlation as the ordering criterion. We compare the maximal correlation test with other tests of independence by Monte Carlo simulations. When the underlying continuous variables are dependent but uncorrelated, we point out some cases for which the new test is more powerful.

ACKNOWLEDGEMENTS

I would like to express my sincere appreciation to my advisor, Gábor Székely, and my co-advisor, Maria Rizzo, for their advice and help throughout this research. I thank all the members of my committee, Craig Zirbel, Jim Albert, and Louisa Ha, for their time and advice. I also want to thank the faculty members of the Department of Mathematics and Statistics, and the Department of Applied Statistics and Operations Research, for excellent instruction and help throughout my graduate studies at Bowling Green State University. Finally, I would like to thank my fiancée, Güneş Ertan, and my family, for all their support and encouragement.

Table of Contents

CHAPTER 1: Introduction
1.1 Statement of the Problem
1.2 Summary and Objectives

CHAPTER 2: Analysis of Contingency Tables and Measures of Dependence
2.1 Analysis of Contingency Tables
2.1.1 Preliminaries
2.1.2 Pearson and Likelihood Ratio Chi-Squared Tests of Independence
2.1.3 Loglinear Models
2.1.4 Correspondence Analysis
2.1.5 Exact Tests
2.2 Measures of Dependence
2.2.1 Correlation Based Measures
2.2.2 Measures Based on Distribution and Density Functions
2.2.3 Measures of Dependence for Cross Classifications

CHAPTER 3: Maximal Correlation Test of Independence
3.1 Maximal Correlation
3.1.1 Definition
3.1.2 Literature Review
3.1.3 Attainment
3.2 Maximal Correlation in Case of Contingency Tables
3.2.1 Sample Maximal Correlation
3.3 Algebraic Form of Sample Maximal Correlation
3.4 Maximal Correlation Test for Contingency Tables
3.4.1 Large Sample Case
3.4.2 Small Sample Case
3.5 A Numerical Illustration
3.6 A Related Test of Independence: Correlation Ratio Test
3.7 An Example: Lissajous Curve Case

CHAPTER 4: Empirical Results
4.1 The Null Distribution of nS_n^2
4.2 Large Sample Behavior of S_n
4.3 Power Comparisons
4.3.1 Simulation Design
4.3.2 Empirical Significance
4.3.3 Results of Power Comparisons
4.4 Empirical Powers of Correlation Ratio Test and I-Test
4.5 Summary of Power Comparisons
4.6 An Exploratory Study for the Lissajous Curve Case

CHAPTER 5: Conclusions

BIBLIOGRAPHY

Appendices

CHAPTER A: Results of the Empirical Study

CHAPTER B: Algebraic Form of Maximal Correlation
B.1 2 × 2 Contingency Tables
B.2 3 × 3 Contingency Tables

CHAPTER C: Selected R Code
C.1 R Function for Maximal Correlation
C.2 R Function for Correlation Ratio
C.3 R Code for Table A.10

List of Figures

3.1 Plots of Lissajous Curves
4.1 Empirical Distribution of nS_n^2
4.2 Empirical MSE of S_n
4.3 Empirical Power for Example 1
4.4 Empirical Power for Example 2
4.5 Empirical Power for Example 3, Case 1
4.6 Empirical Power for Example 3, Case 2
4.7 Empirical Power for Example 4
4.8 Empirical Power for Example 5
A.1 Several Lissajous Curves

List of Tables

2.1 The General Form of I × J Contingency Tables
2.2 The Joint Distribution of X and Y
3.1 Postulates vs. Dependence Measures
3.2 Hair and Eye Color of 264 Males
A.1 Critical Values of nS_n^2
A.2 Empirical Squared Error of S_n
A.3 Empirical Significance
A.4 Empirical Power for Example 1
A.5 Empirical Power for Example 2
A.6 Empirical Power for Example 3, Case 1
A.7 Empirical Power for Example 3, Case 2
A.8 Empirical Power for Example 4
A.9 Empirical Power for Example 5
A.10 Empirical Power for I_n^* and S_n, Loglinear Case
A.11 Empirical Power for I_n^* and S_n, Lissajous Curve Case
A.12 I, K_X(Y), and K_Y(X) for Lissajous Curve Case

CHAPTER 1

Introduction

A variety of data come in the form of cross-classified tables of counts, referred to as contingency tables. After a century of great progress, the analysis of contingency tables is still an active field in statistics, mainly due to its important applications in biological and social sciences. For a comprehensive survey of the most important methods in the analysis of contingency tables, see Agresti (2002). A principal interest in many studies regarding contingency tables is to test if the variables are independent. Although many good tests are available, no single test is known to be optimal for all independence problems. An overview of several existing tests for independence in contingency tables follows in Chapter 2. In this dissertation, we construct a test of independence for two-dimensional contingency tables, based on maximal correlation. The details of this test are presented and the power performance is compared with other tests of independence.

1.1 Statement of the Problem

Consider two categorical response variables X and Y having I and J levels respectively.

Given a contingency table, we consider the problem of testing if X and Y are independent.

The null hypothesis of statistical independence of X and Y is given by H_0: π_{ij} = π_{i·}π_{·j} for i = 1, ..., I, j = 1, ..., J, where π_{ij} denotes the probability that a randomly selected individual falls into category i of variable X and category j of variable Y, and the subscript "·" denotes the sum over the index it replaces.

Our tool for approaching this problem is maximal correlation, which is a convenient measure of dependence. Maximal correlation has several desirable properties, including the fact that it vanishes if and only if the variables are independent. However, except for a few special cases, it is hard to evaluate the maximal correlation explicitly. We discuss a procedure for estimating the maximal correlation for contingency tables, which we use for constructing a test of independence.

1.2 Summary and Objectives

The main objective of this research is to introduce the maximal correlation test of indepen- dence for contingency tables, and evaluate the performance of this test.

We describe the computation of maximal correlation for an observed contingency table, and give a detailed procedure on how to carry out the test of independence. Exact inferential methods are used for contingency tables with small sample size or sparseness. For large samples, the asymptotic null distribution of maximal correlation is used to carry out the test. We also evaluate this test of independence against a wide range of alternatives, and compare its power with two well-known tests of independence by an extensive empirical study. This dissertation has five chapters. We introduce the problem of interest in Chapter 1.

Chapter 2 introduces the notation used in the analysis of contingency tables, and summarizes the well-known methods used in this field. This chapter also includes a review of several widely used measures of dependence, including the maximal correlation. In Chapter 3 we give further insight into the concept of maximal correlation, and present the evaluation of maximal correlation for two-way contingency tables. Then we introduce the maximal correlation test of independence by giving the asymptotic null distribution of the test statistic and describing the treatment for small samples. The empirical results for the performance of the test are given in Chapter 4. For several alternatives, this chapter includes the comparison of the power performance of the maximal correlation test of independence with the Pearson and likelihood ratio chi-squared tests of independence. Chapter 5 contains our concluding remarks.

CHAPTER 2

Analysis of Contingency Tables and Measures of Dependence

In this chapter we present an overview of the topics that form the basis for our study. In

Section 2.1 we discuss the well-known methods used in the analysis of contingency tables. The topics include Pearson and likelihood ratio chi-squared tests of independence, loglinear models, correspondence analysis, and exact tests. In Section 2.2 we present the most com- monly used measures of dependence, without limiting ourselves to the case of contingency tables. The topics of this section include correlation based measures, measures based on distribution and density functions, and measures for cross classifications.

2.1 Analysis of Contingency Tables

This section includes an overview of the well-known methods used in the analysis of contingency tables. The earliest applications in this field were tests of independence, using the well-known Pearson chi-square test of independence and its modifications, as well as the likelihood ratio test. By the 1960s and 1970s, the attention of researchers shifted from testing to modeling, and loglinear models gained significant attention. With the loglinear approach, the cell counts in a contingency table are modeled in terms of the associations between the variables. Another descriptive tool for analyzing contingency tables is correspondence analysis, which is a graphical way of representing associations in two-way contingency tables. When asymptotic results are not appropriate due to small sample size or sparseness in the tables, exact tests provide an alternative to large sample methods. The remainder of this section outlines all these methods briefly. We begin with the preliminary ideas.

2.1.1 Preliminaries

Definition and Notation

A contingency table is a table of counts, which is used to record and analyze the relationship between two or more variables. The general form of a two-dimensional contingency table is given in Table 2.1, where a sample of n observations is classified with respect to two

qualitative variables X and Y , taking values α1, ..., αI , and β1, ..., βJ , respectively. Such

tables are known as I × J contingency tables. Here, nij denotes the observed count in

the category αi of the variable X and category βj of the variable Y . In what follows, the subscript “·” denotes the sum over the index it replaces.

          Y
X         β_1     β_2     ···     β_J     Total
α_1       n_11    n_12    ···     n_1J    n_1·
α_2       n_21    n_22    ···     n_2J    n_2·
...       ...     ...             ...     ...
α_I       n_I1    n_I2    ···     n_IJ    n_I·
Total     n_·1    n_·2    ···     n_·J    n_·· = n

Table 2.1: The general form of I × J contingency tables.

Sampling Distributions

Consider the counts nij, i = 1, ..., I, j = 1, ..., J, in the cells of an I × J contingency

table. When we treat counts as random variables, each nij has a distribution concentrated

on nonnegative integers, with expected values m_{ij} = E(n_{ij}). The Poisson sampling model

for counts nij assumes that they are independent Poisson variables with probability mass function (p.m.f.)

\frac{\exp(-m_{ij})\, m_{ij}^{n_{ij}}}{n_{ij}!}  \quad \text{for } n_{ij} = 0, 1, 2, \ldots

The total sample size n = \sum_{i}\sum_{j} n_{ij} in the Poisson sampling model is random, which is an unusual feature. When we condition on n, the p.m.f. of the n_{ij} is

\frac{n!}{\prod_{i}\prod_{j} n_{ij}!} \prod_{i}\prod_{j} \pi_{ij}^{n_{ij}},

where \pi_{ij} = m_{ij} / \sum_{i}\sum_{j} m_{ij}. This is the multinomial distribution characterized by the sample size n and the cell probabilities {\pi_{ij}}. If the sample size is deterministic, then this is the joint distribution of cell counts n_{ij}, and the sampling scheme is called the multinomial sampling model.
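As a quick illustration of the two sampling schemes, the following R sketch (not taken from the dissertation; the table dimensions, probabilities, and sample size are hypothetical) generates one table under each model.

## Minimal sketch of Poisson vs. multinomial sampling for a 3 x 4 table
set.seed(1)
nr <- 3; nc <- 4
probs <- matrix(1 / (nr * nc), nr, nc)   # hypothetical cell probabilities pi_ij
n <- 200                                 # intended sample size

## Poisson sampling: independent Poisson counts with means m_ij = n * pi_ij
tab.pois <- matrix(rpois(nr * nc, lambda = n * probs), nr, nc)

## Multinomial sampling: n is fixed and the counts are jointly multinomial
tab.mult <- matrix(rmultinom(1, size = n, prob = probs), nr, nc)

sum(tab.pois)   # random total under Poisson sampling
sum(tab.mult)   # always equals n under multinomial sampling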

Independence

Let X and Y denote two categorical response variables taking values α1, ..., αI , and β1, ..., βJ , respectively. Consider the multinomial sampling model. When we classify subjects on both variables, the responses (X,Y ) of a randomly selected subject have a , which can be displayed in a rectangular table having I rows for categories of X and J columns for categories of Y . The probabilities {πij} for the multinomial distribution form the joint distribution of X and Y , which is given in Table 2.2. Here πij denotes the probability that

(X,Y) = (α_i, β_j). The marginal distributions are the row and column totals obtained by summing the joint probabilities π_{ij}. The marginal distribution of X is given by π_{i·}, i = 1, ..., I, and the marginal

distribution of Y is given by π·j, j = 1, ..., J. The variables are statistically independent if all the joint probabilities equal the product of the corresponding marginal probabilities.

          Y
X         β_1     β_2     ···     β_J     Total
α_1       π_11    π_12    ···     π_1J    π_1·
α_2       π_21    π_22    ···     π_2J    π_2·
...       ...     ...             ...     ...
α_I       π_I1    π_I2    ···     π_IJ    π_I·
Total     π_·1    π_·2    ···     π_·J    1

Table 2.2: The joint distribution of X and Y.

A principal interest in many studies regarding contingency tables is to test if the variables

are independent. The null hypothesis of statistical independence is given by

H0 : πij = πi·π·j (2.1)

for i = 1, ...I, j = 1, ...J.

2.1.2 Pearson and Likelihood Ratio Chi-Squared Tests of Independence

The Pearson chi-squared test statistic, X^2, and the likelihood ratio chi-squared statistic, G^2, are given by

X^2 = \sum_{i}\sum_{j} \frac{(n_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}},   (2.2)

G^2 = 2 \sum_{i}\sum_{j} n_{ij} \log(n_{ij}/\hat{m}_{ij}),   (2.3)

where n_{ij}, i = 1, ..., I, j = 1, ..., J, are the cell counts in an I × J contingency table, and \hat{m}_{ij} = (n_{i·} n_{·j})/n are the estimated expected frequencies under the independence hypothesis. When independence holds, X^2 and G^2 have asymptotic chi-squared distributions with (I−1)(J−1) degrees of freedom as n → ∞. These are the most popular tests of independence in contingency tables. However, the adequacy of the asymptotic distribution depends both on the sample size n and the number of cells N = IJ. For the X^2 test, Cochran (1954) suggests that a minimum expected value of 1 is permissible as long as no more than about 20% of the cells have expected values below 5. For the G^2 test, Koehler (1986) showed that the chi-square approximation is poor when n/N is less than 5. See Agresti (2002, page 391) for more discussion on the adequacy of the chi-square approximation for sparse contingency tables.
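A minimal R sketch of the two statistics, assuming a hypothetical 2 × 3 table of counts (the Pearson statistic could equivalently be obtained with chisq.test()):

## Pearson X^2 of (2.2) and likelihood ratio G^2 of (2.3) for an observed table
tab  <- matrix(c(30, 10, 15, 20, 25, 20), nrow = 2, byrow = TRUE)
n    <- sum(tab)
mhat <- outer(rowSums(tab), colSums(tab)) / n      # estimated expected counts

X2 <- sum((tab - mhat)^2 / mhat)                   # Pearson statistic
G2 <- 2 * sum(tab * log(tab / mhat))               # likelihood ratio statistic

df <- (nrow(tab) - 1) * (ncol(tab) - 1)
pchisq(c(X2, G2), df = df, lower.tail = FALSE)     # asymptotic p-values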

2.1.3 Loglinear Models

Loglinear models are regression-type models for categorical variables. Consider the multi- nomial sampling model for the categorical variables X and Y as discussed in Section 2.1.1.

Recall that the variables are independent when πij = πi·π·j, i = 1, ..., I, j = 1, ..., J. Let

mij denote the expected frequencies. Then the corresponding expression for the expected

frequencies is mij = nπij = nπi·π·j, which is used for constructing loglinear models. Since the expected frequencies are used, loglinear models also apply for the Poisson sampling model.

Independence Model

The independence equation above can be written as \log m_{ij} = \log n + \log \pi_{i·} + \log \pi_{·j}. This expression is equivalent to

\log m_{ij} = \mu + \lambda_i^X + \lambda_j^Y,   (2.4)

where

\lambda_i^X = \log \pi_{i·} - (\sum_k \log \pi_{k·})/I,
\lambda_j^Y = \log \pi_{·j} - (\sum_k \log \pi_{·k})/J,
\mu = \log n + (\sum_k \log \pi_{k·})/I + (\sum_k \log \pi_{·k})/J.

Note that the parameters satisfy \sum_i \lambda_i^X = \sum_j \lambda_j^Y = 0. The model (2.4) is called the loglinear model of independence for two-dimensional contingency tables.

Saturated Model

Suppose there is dependence between the random variables. The saturated loglinear model

is given by

\log m_{ij} = \mu + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY},   (2.5)

where

\mu = (\sum_i \sum_j \log m_{ij})/IJ,
\lambda_i^X = (\sum_j \log m_{ij})/J - \mu,
\lambda_j^Y = (\sum_i \log m_{ij})/I - \mu,
\lambda_{ij}^{XY} = \log m_{ij} - (\sum_j \log m_{ij})/J - (\sum_i \log m_{ij})/I + \mu.

Note that the parameters satisfy \sum_i \lambda_i^X = \sum_j \lambda_j^Y = \sum_i \lambda_{ij}^{XY} = \sum_j \lambda_{ij}^{XY} = 0, i = 1, ..., I, and j = 1, ..., J. The right hand side of (2.5) resembles the formula for cell means in a two-way ANOVA model, allowing interaction. Here μ is the overall mean of the natural log of the expected frequencies, \lambda_i^X is the main effect for variable X, \lambda_j^Y is the main effect for variable Y, and \lambda_{ij}^{XY} is the interaction effect for variables X and Y. The independence model described above is a special case of the saturated model, where \lambda_{ij}^{XY} = 0. For multinomial sampling, the cell probabilities of the multinomial distribution corresponding to the saturated loglinear model (2.5) are given by

\pi_{ij} = \frac{\exp(\mu + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY})}{\sum_a \sum_b \exp(\mu + \lambda_a^X + \lambda_b^Y + \lambda_{ab}^{XY})}.   (2.6)

Loglinear models are especially useful for analyzing higher dimensional contingency tables; however, the number of model parameters increases rapidly. These models can also be viewed as generalized linear models, where the link function is the log function. The odds ratios are the building blocks of loglinear models. To illustrate, for the 2 × 2 independence model, in each row the quantity \exp(2\lambda_1^Y) is the odds that the column classification is category 1 rather than category 2. Given a contingency table, the loglinear model parameters can be estimated by the maximum likelihood (ML) method. Many loglinear models do not have a closed form for the

ML estimates. In such cases, iterative procedures such as the iterative proportional fitting and the Newton-Raphson method can be used for solving likelihood equations and obtaining the

ML estimates. When loglinear models are used, testing independence corresponds to testing

whether the interaction term is needed in the model or not. One usually follows a hierarchical model fitting approach, which requires starting with the saturated model and deleting higher order interaction terms until the fit of the model to the data becomes unacceptable. For a general treatment of loglinear models, see Bishop, Fienberg and Holland (1975).
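As a sketch of how such a fit can be carried out in practice, the base-R function loglin(), which uses iterative proportional fitting, can fit the independence model (2.4) to a hypothetical table; testing independence then amounts to asking whether the interaction term is needed:

## Fit the independence loglinear model to a hypothetical 2 x 3 table
tab <- matrix(c(30, 10, 15, 20, 25, 20), nrow = 2, byrow = TRUE)

fit.indep <- loglin(tab, margin = list(1, 2), fit = TRUE, print = FALSE)

fit.indep$lrt                                            # G^2 for the independence model
pchisq(fit.indep$lrt, df = fit.indep$df, lower.tail = FALSE)  # p-value of the fit

A small p-value indicates that the interaction term \lambda_{ij}^{XY} is needed, i.e., that independence is rejected.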

2.1.4 Correspondence Analysis

Correspondence analysis is a graphical way of representing associations in two-way contin- gency tables. Consider the following generalization of the independence model:

\pi_{ij} = \pi_{i·}\pi_{·j}\left(1 + \sum_{m=1}^{M} \rho_m x_{im} y_{jm}\right),   (2.7)

where M = min(I,J) − 1. Here, ρ_1, ..., ρ_M are the canonical correlations and x_{im} and y_{jm} are the corresponding row and column scores, which satisfy

\sum_{i=1}^{I} x_{im}\pi_{i·} = 0,   \sum_{j=1}^{J} y_{jm}\pi_{·j} = 0,
\sum_{i=1}^{I} x_{im}^2\pi_{i·} = 1,   \sum_{j=1}^{J} y_{jm}^2\pi_{·j} = 1,
\sum_{i=1}^{I} x_{im}x_{im'}\pi_{i·} = 0,   \sum_{j=1}^{J} y_{jm}y_{jm'}\pi_{·j} = 0,   for m ≠ m'.

The scores are used in the correspondence analysis graphical display. Unlike loglinear models, correspondence analysis is only useful for two-dimensional contingency tables. This method has been used as a descriptive tool in categorical data analysis, and it is very popular in Europe, especially France, due to a series of papers by Benzécri in the 1970s (see, e.g., Benzécri, 1973). Its development is confusing, as several researchers worked on it independently under different names, such as dual scaling. Nishisato (1980) gives an excellent survey of all these methods under the name of dual scaling. Dual scaling methods are not without criticism. In his review of Nishisato's book, Aitkin (1982) claims that dual scaling methods suffer severely from the lack of a statistical model, and adds "without a justification based on a statistical model, however, we must regard dual scaling as an ad hoc mathematical technique. Why are we maximizing this criterion?" An independence test for contingency tables based on correspondence analysis is given by Haberman (1981). This test requires assigning scores to the row and column variables,

so that the correlation of scores of row and column variables is maximized subject to some constraints based on the conditions given above. Haberman’s (1981) work is very closely related to the maximal correlation test of independence we discuss in this study. For other independence tests based on correspondence analysis, see Kuriki (2005) and the references therein.

2.1.5 Exact Tests

When working with contingency tables with a small number of observations or sparse data, exact inferential methods provide an alternative to large sample methods. Under the null

hypothesis of independence, the p.m.f. of {nij} includes nuisance parameters {πi·} and {π·j}, thus it has a limited use. These parameters can be eliminated by conditioning on sufficient statistics for them, {ni·} and {n·j}. The p.m.f. of {nij} conditional on the sufficient statistics is given by

\frac{(\prod_i n_{i·}!)(\prod_j n_{·j}!)}{n!\,\prod_i \prod_j n_{ij}!},   (2.8)

which is the multivariate hypergeometric distribution. Below is the commonly used algorithm for exact tests of independence.

1. Observe a contingency table with frequencies {nij} and calculate the row and column

sums ni· and n·j.

2. Find all possible contingency tables {aij} such that ai· = ni· and a·j = n·j. Compute

the probabilities of obtaining such tables by plugging aij for nij in (2.8).

3. Order all tables with respect to some measure of dependence.

4. The p-value of the exact test is the sum of probabilities of obtaining contingency tables which represent equal or greater deviation from independence compared to the observed table.

A well-known example of exact tests is the Fisher exact test for 2 × 2 contingency tables (Fisher, 1934), where exact enumeration by hand is possible. The availability of computational power makes exact tests possible for higher dimensional tables. However, in most cases complete enumeration is still impossible with current computational power. In such cases, one can simulate contingency tables with given marginals and approximate the p-value of the independence test. For two-dimensional tables, the algorithm given by Patefield (1981) is widely used. For higher dimensional tables, one can use the algorithm given by Diaconis and Sturmfels (1998). Exact tests are available in most statistical software packages such as SPSS, SAS and R. StatXact is a statistical package that specializes in exact tests. For a survey of exact tests, see Agresti (1992).
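A small R sketch of the Monte Carlo version of this procedure follows. Tables with the observed margins are simulated with r2dtable(), which implements Patefield's (1981) algorithm; here the Pearson statistic is used as the ordering criterion purely for illustration, whereas the dissertation orders tables by the sample maximal correlation. The example table is hypothetical.

## Approximate exact test of independence by simulating tables with fixed margins
exact.mc.test <- function(tab, B = 10000) {
  stat <- function(x) {
    e <- outer(rowSums(x), colSums(x)) / sum(x)
    sum((x - e)^2 / e)                              # Pearson statistic as ordering criterion
  }
  obs  <- stat(tab)
  sims <- r2dtable(B, rowSums(tab), colSums(tab))   # Patefield's algorithm
  mean(sapply(sims, stat) >= obs)                   # approximate p-value
}

tab <- matrix(c(8, 2, 3, 7), nrow = 2, byrow = TRUE)
exact.mc.test(tab)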

2.2 Measures of Dependence

In virtually any field of statistics, there is a need for measuring the dependence between random variables. There are several measures of dependence in the statistical literature, which are often considered as an intermediate step for obtaining tests of independence. In this section we present a review of commonly used measures of dependence. We begin with correlation based measures in Section 2.2.1. In Section 2.2.2 we discuss measures based on distribution functions and density functions. We focus on dependence measures for cross classifications in Section 2.2.3. For a comprehensive survey of the most important measures of dependence, see Liebetrau (2005).

2.2.1 Correlation Based Measures

Here we discuss several correlation coefficients, which are well-known measures of depen- dence. A correlation coefficient is intended to measure the strength of the relationship between two variables. The strength of the relationship usually refers to the strength of

the tendency to move in the same direction. Different correlation coefficients measure the strength of the relationship in different ways. Throughout Section 2.2.1, we will suppose that X and Y are random variables having

means μ_X and μ_Y and finite variances σ_X^2 and σ_Y^2, respectively. We will also suppose that

(X1,Y1), ..., (Xn,Yn) is a random sample of size n from the bivariate population (X,Y ).

Product Moment Correlation Coefficient

Since it was introduced in the late nineteenth century, product moment correlation has been the most popular measure of association. Sometimes it is referred to as Pearson’s product moment correlation, due to a well known paper by Pearson (1896). The product moment correlation coefficient between X and Y is defined by

\rho = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}.   (2.9)

The range of ρ is [−1, 1]. The maximum value is achieved in the case of an increasing linear relationship, and the minimum value is achieved in case of a decreasing linear relationship. If X and Y are independent, then ρ = 0, and we say that X and Y are uncorrelated. The converse is not true in general, but if X and Y have a bivariate , inde- pendence is equivalent to being uncorrelated. Given the observations (X1,Y1), ..., (Xn,Yn) from a bivariate distribution, ρ can be estimated by its sample analog

\hat\rho = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2\right]^{1/2}},   (2.10)

where \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i and \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i.

For a vector X = (X1, ..., Xp) of random variables, the dependencies as expressed in

terms of correlations are given by a correlation matrix {ρ_{ij}}, a p × p matrix whose i, j entry

is ρ(Xi,Xj). When X has a multivariate normal distribution, pairwise independence implies total independence and vice versa. For such cases statistical tests of independence based on the correlation matrix can be constructed. For further discussion on such tests, see Anderson (2003, Chapter 9).

Kendall’s τ

Kendall’s τ is a commonly used dependence measure which has a simple interpretation based on the concordance relationships among the variables. Two pairs of observations (Xi,Yi) and

(Xj,Yj) are said to be concordant if either Xi < Xj and Yi < Yj or Xi > Xj and Yi > Yj.

Equivalently, the pairs are concordant if (Xj − Xi)(Yj − Yi) > 0. The pairs are said to be

discordant if (Xj − Xi)(Yj − Yi) < 0. The pairs are said to be tied if (Xj − Xi)(Yj − Yi) = 0.

Let πc, πd and πt denote the probabilities that two randomly selected pairs are concordant, discordant or tied, respectively. Suppose X and Y are continuous variables such that the observations can be fully ranked,

thus πt = 0 by definition. Kendall’s coefficient of concordance is defined by

τ = πc − πd. (2.11)

If two observations (X_i, Y_i) and (X_j, Y_j) are randomly selected from a bivariate population, τ is the probability that they are concordant minus the probability that they are discordant. The range of τ is [-1,1]. If the order of the population values when ranked according to X is the same as the order when ranked according to Y (perfect agreement in ranking), the maximum is achieved. If one ordering is the reverse of the other (perfect disagreement in ranking), the minimum is achieved. The independence of X and Y implies that τ = 0, but the converse is not true. A natural estimator of τ is given by

\hat\tau_a = \frac{C - D}{\binom{n}{2}} = \frac{2(C - D)}{n(n-1)},   (2.12)

where C is the number of concordant pairs, and D is the number of discordant pairs of observations in a sample of size n. This estimator was proposed by Kendall (1938). The

range of \hat\tau_a is [-1,1]. The maximum is achieved if all observed pairs are concordant and the minimum value is achieved if all observed pairs are discordant. Under independence, the quantity [(C − D) − E(C − D)]/[\mathrm{Var}(C − D)]^{1/2} has an asymptotic standard normal

distribution, so an independence test can be constructed. See Hollander and Wolfe (1999, Chapter 8) for details.

Rank Correlation

We noted above that if X and Y are normally distributed, independence is equivalent to being uncorrelated, so the correlation is a convenient measure of dependence. In order to robustify the analysis to non-normal distributions, one can replace observations by ranks. The was first introduced by Spearman (1904). In principle, it is obtained

by replacing Xi and Yi in (2.10) by Ri and Si, where Ri is the rank of Xi among X’s, and Si

is the rank of Yi among Y ’s. In practice, simpler procedures are used to calculate the rank correlation. Each procedure has a different way of treating the ties in the data, and they give the same value when there are no ties. When X and Y are continuous variables and there are no ties in the data, Spearman’s rank correlation coefficient is given by

\hat\rho_s = \frac{12\sum_{i=1}^{n}[R_i - (n+1)/2][S_i - (n+1)/2]}{n(n^2 - 1)}.   (2.13)

The range of \hat\rho_s is [-1,1] and the extremes are attained only when there is perfect disagreement or agreement in the rankings. Unlike Kendall's \hat\tau_a, it is not easy to assign an operational interpretation to Spearman's rank correlation coefficient, as it is not an estimator of an easily defined population parameter. See Liebetrau (2005, Chapter 4) for an interpretation of \hat\rho_s which involves concordance relationships among three sets of observations. When X and Y are independent, the ranks are also independent and we have E(\hat\rho_s) = 0. See Hollander and Wolfe (1999, Chapter 8) for tests of independence based on rank correlations.

Correlation Ratio

The correlation ratio of Y with respect to X is defined by

K_X(Y) = \left[\frac{\mathrm{Var}(E(Y|X))}{\mathrm{Var}(Y)}\right]^{1/2}.   (2.14)

The range of KX (Y ) is [0,1]. If X and Y are independent, KX (Y ) = 0, but the converse is not true. However, KX (Y ) = 0 implies that the correlation coefficient between X and

Y vanishes. The quantity K_X(Y) equals 1 if and only if Y = f(X), where f is a Borel-measurable function. The correlation ratio was introduced by Kolmogorov (1933) and its mathematical properties were studied by Rényi (1959). Since K_X(Y) is not symmetric, Rényi considers the quantity

K(X,Y ) = max(KX (Y ),KY (X)). (2.15)

In Section 3.6 we take a closer look at K(X,Y ) and we discuss the computation of this quantity from an observed contingency table.
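A plug-in estimate of K_X(Y) when X is categorical can be formed directly from (2.14) as the variance of the group means of Y divided by the variance of Y. The sketch below is an illustration only (the dissertation's own R function is in Appendix C), and the data are hypothetical.

## Plug-in estimate of the correlation ratio K_X(Y) for categorical X
corr.ratio <- function(x, y) {
  group.means <- tapply(y, x, mean)
  group.prop  <- table(x) / length(x)
  num <- sum(group.prop * (group.means - mean(y))^2)   # plug-in Var(E(Y|X))
  sqrt(num / (mean(y^2) - mean(y)^2))                  # divided by plug-in Var(Y)
}

set.seed(3)
x <- sample(c("a", "b", "c"), 100, replace = TRUE)
y <- ifelse(x == "a", 1, 0) + rnorm(100)
corr.ratio(x, y)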

Maximal Correlation

The maximal correlation between X and Y is defined as

S(X,Y) = \sup_{f,g} \rho(f(X), g(Y)),   (2.16)

where the supremum is taken over all functions f of X and g of Y with finite and nonzero variance. Here ρ(U, V) denotes the product moment correlation coefficient between the random variables U and V. The range of maximal correlation is [0,1]. Maximal correlation

has several attractive properties, including the fact that it vanishes if and only if the random variables are independent. The maximum is achieved when one variable is a Borel-measurable function of the other. The maximal correlation was introduced by Gebelein (1941) and received considerable attention in the statistical literature. It is hard to evaluate maximal correlation explicitly, except for some special cases. The main purpose of this study is to compute maximal correla- tion for contingency tables and discuss an independence test based on maximal correlation. These results are given in Chapter 3, along with a more detailed discussion on maximal

correlation.

2.2.2 Measures Based on Distribution and Density Functions

Let X and Y be random variables with distribution functions F and G, and density functions f and g, respectively. Let H denote the joint distribution function of X and Y , and let h denote their joint density function. Let “⊗” denote the product of two functions. In this section we review several measures of dependence based on these functions.

Measures Based on Distribution Functions

The variables X and Y are said to be independent if H(x, y) = F (x)G(y). Therefore, the problem of measuring the dependence between X and Y can be considered as the problem of measuring the distance between H and F ⊗ G. Two well-known distance measures between

two distribution functions F1 and F2 are the Kolmogorov-Smirnov distance

\Delta_1(F_1, F_2) = \sup_x |F_1(x) - F_2(x)|,   (2.17)

and the Cramér-von Mises distance

\Delta_2(F_1, F_2) = \int (F_1(x) - F_2(x))^2 \, dF_1(x).   (2.18)

By letting F1 = H and F2 = F ⊗ G these distance functions can be used as measures of dependence. In this case both ∆1 and ∆2 are nonnegative and they vanish if and only if X and Y are independent. Two well known tests of independence based on Cramer-Von Mises distance were given by Hoeffding (1948), and Blum, Kiefer and Rosenblatt (1961).

Measures Based on Density Functions

Using the same principle as in the case of distribution functions, the degree of dependence

between two variables X and Y can be measured by the distance between the joint density function h and f ⊗ g, since the variables are said to be independent if h(x, y) = f(x)g(y).

Two well known distance measures between two density functions f_1 and f_2 are the Hellinger distance

H = \int \left\{\sqrt{f_1(x)} - \sqrt{f_2(x)}\right\}^2 dx,   (2.19)

and the Kullback-Leibler information distance

I = \int \log\left\{\frac{f_1(x)}{f_2(x)}\right\} f_1(x)\, dx.   (2.20)

By letting f_1 = h and f_2 = f ⊗ g these distance functions can be used as measures of dependence. In this case both H and I are nonnegative and they vanish if and only if X and Y are independent. See Tjøstheim (1996) for more discussion on independence tests based on density functions.

Mean Square Contingency

If μ and ν are two probability measures on a given probability space, then μ is absolutely continuous with respect to ν if μ(A) = 0 for every set A for which ν(A) = 0. The dependence between two variables is said to be regular if their joint distribution is absolutely continuous with respect to the direct product of their distributions. Suppose that the dependence between X and Y is regular. The mean square contingency of X and Y is given by

C(X,Y) = \left[\iint \left(\frac{h(x,y)}{f(x)g(y)} - 1\right)^2 dF(x)\, dG(y)\right]^{1/2}.   (2.21)

The dependence between two discrete variables is always regular. Suppose that X and Y are

discrete variables assuming the values αi (i = 1, 2, ...) and βj (j = 1, 2, ...). Let Ai and Bj

denote the events X = α_i and Y = β_j respectively. In this case, the mean square contingency between X and Y is defined as

C(X,Y) = \left[\sum_i \sum_j \frac{[P(A_i B_j) - P(A_i)P(B_j)]^2}{P(A_i)P(B_j)}\right]^{1/2}.   (2.22)

The notion of mean square contingency for discrete distributions was introduced by Pearson, and it will be revisited in the next section. The range of C is [0, +∞] but it can be

made [0, 1] with a simple transformation. The mean square contingency vanishes if and only if the variables are independent. See R´enyi (1959) for more on mean square contingency.

2.2.3 Measures of Dependence for Cross Classifications

Consider two categorical response variables X and Y having I and J levels respectively, and consider their cross classification as described in Section 2.1.1. In this section, we review some well known measures of dependence for cross classified variables. We discuss nominal variables in the first two sub-sections, and ordinal variables in the last two sub-sections.

Measures Based on the Chi-Squared Statistic

Consider a contingency table {n_{ij}}. In Section 2.1.2 we introduced the Pearson chi-squared test statistic given by

X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(n_{ij} - n_{i·}n_{·j}/n)^2}{n_{i·}n_{·j}/n},   (2.23)

which enables us to test whether the observed cell frequencies are consistent with the expected cell frequencies under the hypothesis of independence. The population analog of (2.23) is given by

\phi^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(\pi_{ij} - \pi_{i·}\pi_{·j})^2}{\pi_{i·}\pi_{·j}} = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{\pi_{ij}^2}{\pi_{i·}\pi_{·j}} - 1.   (2.24)

Note that this is (2.22) applied to contingency tables, and it is known as Pearson’s coeffi- cient of mean squared contingency. The range of φ2 is [0, min(I,J) − 1]. The minimum is achieved when X and Y are independent and the maximum is achieved when there is perfect association. Since the range of φ2 depends on the dimensions of the table, it is not a suitable measure of association. To overcome this, Pearson (1904) proposed the measure

p = \left(\frac{\phi^2}{1 + \phi^2}\right)^{1/2}.   (2.25)

This measure is referred to as Pearson's contingency coefficient. The measure p is bounded between 0 and 1, but it cannot always attain the upper limit 1 and its range still depends on the dimensions of the table. Tschuprow (1919) proposed the measure of dependence

t = \left(\frac{\phi^2}{\sqrt{(I-1)(J-1)}}\right)^{1/2}.   (2.26)

Unless I and J are nearly equal, the range of t may be far from the desired interval [0,1]. Cramér (1946) proposed the coefficient

v = \left(\frac{\phi^2}{\min(I,J) - 1}\right)^{1/2},   (2.27)

which satisfies v = 1 for perfect association and v = 0 for independence. Estimators of p, t and v are obtained by replacing \phi by \hat\phi, which is obtained by replacing the \pi_{ij}'s in (2.24) with their maximum likelihood estimates n_{ij}/n. Note that \hat\phi^2 = X^2/n. Tests based on p, t and v can be constructed, where the percentage points of the distribution of the test statistics can be computed from the distribution of X^2. See, for example, Bishop, Fienberg and Holland (1975) for more on tests based on X^2. All three measures mentioned above are suitable for nominal data. The main criticism for measures of dependence based on X^2 is that they lack a suitable interpretation.

Goodman and Kruskal λ and τ

Measures based on X^2 treat the variables symmetrically. If a causal relationship is sought, asymmetric measures must be preferred. Two asymmetric measures of dependence for nominal variables are the Goodman and Kruskal λ and the Goodman and Kruskal τ. Unlike measures based on X^2, both measures permit direct interpretation. The Goodman and Kruskal λ is designed for tables where the goal is to predict the column variable Y from the row variable X, or vice versa. One can predict the category of Y from the category of X by (i) assuming Y is independent of X, or (ii) assuming Y is a function of X. Then the proportional reduction in error (PRE) measure defined by

PRE = [Probability of error in (i) − Probability of error in (ii)] / [Probability of error in (i)]

is the relative improvement in predicting the Y category obtained when the X category is known, as opposed to when the X category is unknown. For an I × J table, PRE leads to the Goodman and Kruskal λ given by

\lambda_{Y|X} = \frac{\sum_{i} \max_{j} \pi_{ij} - \max_{j} \pi_{·j}}{1 - \max_{j} \pi_{·j}}.   (2.28)

The range of \lambda_{Y|X} is [0,1]. The minimum is achieved if and only if the knowledge of the X category is of no help in predicting the Y category. The maximum is achieved if and only if the knowledge of the X category completely specifies the Y category. If X and Y are independent \lambda_{Y|X} is zero, but the converse is not true. The Goodman and Kruskal τ is the proportion of variation in the response (dependent) variable that can be explained by the predictor (independent) variable. When the rows of the contingency table represent the predictor variable, and the columns represent the response variable, the Goodman and Kruskal τ is given by

\tau_{Y|X} = \frac{\sum_{i}\sum_{j} \pi_{ij}^2/\pi_{i·} - \sum_{j} \pi_{·j}^2}{1 - \sum_{j} \pi_{·j}^2}.   (2.29)

In this definition, Goodman and Kruskal use Gini’s (1912) measure of variation for categorical

variables. The range of τY |X is [0,1]. The minimum is achieved when there is no association, and the maximum is achieved when there is perfect association. See Goodman and Kruskal

(1954) for more on λY |X and τY |X .
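The sample analogs of both measures are obtained by replacing the π_{ij}'s with the observed proportions, as in the following R sketch for a hypothetical table whose rows play the role of the predictor X.

## Sample analogs of lambda_{Y|X} (2.28) and tau_{Y|X} (2.29)
tab  <- matrix(c(30, 10, 15, 20, 25, 20), nrow = 2, byrow = TRUE)
p    <- tab / sum(tab)                     # estimated cell probabilities
prow <- rowSums(p); pcol <- colSums(p)

lambda.YX <- (sum(apply(p, 1, max)) - max(pcol)) / (1 - max(pcol))
tau.YX    <- (sum(p^2 / prow) - sum(pcol^2)) / (1 - sum(pcol^2))
c(lambda = lambda.YX, tau = tau.YX)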

Kendall τb and τc

Kendall’s τ was originally proposed for continuous variables, for which there are no ties and the samples can be fully ranked. For ordinal variables, the formula in (2.12) may be modified to handle ties. Kendall (1945) introduced

\hat\tau_b = \frac{C - D}{[(n(n-1)/2 - T_X)(n(n-1)/2 - T_Y)]^{1/2}},   (2.30)

where T_X = \sum_{i=1}^{I} n_{i·}(n_{i·} - 1)/2 and T_Y = \sum_{j=1}^{J} n_{·j}(n_{·j} - 1)/2. As defined above, C is the number of concordant pairs, and D is the number of discordant pairs of observations in a sample of size n. This index of ordinal association is referred to as Kendall's \tau_b. The estimator \hat\tau_b is in fact the maximum likelihood estimator of the quantity

\tau_b = \frac{\pi_c - \pi_d}{\left[(1 - \sum_{i=1}^{I} \pi_{i·}^2)(1 - \sum_{j=1}^{J} \pi_{·j}^2)\right]^{1/2}},   (2.31)

measure τb is bounded between -1 and 1. For tables that are not square,τ ˆb cannot attain extreme values −1 or 1. Following Kendall, Stuart (1953) introduced

2 min(I,J)(C − D) τˆ = (2.32) c n2(min(I,J) − 1)

as an estimator of τb. This index of ordinal association is useful for any rectangular contin-

gency table, and it is referred to as the Kendall-Stuart τc.

Goodman and Kruskal γ

The most commonly used measure of association for ordinal variables is the Goodman-Kruskal γ, which is given by

\gamma = \frac{\pi_c - \pi_d}{\pi_c + \pi_d} = \frac{\pi_c - \pi_d}{1 - \pi_t}.   (2.33)

Note that γ is the conditional probability that a pair of observations selected randomly from a population are concordant minus the probability that they are discordant, the condi- tion being that the pairs are not tied on either variable. The range of γ is [-1,1]. If X and Y are independent, γ = 0, but the converse is not necessarily true. The maximum likelihood estimator of γ under the multinomial sampling is

\hat\gamma = \frac{C - D}{C + D}.   (2.34)

See Goodman and Kruskal (1979) for more on γ.
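All of the ordinal measures above are built from the concordant and discordant pair counts C and D. The following R sketch, for a hypothetical 3 × 3 ordinal table, counts the pairs and forms \hat\gamma of (2.34) and \hat\tau_b of (2.30).

## Concordant/discordant pair counts and the estimators gamma-hat and tau-hat_b
concordance <- function(tab) {
  C <- 0; D <- 0
  for (i in 1:nrow(tab)) for (j in 1:ncol(tab)) {
    C <- C + tab[i, j] * sum(tab[row(tab) > i & col(tab) > j])
    D <- D + tab[i, j] * sum(tab[row(tab) > i & col(tab) < j])
  }
  c(C = C, D = D)
}

tab <- matrix(c(20, 10, 5, 8, 15, 12, 4, 9, 17), nrow = 3, byrow = TRUE)
cd  <- concordance(tab); C <- cd[["C"]]; D <- cd[["D"]]
n   <- sum(tab)
TX  <- sum(rowSums(tab) * (rowSums(tab) - 1) / 2)
TY  <- sum(colSums(tab) * (colSums(tab) - 1) / 2)

gamma.hat <- (C - D) / (C + D)
taub.hat  <- (C - D) / sqrt((n * (n - 1) / 2 - TX) * (n * (n - 1) / 2 - TY))
c(gamma = gamma.hat, tau.b = taub.hat)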

CHAPTER 3

Maximal Correlation Test of Independence in Two-Dimensional Contingency Tables

In this chapter we discuss the computation of maximal correlation for two-dimensional contingency tables and give a test of independence. We begin with the definition of maximal correlation and a literature review in Section 3.1. This section also contains a discussion about the cases for which the maximal correlation can be computed. In Section 3.2 we compute the maximal correlation for two-dimensional contingency tables. Section 3.3 contains some results on the algebraic form of maximal correlation for smaller tables. We introduce the maximal correlation test of independence in Section 3.4, and present a numerical illustration in Section 3.5. Section 3.6 introduces another independence test, which is related to the maximal correlation test. The last section includes an interesting case, where we compute the maximal correlation between two continuous variables.

3.1 Maximal Correlation

3.1.1 Definition

Consider two random variables X and Y defined on a given probability space. A measure of dependence δ(X,Y) of these variables should satisfy the following postulates.

A) δ(X,Y) is defined for any pair X, Y neither of which is constant with probability 1.
B) δ(X,Y) = δ(Y,X).
C) 0 ≤ δ(X,Y) ≤ 1.
D) δ(X,Y) = 0 iff X and Y are independent.
E) δ(X,Y) = 1 if either X = g(Y) or Y = f(X), where g(·) and f(·) are Borel-measurable functions.
F) If the Borel-measurable functions g(·) and f(·) map the real axis in a one-to-one way to itself, then δ(f(X), g(Y)) = δ(X,Y).
G) If the joint distribution of X and Y is normal, then δ(X,Y) = |ρ(X,Y)|, where ρ(X,Y) is the correlation coefficient of X and Y.

After listing these seven postulates, Rényi (1959) considers five measures of dependence, namely, the correlation coefficient, two correlation ratios, the mean square contingency, and the maximal correlation. Then he notes that only the maximal correlation satisfies all seven postulates.

For each of these five measures of dependence, Table 3.1 presents which of the seven postulates are satisfied. If a dependence measure satisfies a certain postulate, then the table has a check mark in the corresponding table cell; otherwise the cell is left blank. If a dependence measure does not satisfy a postulate but its modification does, then the modification is written in the corresponding table cell.

Table 3.1: Postulates vs. dependence measures. ρ: correlation, K_X(Y): correlation ratio of Y on X, K(X,Y): max(K_Y(X), K_X(Y)), S: maximal correlation, C: mean square contingency.

Definition The maximal correlation S between X and Y is defined as

S(X,Y) = \sup_{f,g} \rho(f(X), g(Y)),   (3.1)

where the supremum is taken over all Borel-measurable functions of X and Y with finite and positive variance. Here ρ(U, V ) denotes the product moment correlation coefficient between the random variables U and V .

3.1.2 Literature Review

The maximal correlation was introduced by Gebelein (1941) and received considerable attention in the statistical literature. Rényi (1959) gave the conditions for the existence of functions f_0 and g_0 such that S(X,Y) = ρ(f_0(X), g_0(Y)). We review Rényi's results in more detail in Section 3.1.3. Bell (1962) considered two normalizations of Shannon's mutual information as a measure of dependence and compared them with maximal correlation. Csáki and Fischer (1963) further studied the mathematical properties of maximal correlation and computed it for a number of examples. Abrahams and Thomas (1980) considered maximal correlation as a measure of dependence in stochastic processes.

Breiman and Friedman (1985) considered the regression problem, where often the response variable Y and the predictor variables X_1, ..., X_p are replaced by functions θ(Y) and φ_1(X_1), ..., φ_p(X_p), in order to simplify the model. Given the data, the authors discuss a procedure for estimating these functions which minimize the quantity E\{[θ(Y) − \sum_{j=1}^{p} φ_j(X_j)]^2\}/\mathrm{Var}[θ(Y)], based on an alternating conditional expectations algorithm. For the bivariate case, this corresponds to estimating the functions θ* and φ* which maximize the correlation between X and Y, thus the given procedure provides a method for estimating the maximal correlation between two variables. A multivariate analog of maximal correlation was considered by Koyak (1987). This work consists of transforming each of the variables, so that the largest partial sums of the eigenvalues of the resulting correlation matrix are maximized. For random variables that take only a finite number of values, maximal correlation is very closely related to the first canonical correlation. For this case Sethuraman (1990) gives a procedure to estimate the maximal correlation from the sample, and gives the asymptotic distribution of this estimate under the null hypothesis of independence. We discuss Sethuraman's results in more detail in Section 3.4.1. Gautam and Kimeldorf (1999) considered the calculation of maximal correlation in the case of 2 × k contingency tables. They also give the asymptotic distribution of the sample maximal correlation for 2 × k tables, under the null hypothesis of independence. Dembo, Kagan and Shepp (2001) showed that the maximal correlation between the partial sums of independent and identically distributed random variables with finite second moments equals the Pearson correlation coefficient between the sums, and so does not depend on the distribution of the random variables. This result was proved to be true when the variances of the variables are infinite by Novak (2004). For an arbitrary vector (X,Y) and an independent Z, Bryc, Dembo and Kagan (2005) studied the maximal correlation between X and Y + λZ.

3.1.3 Attainment

Maximal correlation is an attractive measure of dependence; however, since there does not

always exist functions f_0 and g_0 such that S(X,Y) = ρ(f_0(X), g_0(Y)), it cannot be evaluated explicitly except for special cases. If this equality holds for some f_0 and g_0, we say that the maximal correlation of X and Y can be attained. Rényi (1959) gives the conditions under which the maximal correlation can be attained. We present his results in this section. Let X and Y be two random variables on a given probability space. We are looking for functions f_0 and g_0 such that the maximal correlation between X and Y is attained. Let S = S(X,Y). Without loss of generality, we can limit our search to functions f and g such that f(X) and g(Y) have zero expectation and unit variance. Then by the Cauchy-Schwarz inequality we have

ρ(f(X), g(Y)) = E(f(X)g(Y)) = E[E(f(X)g(Y)|X)]
= E[f(X)E(g(Y)|X)] ≤ [E(f^2(X))\, E(E^2(g(Y)|X))]^{1/2}   (3.2)
= [\mathrm{Var}(E(g(Y)|X))]^{1/2},

and the equality holds if and only if f(X) = aE(g(Y)|X) for some constant a. Note that, in that case, \mathrm{Var}(f(X)) = a^2\mathrm{Var}[E(g(Y)|X)] = 1. Then we must have f(X) = E(g(Y)|X)/\sqrt{\mathrm{Var}[E(g(Y)|X)]}. Similarly,

ρ(f(X), g(Y)) = E[E(f(X)|Y)g(Y)] ≤ [\mathrm{Var}(E(f(X)|Y))]^{1/2},   (3.3)

and the equality holds if and only if g(Y) = E(f(X)|Y)/\sqrt{\mathrm{Var}[E(f(X)|Y)]}.

Now suppose that there exist functions f0 and g0 such that the maximal correlation is attained. Then by (3.2) and (3.3) respectively, we have

f_0(X) = \frac{E(g_0(Y)|X)}{S},   (3.4)

g_0(Y) = \frac{E(f_0(X)|Y)}{S}.   (3.5)

Thus f0 and g0 can be found as solutions of the system of equations (3.4) and (3.5). Plugging

g0(Y ) = E(f0(X)|Y )/S in (3.4), and f0(X) = E(g0(Y )|X)/S in (3.5), one can rearrange the system of equations as

E[E(f_0(X)|Y)|X] = S^2 f_0(X),   (3.6)

E[E(g_0(Y)|X)|Y] = S^2 g_0(Y).   (3.7)

The function f0 can be determined from (3.6). Once f0 is known, g0 can be obtained from

(3.5). Let L_X^2 denote the Hilbert space of all random variables of the form f(X) for which E(f(X)) = 0 and Var(f(X)) is finite. Similarly, let L_Y^2 denote the Hilbert space of all random variables of the form g(Y) for which E(g(Y)) = 0 and Var(g(Y)) is finite. For any f = f(X) ∈ L_X^2, let

Af = E[E(f(X)|Y)|X].   (3.8)

Then (3.6) can be written as

Af_0 = S^2 f_0.

Rényi (1959) continues his work by investigating the transformation A. He observes that A is a bounded self-adjoint transformation of L_X^2, and it is positive definite. We present some of the details below.

Define the inner product for f_1 = f_1(X) ∈ L_X^2 and f_2 = f_2(X) ∈ L_X^2 by

⟨f_1, f_2⟩ = E(f_1(X) f_2(X)),

and for f ∈ L_X^2, define ‖f‖ = ⟨f, f⟩^{1/2} = [\mathrm{Var}(f(X))]^{1/2}. Then we have

‖Af‖ = [\mathrm{Var}(E(E(f(X)|Y)|X))]^{1/2} ≤ [\mathrm{Var}(E(f(X)|Y))]^{1/2}
= \left[\mathrm{Var}(f(X)) \frac{\mathrm{Var}(E(f(X)|Y))}{\mathrm{Var}(f(X))}\right]^{1/2}
= [\mathrm{Var}(f(X))\, K_Y^2(f(X))]^{1/2} ≤ [\mathrm{Var}(f(X))]^{1/2} = ‖f‖.

Thus, Af is a bounded linear transformation of L_X^2. Note that the last inequality follows from the fact that K_Y(f(X)), the correlation ratio (see Section 2.2.1) of f(X) on Y, has range [0,1].

2 On the other hand, for f1, f2 ∈ LX we have

⟨Af_1, f_2⟩ = E(Af_1 · f_2) = E[E(E[f_1(X)|Y]|X)\, f_2(X)]
= E[E(f_2(X)E[f_1(X)|Y]\,|X)] = E[f_2(X)E(f_1(X)|Y)]
= E[E(f_2(X)E(f_1(X)|Y)\,|Y)] = E[E(f_1(X)|Y)E(f_2(X)|Y)].

Interchanging f1 and f2, we have

⟨Af_1, f_2⟩ = ⟨f_1, Af_2⟩,   (3.9)

thus A is a bounded self-adjoint transformation of L_X^2. By (3.9) we also have

⟨Af, f⟩ = E[E^2(f(X)|Y)] ≥ 0,

thus A is positive definite.

Now we are ready for the computation of S. For any f ∈ L_X^2 and g ∈ L_Y^2 with Var(f(X)) = Var(g(Y)) = 1, we have by the Cauchy-Schwarz inequality

ρ^2(f(X), g(Y)) = E^2(f(X)g(Y)) = E^2[E(f(X)g(Y)|Y)]
= E^2[g(Y)E(f(X)|Y)] ≤ E(g^2(Y))\, E[E^2(f(X)|Y)]
= E[E^2(f(X)|Y)] = ⟨Af, f⟩.

Then we have

S^2 ≤ λ,   (3.10)

where λ = \sup_f ⟨Af, f⟩, the supremum being taken over all f ∈ L_X^2 satisfying ‖f‖ = 1. Now, letting g(Y) = E(f(X)|Y) we have

\mathrm{Var}(g(Y)) = E(g(Y)E(f(X)|Y)) = E(f(X)g(Y)) ≤ S[\mathrm{Var}(g(Y))]^{1/2},

which implies that [\mathrm{Var}(g(Y))]^{1/2} ≤ S. Then we have

⟨Af, f⟩ = E[E(f(X)|Y)E(f(X)|Y)] = E[g(Y)E(f(X)|Y)]
= E(f(X)g(Y)) ≤ S[\mathrm{Var}(g(Y))]^{1/2} ≤ S^2,

thus,

λ ≤ S^2.   (3.11)

Combining (3.10) and (3.11) we have

S^2 = λ = \sup_f ⟨Af, f⟩,   (3.12)

where the supremum is taken over all f ∈ L_X^2 satisfying ‖f‖ = 1.

Definition An operator T on a Hilbert space H is said to be completely continuous if it satisfies the following property: if x_n ∈ H is a bounded sequence, then there exists a subsequence x_{n_k} such that T x_{n_k} converges to some element y ∈ H as k → ∞.

Since A is a bounded self-adjoint transformation, it is known that, if A is completely continuous, then λ is the greatest eigenvalue of A, and there exists an eigenfunction belonging to the eigenvalue λ. Rényi (1959) summarizes these observations in the following theorem.

Theorem 3.1 (Rényi) If the transformation A defined in (3.8) is completely continuous, then the maximal correlation between X and Y is attained for f_0(X) and g_0(Y), where f_0 is an eigenfunction belonging to the greatest eigenvalue S^2 = S^2(X,Y) of A and g_0(Y) = S^{-1}E(f_0(X)|Y).

The condition that A is completely continuous is generally hard to verify. Therefore Rényi (1959) gives another theorem:

Theorem 3.2 (Rényi) If the dependence between X and Y is regular (see Section 2.2.2) and the mean square contingency is finite, then the transformation A is completely continuous and thus the maximal correlation can be attained.

Proof See Rényi (1959).

3.2 Maximal Correlation in Case of Contingency Tables

We use R´enyi’s (1959) result to explicitly define the maximal correlation for two-dimensional

contingency tables. Let the categorical variables X and Y take values α1, ..., αI and β1, ..., βJ respectively. Without loss of generality, suppose I ≤ J. Consider the cross-classification defined in Section 2.1.1, where the cell probabilities of the joint distribution is given in matrix

{πij}. Assume that the matrix {πij} is positive, in other words there are no structural zeroes

0 0 in the contingency table. Let IX = (1α1 (X), ..., 1αI (X)) and IY = (1β1 (Y ), ..., 1βJ (Y )) , where 1 denotes the indicator function. Since X and Y can take only a finite number of outcomes, the functions f and g in their most general forms can be written as

f(X) = a'I_X,   (3.13)

g(Y) = b'I_Y,   (3.14)

where a = (a_1, ..., a_I)', b = (b_1, ..., b_J)', and a_i and b_j are arbitrary real numbers for i = 1, ..., I and j = 1, ..., J. Our task is to find f_0 and g_0 such that the maximal correlation between X and Y is attained. This is equivalent to finding vectors a and b such that the maximal

correlation is attained. We begin with writing the transformation (3.8) explicitly.

Let r_i = (π_{i1}, ..., π_{iJ})' for i = 1, ..., I, which is the transpose of the i-th row of {π_{ij}}. Similarly, let c_j = (π_{1j}, ..., π_{Ij})' for j = 1, ..., J, which is the j-th column of {π_{ij}}. Then we have

U_j := E(f(X)|Y = β_j) = \frac{1}{\pi_{·j}} c_j'a

for j = 1, ..., J. Let U = (U_1, ..., U_J)'. Then

V_i := E[E(f(X)|Y)|X = α_i] = \frac{1}{\pi_{i·}} r_i'U

for i = 1, ..., I. Let V = (V_1, ..., V_I)'. Note that V is the right hand side of equation (3.8).

Proposition 3.1 The vector V can be factored such that V = Aa, where A is an I × I matrix with the general term

A_{kl} = \sum_{r=1}^{J} \frac{\pi_{kr}\pi_{lr}}{\pi_{k·}\pi_{·r}},   (3.15)

where k, l = 1, ..., I.

Proof

V = \begin{pmatrix} \frac{1}{\pi_{1\cdot}} r_1'U \\ \vdots \\ \frac{1}{\pi_{I\cdot}} r_I'U \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\pi_{1\cdot}} (\pi_{11}U_1 + \pi_{12}U_2 + \cdots + \pi_{1J}U_J) \\ \vdots \\ \frac{1}{\pi_{I\cdot}} (\pi_{I1}U_1 + \pi_{I2}U_2 + \cdots + \pi_{IJ}U_J) \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\pi_{1\cdot}} \left( \frac{\pi_{11}}{\pi_{\cdot 1}} c_1'a + \frac{\pi_{12}}{\pi_{\cdot 2}} c_2'a + \cdots + \frac{\pi_{1J}}{\pi_{\cdot J}} c_J'a \right) \\ \vdots \\ \frac{1}{\pi_{I\cdot}} \left( \frac{\pi_{I1}}{\pi_{\cdot 1}} c_1'a + \frac{\pi_{I2}}{\pi_{\cdot 2}} c_2'a + \cdots + \frac{\pi_{IJ}}{\pi_{\cdot J}} c_J'a \right) \end{pmatrix}
  = \begin{pmatrix} \left( \sum_{r=1}^{J} \frac{\pi_{1r}\pi_{1r}}{\pi_{1\cdot}\pi_{\cdot r}}, \; \sum_{r=1}^{J} \frac{\pi_{1r}\pi_{2r}}{\pi_{1\cdot}\pi_{\cdot r}}, \; \ldots, \; \sum_{r=1}^{J} \frac{\pi_{1r}\pi_{Ir}}{\pi_{1\cdot}\pi_{\cdot r}} \right) a \\ \vdots \\ \left( \sum_{r=1}^{J} \frac{\pi_{Ir}\pi_{1r}}{\pi_{I\cdot}\pi_{\cdot r}}, \; \sum_{r=1}^{J} \frac{\pi_{Ir}\pi_{2r}}{\pi_{I\cdot}\pi_{\cdot r}}, \; \ldots, \; \sum_{r=1}^{J} \frac{\pi_{Ir}\pi_{Ir}}{\pi_{I\cdot}\pi_{\cdot r}} \right) a \end{pmatrix} = Aa.

Therefore, in the case of contingency tables, the transformation A in (3.8) is represented

by the matrix A. Now we are ready to compute the maximal correlation between X and Y .

We will also compute the vectors a and b such that the maximal correlation is attained. In Section 2.2.2 we introduced the notion of regular dependence between two variables, and we gave the definition of mean square contingency for cross classifications. In the case of contingency tables, the dependence between the response variables is always regular and the mean square contingency is finite. Therefore, by Theorem 3.2, the transformation A is completely continuous and we can compute the maximal correlation between X and Y by using Theorem 3.1. According to Theorem 3.1, the largest eigenvalue of transformation A is the square of the

maximal correlation between X and Y and the corresponding eigenvectors lead to f0 and g0 such that the maximal correlation is attained. However, Theorem 3.1 has the assumption that E(f(X)) = 0 and Var(f(X)) is finite. We impose these assumptions on our calculation as follows. A stochastic matrix is a square matrix whose rows consist of nonnegative real numbers that sum to one. For k = 1, ..., I, we have

\sum_{l=1}^{I} A_{kl} = \sum_{l=1}^{I}\sum_{r=1}^{J} \frac{\pi_{kr}\pi_{lr}}{\pi_{k·}\pi_{·r}} = \frac{1}{\pi_{k·}}\sum_{r=1}^{J} \frac{\pi_{kr}}{\pi_{·r}} \sum_{l=1}^{I} \pi_{lr} = \frac{1}{\pi_{k·}}\sum_{r=1}^{J} \pi_{kr} = 1,

thus A is a positive stochastic matrix. Then by the Perron-Frobenius theorem (see, for example, Aldrovandi, 2001, page 47) A has a single unit eigenvalue, which is larger than

the absolute value of any other eigenvalue. Let λ1 = 1 > |λ2| ≥ · · · ≥ |λI | ≥ 0 denote the

eigenvalues of A sorted in a decreasing fashion, and let e1, e2, ..., eI denote the corresponding

column eigenvectors. Then by the Perron-Frobenius theorem λ1 = 1. Moreover, since A is a

stochastic matrix, it is easy to see that e1 = 1I , where 1I denotes the I-dimensional column

vector, all of whose components are one. Then if we set a = e1 we have

E[f(X)] = E(a'I_X) = E(e_1'I_X) = (π_{1·}, ..., π_{I·})e_1 = 1,

therefore the assumption E[f(X)] = 0 is violated. Also note that, in this case Var(f(X)) = 0.

When we set a = ei for i = 2, ..., I, we have E(f(X)) = 0 and Var(f(X)) < ∞. So we discard the largest eigenvalue of A and conclude that the maximal correlation S between X and Y is the square root of the second largest eigenvector of A. Formally, we have

p S(X,Y ) = λ2. (3.16)

Now let us compute f0 and g0 such that the maximal correlation is attained. By Theorem 3.1, we have

0 f0(X) = e2IX , (3.17)

−1 i.e., we must set a = e2 in (3.13). Moreover, from Theorem 3.1, we have g0(Y ) = S E(f0(X)|Y ).

0 Let d = (d1, ..., dJ ) where

1 0 dj = E(f0(X)|Y = βj) = cje2 π·j

for j = 1, ..., J. Then we have

−1 0 g0(Y ) = S d IY , (3.18)

√ −1 −1 where S = 1/ λ2. In other words, we must set b = S d in equation (3.14). Let us 36 summarize our findings in the proposition below.

Proposition 3.2 Let the categorical variables X and Y take values α1, ..., αI and β1, ..., βJ respectively. Consider the cross-classification defined in Preliminaries, where the cell prob-

abilities of the joint distribution is given in the positive matrix {πij}. Then the population maximal correlation between X and Y is the square root of the second largest eigenvalue of

the matrix A defined in (3.15). The maximal correlation is attained when f0 and g0 are as defined in (3.17) and (3.18), respectively.

Here we may note that the eigenvalues of the matrix A are invariant with respect to

the row and column permutations of the matrix {πij}. We may also note that the maximal

correlation is the first canonical correlation between IX and IY .

Remark When we drop the assumption that the matrix of cell probabilities {πij} is positive, in other words, when we allow structural zeroes in the contingency table, the stochastic matrix A is not always positive. Therefore, in some special cases A may have more than

one unit eigenvalues. One such case is the case for which the cell πij is the only nonzero

probability in i-th row and j-th column of {πij}. In such cases the population maximal correlation is the square root of the largest non-unity eigenvalue of A.

3.2.1 Sample Maximal Correlation

The above results are population results. In practice, we want to estimate the maximal

correlation from an observed contingency table. Let {nij} denote an observed contingency table, as described in Section 2.1.1. Let Aˆ denote the estimator of A obtained by replacing ˆ πij’s in (3.15) by their maximum likelihood estimatorsπ ˆij = nij/n. Then A is an I × I matrix with general term XJ n n Aˆ = kr lr , (3.19) kl n n r=1 k· ·r

where k, l = 1, ..., I. The matrix Aˆ is well defined when the observed contingency table does

not have zero row or column sums. If a row or column sum equals zero for an observed 37 contingency table, we use the convention of replacing the zeros in the denominator of (3.19) by ε = 10−8, without changing the table dimensions for the analysis. The sample maximal ˆ correlation Sn between X and Y is the square root of the second largest eigenvalue of A. Formally, we have q ˆ Sn(X,Y ) = λ2, (3.20)

ˆ ˆ where λ2 is the second largest eigenvalue of A. The functions f0 and g0 can be computed

analogous to the above approach. Let en,2 denote the column eigenvector corresponding to ˆ 1 0 0 λ2. Let cn,j = n (n1j, ..., nIj) for j = 1, ..., J. Let dn = (dn,1, ..., dn,J ) , where

n 0 dn,j = cn,jen,2 n·j

for j = 1, ..., J. Then the functions f0 and g0, for which the sample maximal correlation is attained, are given by

0 f0(X) = en,2IX , (3.21)

−1 0 g0(Y ) = Sn dnIY , (3.22)

p −1 ˆ where Sn = 1/ λ2.

Remark When calculating the population maximal correlation, we assumed that the matrix

of cell probabilities {πij} is positive, therefore A is a positive stochastic matrix. Since one can observe empty cells in a contingency table, the stochastic matrix Aˆ is not always positive. Therefore, in some special cases, Aˆ may have more than one unit eigenvalues. One such case

is the case for which the cell nij is the only nonzero cell in i-th row and j-th column. In such cases, the sample maximal correlation is the square root of the largest non-unity eigenvalue of Aˆ. 38 3.3 Algebraic Form of Sample Maximal Correlation

Given a contingency table, one can compute the sample maximal correlation by comput- ing the matrix Aˆ and its second largest eigenvalue, as illustrated above. In practice, the eigenvalues and eigenvectors of large matrices are computed by numerical methods. In this section, we consider the symbolic computation of the eigenvalues of Aˆ and give an algebraic form of the sample maximal correlation for smaller contingency tables. We also make an in- teresting observation on the relation between the sample maximal correlation and the usual chi-squared test statistic for 2 × 2 tables. Consider an I × J contingency table for the categorical variables X and Y having I and

J levels respectively. Let nij denote the frequency counts of outcomes in the cell in row i and column j. Finding the sample maximal correlation of X and Y requires finding the eigenvalues of the I ×I matrix Aˆ, which requires the solution of the characteristic polynomial of degree I − 1, since we know that the matrix Aˆ will always have a unit eigenvalue. For 2 × 2, 3 × 3 and 4 × 4 tables, we computed the sample maximal correlation algebraically. We summarize our findings here and give some of the details in Appendix B. The following quantities are of interest.

XI XJ n2 θ = ij , (3.23) 1 n n i=1 j=1 i· ·j ¯ ¯ ¯ ¯2 ¯ ¯ ¯ nik nil ¯ ¯ ¯ ¯ ¯ X X ¯ njk njl ¯ θ2 = , (3.24) ni·nj·n·kn·l i

2 For 2 × 2 contingency tables, the squared sample maximal correlation Sn is given by

2 Sn = θ1 − 1. 39 For 3 × 3 contingency tables, we have

1 h p i S2 = θ − 1 + (θ + 1)2 − 4(θ + 1) . n 2 1 1 2

For 4×4 tables, we observe the quantities θ1, θ2 and an additional term θ3 which will not be presented here as a compact form is not available due to the high volume of expressions. Based on these observations we may claim that for an I × J contingency table the sample maximal correlation is

Sn = fI (θ1, ...θI−1),

where the evaluation of fI and θI−1 for I ≥ 3 may be considered as a future project.

Relation to Pearson Chi-Square Test Statistic The classical Pearson chi-square statistic for an I × J contingency table can be written as

2 X = n··(θ1 − 1), where θ1 is as given in (3.23). Thus, for 2 × 2 contingency tables we have the identity

2 2 X = nSn.

2 For tables with higher dimension, nSn has additional terms θ2, ..., θI−1 which introduce the deviation from the X2 statistic.

3.4 Maximal Correlation Test of Independence in Two-

Dimensional Contingency Tables

For the independence hypothesis (2.1) we construct a test based on the sample maximal correlation. For large sample sizes we use the results of Sethuraman (1990). For small

samples or tables with sparseness, we consider exact inferential methods. 40 3.4.1 Large Sample Case

Sethuraman (1990) considers the maximal correlation of variables that take only a finite number of values. He also considers the estimation of maximal correlation based on a sample. This work is closely related to contingency tables, but an explicit evaluation of

sample maximal correlation is not available. Sethuraman also gives the distribution of the sample maximal correlation under the null hypothesis of independence, which is directly applicable to the case of contingency tables. Here we present his distributional results, adapted to contingency tables. Consider categorical variables X and Y having I and J levels respectively. Consider the

cross-classification of X and Y which leads to a contingency table {nij}. Without loss of

generality, assume that I ≤ J. Let n denote the sample size and let Sn denote the sample maximal correlation between X and Y .

Theorem 3.3 (Sethuraman) Assume X and Y are independent. Then the limiting dis-

2 tribution of nSn as n → ∞ is the distribution of τ1, where τ1 is the maximum eigenvalue of

W which has a Wishart distribution W (II−1,J − 1). Here Ia denotes the identity matrix of size a.

Proof See Sethuraman (1990).

Note that for 2 × k contingency tables, the Wishart distribution becomes a chi-squared distribution with J −1 degrees of freedom, which is consistent with the results of Gautam and Kimeldorf (1999) on 2×k contingency tables. Recalling our observation on 2×2 contingency tables in Section 3.3, we may also note that for 2 × 2 tables the usual chi-squared test of independence and maximal correlation test of independence are literally the same. Thus for large samples, the size α maximal correlation test of independence has the following rule:

2 If nSn > C(α), reject H0, (3.25) 41 2 if nSn < C(α), cannot reject H0,

2 where C(α) is the 100(1 − α)% point of the limiting distribution of the test statistic nSn as described in Theorem 3.3. The critical points C(α) can be obtained from Table 51 of Pearson

and Hartley (1972, page 352) (set ν = I − 1 and p = J − 1 or vice versa), which gives the percentage points of the extreme eigenvalues of a Wishart matrix. When the dimensions or the significance level of interest cannot be found on this table, one can simulate the null

2 distribution of nSn and obtain the critical values. See Table A.1 for a table of critical values of the maximal correlation test statistic.

3.4.2 Small Sample Case

When working with contingency tables with a small number of observations or sparse data, we use exact inferential methods, which were introduced in Section 2.1.5. Recall that in exact tests, a measure of dependence is needed for the ordering of contingency tables. In our approach we employ maximal correlation as the ordering criterion. When complete enumer- ation is not feasible, we use the algorithm due to Patefield (1981) to simulate contingency tables with given marginals, and approximate the p-value of the exact test.

3.5 A Numerical Illustration

In this section we illustrate the computation of sample maximal correlation from an observed contingency table. We also carry out a maximal correlation test of independence. Consider

the following table taken from Snee (1974), which presents the distribution of hair color and eye color of 264 males. Let us compute the maximal correlation between the hair color and eye color based on 42 Eye Color Hair Color Brown Blue Hazel Green Black 32 11 10 3 Brown 38 50 25 15 Red 10 10 7 7 Blond 3 30 5 8

Table 3.2: Hair color and eye color for 264 males.

this sample. We begin with computing matrix Aˆ by using (3.19). We have

   0.394022 0.313580 0.187000 0.105397       0.257698 0.437607 0.168808 0.135889  Aˆ =   .    0.330235 0.362757 0.184119 0.122867    0.260590 0.415905 0.175035 0.143970

The eigenvalues and the corresponding eigenvectors of Aˆ are

0 e1 = 1, v1 = (0.5, 0.5, 0.5, 0.5) ,

0 e2 = 0.14399, v2 = (0.7168, −0.5261, 0.1644, −0.4268) ,

0 e3 = 0.01295, v3 = (−0.1434, −0.3335, 0.3697, 0.8552) ,

0 e4 = 0.00276, v4 = (0.2297, −0.0088, −0.7877, 0.5714) .

We know that the square of the sample maximal correlation is the second largest eigenvalue √ ˆ of A. Then we have Sn = 0.14399 = 0.3794. Suppose we want to test the null hypothesis that hair color is independent of eye color.

2 To carry out the independence test, the test statistic we need is nSn = 38.015. From Table A.1 the critical values for this 4 × 4 case are 11.229, 13.137 and 17.179 for significance levels 0.1, 0.05 and 0.01 respectively. The null hypothesis of independence is rejected at all three significance levels. 43 3.6 A Related Test of Independence: Correlation Ra-

tio Test

In this section we consider a dependence measure which is a modification of maximal cor- relation. This measure can be used for constructing an independence test for contingency

tables having ordered categories with known scores. For two random variables X and Y on a given probability space, consider the following quantity

S∗(X,Y ) = max[sup ρ(f(X),Y ), sup ρ(X, g(Y ))], (3.26) f g

where the supremum is taken over all Borel-measurable functions of X and Y with finite and positive variance. We first present a convenient computational formula for S∗. Then we consider calculating S∗ in the case of two-dimensional contingency tables. We begin with showing that one can always find functions f and g such that the maximum in (3.26) is attained. Without loss of generality, consider only functions f such that f(X) has zero expectation and unit variance, and suppose that E(Y ) = 0 and Var(Y ) = 1. By the Cauchy-Schwarz inequality

ρ(f(X),Y ) = E(f(X)Y ) = E(f(X)E(Y |X))

1/2 ≤ [Var(E(Y |X))] = KX (Y ), (3.27)

where KX (Y ) is the correlation ratio of Y on X, as defined in (2.14). With similar arguments as the ones used in establishing inequality (3.2), the equality in (3.27) holds if and only if

1/2 f0(X) = E(Y |X)/[Var(E(Y |X))] . Then we have

sup ρ(f(X),Y ) = KX (Y ). (3.28) f 44 Similarly, we have

sup ρ(X, g(Y )) = KY (X), (3.29) g

and thus

∗ S (X,Y ) = max(KX (Y ),KY (X)). (3.30)

Recall from Section 2.2.1 that this quantity is the same as K(X,Y ), the dependence measure given in (2.15), which was first studied by R´enyi (1959). From R´enyi’s work, we know that S∗ satisfies the postulates B, C, E and G among the seven postulates given in Section 3.1.1. We now compute S∗ for two-dimensional contingency tables. Let the categorical variables

X and Y take values α1, ..., αI and β1, ..., βJ respectively. In this section we suppose that X and Y are ordered categories with known values. Consider the cross-classification defined in

section 2.1.1, where the cell probabilities of the joint distribution are given in matrix {πij}.

0 We begin with computing KX (Y ). Let ri = (πi1, ..., πiJ ) for i = 1, ..., I, and let cj =

0 0 0 (π1j, ..., πIj) for j = 1, ..., J. Let a = (α1, ..., αI ) and b = (β1, ..., βJ ) . Let pr = (π1·, ..., πI·)

and pc = (π·1, ..., π·J ). Let

1 0 ui = E(Y |X = αi) = b ri, (3.31) πi·

for i = 1, ..., I. Then

0 E(Y |X) = u IX , (3.32)

0 0 2 where u = (u1, ...uI ) and IX = (1α1 (X), ..., 1αI (X)) . For a given vector k, let k denote the element by element multiplication of k by itself. Then we have

0 2 2 0 0 0 2 2 Var(E(Y |X)) = E[(u IX ) ] − E (u IX ) = pru − (pru ) . (3.33)

Moreover,

2 2 0 2 0 2 Var(Y ) = E(Y ) − (E(Y )) = pcb − (pcb) . (3.34) 45 Finally, we have

· ¸ 1 · ¸ 1 2 0 2 0 2 2 Var(E(Y |X)) pru − (pru) KX (Y ) = = 0 2 0 2 . (3.35) Var(Y ) pcb − (pcb)

1 0 Now let us compute KY (X) similarly. Let vj = E(X|Y = βj) = a cj for j = 1, ..., J, π·j 0 and let v = (v1, ..., vJ ) . Then with similar calculations we have

· ¸ 1 · ¸ 1 2 0 2 0 2 2 Var(E(X|Y )) pcv − (pcv) KY (X) = = 0 2 0 2 . (3.36) Var(X) pra − (pra)

Thus (· ¸ 1 · ¸ 1 ) 0 2 0 2 2 0 2 0 2 2 ∗ pru − (pru) pcv − (pcv) S (X,Y ) = max 0 2 0 2 , 0 2 0 2 . (3.37) pcb − (pcb) pra − (pra)

For an observed contingency table {nij} with n observations, the sample equivalent of

∗ (3.37), Sn(X,Y ), can be obtained by replacing the quantities pc, pr, u, and v by their natural

estimators which can be obtained by replacing all πij values by their maximum likelihood

∗ estimates, nij/n. We do not have large sample distributional results for Sn and this can

∗ be considered as a future project. However, exact tests based on Sn can be constructed. In Section 4.4, we present some empirical results for the performance of the exact test, for

∗ ∗ which Sn is used as the ordering criterion. Since S is related to correlation ratio, we will refer to this test as the correlation ratio test.

3.7 An Example: Lissajous Curve Case

In this section we evaluate the maximal correlation for a case where the two continuous

random variables are uncorrelated but dependent. We first introduce the example and show that the correlation between the variables is zero. We then calculate the maximal correlation between these two variables, and observe that the maximal correlation is one.

Let the random variable W have uniform distribution over the interval [0, 2π]. Let X =

sin aW and Y = sin bW where a and b are integers and a 6= b. The variables X and Y are 46 clearly dependent. The plot of the relationship between the variables is a special case of the well-known Lissajous curve, therefore we will refer to this example as the Lissajous curve case. See Figure 3.1 for some illustrations of these curves for several a and b values. Figure

A.1 presents more illustrations.

a=1, b=2 a=2, b=3 1 1

0.5 0.5

Y 0 Y 0

−0.5 −0.5

−1 −1 −1 −0.5 0 0.5 1 −1 −0.5 0 0.5 1 X X

a=1, b=4 a=5, b=6 1 1

0.5 0.5

Y 0 Y 0

−0.5 −0.5

−1 −1 −1 −0.5 0 0.5 1 −1 −0.5 0 0.5 1 X X

Figure 3.1: Plots of the Lissajous curve for several a and b.

Proposition 3.3 The variables X and Y are uncorrelated.

R 2π −1 Proof Note that E(X) = 0 sin(aw)(2π) dw = 0. Similarly, E(Y ) = 0. Then we have

ρ(X,Y ) = 1 E[(X − E(X))(Y − E(Y ))] σX σY = 1 E(XY ) = 1 E(sin(aW ) sin(bW )) σX σY σX σY R = 1 2π sin(aw) sin(bw)(2π)−1dw = 0. σX σY 0 47 In order to compute the maximal correlation between X and Y , we use the Chebyshev polynomials of the first kind, which are defined by

Tn(x) = cos(n arccos x), (3.38) where x ∈ [−1, 1], and n is the degree of the polynomials. Replacing x by cos θ, where θ ∈ R, we have

Tn(cos θ) = cos(nθ). (3.39)

In order to see that cos(nθ) is a polynomial of degree n in cos(θ), one can use de Moivre’s formula, given by (cos θ + i sin θ)n = cos(nθ) + i sin(nθ). Here, cos(nθ) is the real part of the right hand side of the equation, and the real part of the left hand side of the equation is a polynomial in cos θ and sin θ, where all powers of sin θ are even and thus can be replaced by cos θ using the relation sin2 θ + cos2 θ = 1.

Proposition 3.4 The maximal correlation between X and Y is 1.

Proof Let Z = cos(2abW ). Then there exists a Chebyshev polynomial Ta such that

2 2 Z = cos(2abW ) = Ta(cos(2bW )) = Ta(1 − 2 sin (bW )) = Ta(1 − 2Y ). (3.40)

Similarly, there exists a Chebyshev polynomial Tb such that

2 2 Z = cos(2abW ) = Tb(cos(2aW )) = Tb(1 − 2 sin (aW )) = Tb(1 − 2X ). (3.41)

Thus Z is a function of both X and Y . The correlation of Z with itself is 1, thus, the maximal correlation between X and Y is 1.

This example will be revisited three times. In Sections 4.3.3 (example 3) and 4.4, we generate continuous random variables using this example, with some noise added in both coordinates. Then we collapse the observations into contingency tables and carry out tests 48 of independence based only on the contingency tables. Empirical results are presented in this section, which compare the power of several independence tests. We will also consider this example in Section 4.6. Having observed that both correlation coefficient and maximal correlation take extreme values in this example, one may wish to use other quantities to measure the dependence between X and Y . In Section 4.6, we carry out an empirical exploratory study for approximately calculating two other dependence measures and investigating their behaviour with respect to the changes in the constants a and b. The dependence measures we consider are the correlation ratio discussed in Section 2.2.1 (page 14), and the I-coefficient introduced by Bakirov, Rizzo and Sz´ekely (2006). This section also includes the formulas for the I-coefficient and its sample counterpart, the statistic In. 49

CHAPTER 4

Empirical Results

In this chapter we present some empirical results for the maximal correlation test of indepen- dence for two-dimensional contingency tables. We begin with a study on the limiting null

2 distribution of the test statistic nSn in Section 4.1. Then we investigate the large sample behavior of Sn in Section 4.2. The main purpose of this chapter is to compare the power performance of the maximal correlation test of independence with Pearson chi-squared and likelihood ratio tests of independence. In Section 4.3 we present a comparison of these tests, which is evaluated by a Monte Carlo power study. Section 4.4 contains empirical power com- putations for the independence test discussed in Section 3.6, as well as an independence test recently introduced by Bakirov, Rizzo and Sz´ekely (2006). Section 4.5 summarizes the results of empirical power comparisons. The last section includes an empirical exploratory study for the behavior of two dependence measures for the Lissajous curve case. The computations are carried out in R 2.4.0.

2 4.1 The Null Distribution of nSn

2 This section includes our method for obtaining the critical values of the test statistic nSn. We also present an empirical study in order to verify the asymptotic null distribution of the test statistic. 50 In Section 3.4.1 we noted that for an I × J contingency table, under the null hypothesis

2 of independence, the limiting distribution of nSn is the distribution of τ1, where τ1 is the maximum eigenvalue of W , which has Wishart distribution W (II−1,J − 1). Since a compact form for the distribution of τ1 is not available, we estimate it by simulation and obtain the critical values of the test. For the limiting distribution of τ1 for large I and J, see Johnstone (2001).

For several choices of I and J, we generated Wishart matrices with II−1 and degrees of freedom J −1 by using the rwishart function available in the bayesm package for R. The empirical 90%, 95% and 99% points of the maximum eigenvalues of the 100,000 generated Wishart matrices are given in Table A.1. These are the critical values we use for constructing maximal correlation test of independence. As noted earlier, the percentile points for the maximum eigenvalues of Wishart matrices can also be found in Table 51 of Pearson and Hartley (1972, page 352) by setting ν = I − 1 and p = J − 1. We observed that our simulations are consistent with this table.

2 In order to verify the limiting distribution of nSn under independence, we carried out several empirical studies and we present one example here. For 3 × 3 contingency tables, consider the independence loglinear model

X Y log mij = µ + λi + λj , i, j = 1, 2, 3, (4.1)

X X X Y Y Y where λ1 = 0.2, λ2 = −0.4, λ3 = 0.2, λ1 = 0.1, λ2 = −0.3, and λ3 = 0.2. In this case

2 we generated 10,000 contingency tables of size 150, and computed the test statistic nSn. The empirical distribution of the test statistic is presented in Figure 4.1, which also includes the empirical distribution of the maximum eigenvalues of 10,000 simulated Wishart matrices

W (I2, 2). This figure presents the density estimates obtained by the density function in

2 the stats package for R. A graphical analysis shows that the empirical distribution of nSn under the above independence model shows perfect agreement with the theoretical limiting 51 null distribution. We also carried out a Kolmogorov-Smirnov test for equal distributions by using ks.test function in the stats package for R. The p-value of this test turned out to be 0.32, which suggests that there is no significant difference between two distributions.

M W density 0.00 0.05 0.10 0.15 0.20

0 5 10 15 20

x

Figure 4.1: Empirical distribution of maximal correlation test statistic (M) and empirical distribution of the maximum eigenvalue of Wishart matrix with scale parameter I2 and degrees of freedom 2 (W), based on model (4.1).

4.2 Large Sample Behavior of Sn

We do not have analytical results for the convergence of the sample maximal correlation

Sn to the population maximal correlation S. However, we observed empirically that Sn approaches S at a reasonable rate. To illustrate, we give the following example.

X Consider the saturated loglinear model (2.5) for a 3 × 3 contingency table, where λi =

Y XY XY XY XY XY λi = 0 for i = 1, 2, 3 and λ11 = 0.4, λ12 = −0.2, λ13 = −0.2, λ21 = 0.8, λ22 = −0.4,

XY XY XY XY λ23 = −0.4, λ31 = −1.2, λ32 = 0.6 and λ33 = 0.6. Given the loglinear parameters, one can calculate the cell probabilities, and then calculate S. In this example, S = 0.492. For 52 several sample sizes, we estimated the mean square error of Sn based on 10,000 simulated tables. Figure 4.2 presents the behavior of the empirical mean squared error (MSE) of Sn as the sample size increases. See Table A.2 for the empirical bias, variance and MSE of Sn for sample sizes 30 to 300. Empirical MSE 0.005 0.010 0.015 0.020

50 100 150 200 250 300

n

Figure 4.2: Empirical mean squared error of Sn.

4.3 Power Comparisons

In this section we compare the empirical power of the maximal correlation test of inde- pendence with two well-known tests, namely, the Pearson chi-square test of independence and the likelihood ratio test of independence. The empirical against a fixed alternative at a fixed significance level is obtained by simulating random samples from the al- ternative distribution and finding the proportion of the samples for which the null hypothesis is rejected at that significance level. 53 4.3.1 Simulation Design

Under a given dependence structure, we determine the empirical power of each independence test by simulating contingency tables under the dependence structure, and computing the proportion of times the independence hypothesis is rejected at a given significance level α.

Recall from Section 3.4 that, for large sample sizes we use the limiting null distribution

2 of nSn, and for small sample sizes, or for tables with sparseness, we use exact inferential methods. The empirical power comparisons for these two cases are carried out as follows.

Large Sample Case

1. Generate 10,000 I × J contingency tables with sample size n based on a specified dependence structure.

2 2. For each sample, compute the maximal correlation test statistic nSn using (3.20).

2 Reject the independence hypothesis at significance level α if nSn exceeds the critical value at level α. For several I and J values, the critical values are given in Table A.1 for 10%, 5% and 1% significance levels.

3. For each sample, compute the Pearson chi-square test statistic X2 using (2.2). Reject the independence hypothesis at significance level α if X2 exceeds the 100(1 − α)% percentile of the chi-squared distribution with (I − 1)(J − 1) degrees of freedom.

4. For each sample, compute the likelihood ratio test statistic G2 using (2.3). Reject the

independence hypothesis at significance level α if X2 exceeds the 100(1−α)% percentile of the chi-squared distribution with (I − 1)(J − 1) degrees of freedom.

The critical values of chi-squared distribution can be found in most of the standard statistics texts. To carry out Pearson chi-squared and likelihood ratio tests of independence, we used the loglin function available in the stats package for R. 54 Small Sample Case

1. Generate 10,000 I × J contingency tables with sample size n based on a specified dependence structure.

2. For each table, calculate the row and column sums, and find all possible contingency

tables which have the same row and column sums.

3. For each table, order all the tables obtained in Step 2 according to sample maximal

correlation Sn. Compute the p-value of the exact test based on Sn as described in Section 2.1.5. Reject the independence hypothesis at significance level α if the p-value is less than α.

4. For each table, order all the tables obtained in Step 2 according to Pearson chi-squared test statistic X2. Compute the p-value of the exact test based on X2 as described in Section 2.1.5. Reject the independence hypothesis at significance level α if the p-value is less than α.

5. For each table, order all the tables obtained in Step 2 according to the likelihood ratio test statistic G2. Compute the p-value of the exact test based on G2 as described in Section 2.1.5. Reject the independence hypothesis at significance level α if the p-value is less than α.

When the complete enumeration of the tables is not feasible, we use the r2dtable function in the stats package for R. This function uses the Patefield’s (1981) algorithm to simulate contingency tables with given marginals. Then we order these simulated tables and find the approximate p-value of the exact test as in steps 3, 4 and 5. Unless otherwise is indicated, we approximate the exact p-value based on 300 replicates. 55 4.3.2 Empirical Significance

We use the independence model (4.1) to compare the empirical significance level of maximal correlation test of independence with the nominal significance level. We generated 10,000 contingency tables using this model and carried out all three tests of independence. Table

A.3 presents the rejection percentages at nominal significance levels 10%, 5% and 1%. For all α levels considered, the empirical significance levels of maximal correlation test are consistent with the nominal significance levels, as the nominal levels are within the corresponding 95% confidence intervals.

4.3.3 Results of Power Comparisons

In this section we present the results of the empirical power comparisons for the three tests of independence. We consider five examples with different dependence structures. In the first

example, we consider a loglinear model and generate contingency tables by using (2.6), where we control the dependence between the variables by the interaction parameter λXY . In the following examples we assume that the categorical variables have an underlying continuous distribution. The empirical power curves presented in this section are obtained by smoothing the

scatter plots of sample sizes versus empirical powers. The smoothing is obtained by the loess

function in the stats package for R. For each empirical power curve, the actual sample sizes and the corresponding empirical powers are given in the indicated tables. Throughout this section and the tables in the Appendix, P stands for Pearson chi-square test of independence, L stands for likelihood ratio test of independence, and M stands for maximal correlation test of independence.

Example 1

The first example we consider is an example of saturated loglinear models given in (2.5).

Consider the 3×3 loglinear model with parameters as described in Section 4.2. We computed 56 the empirical powers of maximal correlation, Pearson chi-square, and likelihood ratio tests of independence for this case. The empirical power plots of three tests at significance level 0.05

are presented in Figure 4.3. See Table A.4 for the actual rejection percentages at significance levels 0.1, 0.05 and 0.01. For sample sizes 20 and 30, we carried out exact tests. For sample sizes 40 to 90, we used large sample results for the tests. The simulation results show that all three tests have good power and they are very similar. Likelihood ratio test is slightly more powerful. We carried out similar simulation studies for several other loglinear parameter settings and observed that maximal correlation test was comparable to the other two tests most of the time. However, we observed some cases for which likelihood ratio test was considerably more powerful. Power

P L M 40 50 60 70 80 90 100

20 30 40 50 60 70 80 90

n

Figure 4.3: Empirical power of Pearson chi-square (P ), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 1. Significance level α = 0.05.

Example 2

In this example we consider the continuous variables X and Y which are centered at points uniformly distributed along the unit circle, with N(0, σ2) noise in both coordinates. Let 57 2 2 W ∼ U[0, 2π], e1 ∼ N(0, σ ) and e2 ∼ N(0, σ ). Let X = cos(W ) + e1 and Y = sin(W ) + e2. Then the distribution of (X,Y ) pairs form a circular cloud of points as described above. For several cases, we generated X and Y , collapsed them into R × R contingency tables, and carried out tests of independence based on the contingency tables. We report the empirical power comparison for the following case. Let σ = 0.4. The variables X and Y are generated as described above and they are collapsed into 6 × 6 (R = 6) contingency tables, with limits [−1.3, 1.3] on both coordinates. Any sample points outside the table range are assigned to the nearest table cell. Indepen- dence tests are carried out based on these contingency tables. The empirical power plots of three tests at significance level 0.05 are presented in Figure 4.4, and the actual rejection

percentages at significance levels 0.1, 0.05 and 0.01 are given in Table A.5. The empirical

study shows that the likelihood ratio test is more powerful for sample sizes 70 to 150. For larger samples, the maximal correlation test is as powerful as the likelihood ratio test. Power

P L M 40 50 60 70 80 90 100

100 150 200

n

Figure 4.4: Empirical power of Pearson chi-square (P ), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 2. Significance level α = 0.05. 58 Example 3

In this example we revisit the Lissajous curve case discussed in Section 3.7. Let the random variable W have uniform distribution over the interval [0, 2π]. Let X = sin aW and Y = sin bW where a and b are integers and a 6= b. The variables X and Y are clearly dependent.

Recall from Section 3.7 that X and Y are uncorrelated, and their maximal correlation is one. For several cases we generated X and Y by transforming from the generated W and adding noise N(0, σ2)) on both coordinates. We then collapsed the observations into R × R contingency tables and carried out tests of independence based on the contingency tables. We refer to R as the resolution of contingency tables. Note that after adding noise and discretization, the population maximal correlation S between X and Y is no longer 1. For a given a, b, σ and R, it is possible to calculate S numerically.

Case 1

For a = 1 and b = 2, we consider the above relationship between X and Y on a Lissajous curve. Let σ = 0.03 and R = 5. For significance level α = 0.05, the empirical power plots of three tests are presented in Figure 4.5. The actual rejection percentages for significance

levels 0.1, 0.05 and 0.01 are given in Table A.6. In this example, exact tests are used where the p-value of the tests are approximated based on 300 simulated tables. We observe that maximal correlation test is more powerful in this example. Case 2 We consider the same example for a = 5 and b = 6. Here σ = 0.03 and R = 14. The

empirical power plots of three tests at significance level α = 0.05 are presented in Figure 4.6. See Table A.7 for the actual rejection percentages for significance levels 0.1, 0.05 and 0.01. The maximal correlation test is more powerful in this example, and the difference is larger

compared to Case 1. 59 Power

P L M 60 70 80 90 100

50 60 70 80 90 100 110

n

Figure 4.5: Empirical power of Pearson chi-square (P ), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 3, Case 1. Significance level α = 0.05.

Example 4

Motivated by Example 3, we investigated other cases for which the underlying continuous dis- tributions are uncorrelated but dependent. The following theorem is helpful for constructing

such variables.

Theorem 4.1 (Behboodian) Let g(x) be an odd and h(x) be an even real-valued function.

If X is a symmetric random variable, then the random variables Y = g(X) and Z = h(X) are uncorrelated, provided that Y and Z are non-degenerate and all the first and second moments of Y and Z exist.

Proof See Behboodian (1978).

Let U ∼ N(0, 1) and V = |U|. By Theorem 4.1, the dependent variables U and V are uncorrelated. In this example we consider variable U and V with some noise added. As in the previous examples, we collapse the observarions into R × R contingency tables and 60

P L M Power 75 80 85 90 95 100

200 220 240 260 280 300

n

Figure 4.6: Empirical power of Pearson chi-square (P ), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 3, Case 2. Significance level α = 0.05. perform independence tests based on the contingency tables. We performed empirical power comparisons for several cases and we report the following case.

2 2 Let U ∼ N(0, 1), e1 ∼ N(0, σ ) and e2 ∼ N(0, σ ), where σ = 0.8. Let X = U + e1 and

Y = |U| + e2. The variables X and Y are generated, and then they are collapsed into 6 × 6 contingency tables with limits [-2.5, 2.5] for X values, and limits [-1, 3] for Y values. Any sample point outside the range is assigned to the nearest cell. The empirical power plots of all three tests at significance level 0.05 is presented in Figure 4.7. For actual rejection percentages at significance levels 0.1, 0.05 and 0.01, see Table A.8. Empirical study shows that, likelihood ratio test is more powerful for sample sizes up to 180. For larger sample sizes, maximal correlation test is more powerful.

Example 5

In this example we consider another case for which the underlying continuous variables are dependent but not correlated. Let U ∼ U(−1, 1) and V = U 2. By Theorem 4.1, the depen- 61 Power

P L M 20 40 60 80 100

50 100 150 200 250 300 350 400

n

Figure 4.7: Empirical power of Pearson chi-square (P ), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 4. Significance level α = 0.05. dent variables U and V are uncorrelated. Consider the variables U and V with some noise added. We collapse the observations into R × R contingency tables and perform indepen- dence tests based on the contingency tables. We performed empirical power comparisons for several cases and we report the following case.

2 2 2 Let U ∼ U(−1, 1), e1 ∼ N(0, σ ) and e2 ∼ N(0, σ ), where σ = 0.3. Let X = U + e1

2 and Y = U + e2. The variables X and Y are generated, and then they are collapsed into 4×4 contingency tables with limits [-1.2, 1.2] for X values, and limits [-0.3, 1.2] for Y values.

The sample points outside the range are assigned to the nearest cell. Figure 4.8 presents the empirical power plots of all three tests at significance level 0.05. The actual rejection

percentages at significance levels 0.1, 0.05 and 0.01 are presented in Table A.9. Similar to

Example 4, empirical study shows that the likelihood ratio test is more powerful for sample sizes up to 250. For larger sample sizes, maximal correlation test is slightly more powerful. 62 Power

P L M 30 40 50 60 70 80 90 100

100 150 200 250 300 350 400 450

n

Figure 4.8: Empirical power of Pearson chi-square (P ), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 5. Significance level α = 0.05.

4.4 Empirical Powers of Correlation Ratio Test and I-

Test

The purpose of this section is to gain some insight on the power performances of two tests of independence, namely, the correlation ratio test discussed in Section 3.6, and the I-test introduced by Bakirov, Rizzo and Szekely (2006). The critical values for both tests are unavailable, so we construct exact tests based on these statistics. We compare the empirical power performance of these tests with the exact tests based on Pearson chi-squared, likelihood ratio, and maximal correlation statistics. The simulation design we use here is the same as the design for small sample cases described in Section 4.3.1, we only add two more ordering criteria here. Bakirov, Rizzo and Sz´ekely (2006) proposed a multivariate nonparametric test of inde-

pendence, which is based on a measure of association determined by the interpoint distances

between the sample points. This test is based on a population independence coefficient I, 63 which takes values between 0 and 1, and equals zero if and only if the variables are indepen- dent. Although their method is mainly developed for continuous variables, they provide the

following statistic for the case of two-dimensional contingency tables. Let {nij} denote the cell frequencies of an I × J contingency table. Then the test statistic is given by

PI PJ 2 2 i=1 j=1(nnij − ni·n·j) In = PI PJ . (4.2) i=1 j=1 ni·n·j(n − ni·)(n − n·j)

See Section 4.6 for the formulas of I and In for continuous random variables.

2 The statistic In can be used as the ordering criterion in the exact test algorithm described

∗ in Section 2.1.5. Similarly, the sample correlation ratio statistic Sn described in Section 3.5 can be used as the ordering criterion, and exact tests can be constructed based on both statistics. We consider the following two examples to see the power performance of these two tests.

A Loglinear Case

We consider a 3 × 3 saturated loglinear model. Exact tests are carried out based on five

2 2 ∗ statistics: X , G , Sn, In, and Sn. The empirical powers of these tests are calculated based on 1000 simulated contingency tables generated by using the loglinear model parameters given in Section 4.2. Note that this is also the case we consideed in Example 1 of Section 4.3.3. For each table generated, the exact p-values of the tests are approximated based on

300 simulated tables simulated by using Patefield’s (1981) algorithm. For significance levels

0.05 and 0.01, the rejection percentages are presented in Table A.10. In this table I stands for I-test, C stands for the correlation ratio test. We observed that the powers of all tests are close to each other, but the power of the correlation ratio test seems to have slightly smaller power.

∗ Note that the correlation ratio test statistic Sn can only be constructed for tables having

∗ ordered categories with known scores. In this example we calculated Sn assuming that 64 both row and column variables take values 1, 2 and 3. When we tried other values for the categories, we did not observe significant changes in the power of the test. We will further

∗ investigate the invariance of Sn test to the changes in the category values.

A Lissajous Curve Case

Consider the Lissajous curve case discussed in Example 3, Case 1 in Section 4.3.3. Exact

2 2 ∗ tests are carried out based on X , G , Sn, In, and Sn. The empirical powers of the tests are calculated based on 2000 simulated contingency tables. The exact p-values of the tests are approximated based on 300 simulated tables. For significance levels 0.05 and 0.01, the rejection percentages are presented in Table A.11.

We observed that the power of the exact test based on In is significantly smaller than the

2 2 exact tests based on X , G and Sn. A surprising observation was to see that the power of

∗ the exact test based on Sn turned out to be extremely low. For all sample sizes, the rejection percentage was close to the size of the test. We do not have a clear explanation for the poor

∗ performance of Sn for this case. However, we shall note that this statistic depends on the dependence measure S∗ which only satisfies the postulates B, C, E and G among the seven postulates given in Section 3.1.1.

4.5 Summary of Power Comparisons

In Section 4.3.3 we compared the empirical powers of Pearson chi-squared, likelihood ratio, and maximal correlation tests of independence. We considered several examples, and none of the tests turned out to be best against every type of dependence structure. In all examples Pearson chi-square test had the poorest power with a few exceptions in small sample sizes. When we considered loglinear models and introduced dependence to the simulated con- tingency tables by the loglinear interaction term, we observed that all three tests usually had similar power. However, there were cases such that likelihood ratio test was more powerful 65 than the other two. We then considered cases for which the categorical variables have an underlying continuous distribution. When the underlying continuous variables are uncor- related but dependent, we observed two cases (example 3) where the maximal correlation test is more powerful than the other two tests. The last two cases we considered showed an interesting pattern for the comparison of maximal correlation test and likelihood ratio test. In both cases the power of maximal correlation test was slightly lower than the likelihood ratio test for up to 80% power, then it was slightly higher. In Section 4.4 we carried out empirical power studies for the correlation ratio test and I-test. The large sample distributional results are unavailable for both tests, so we used exact methods. We considered two examples used in Section 4.3.3. For the first example, the loglinear case, both tests had good power. The I-test was as powerful as the three tests considered previously, correlation ratio test had slightly lower power. When we considered a case for which the underlying continuous variables are uncorrelated but dependent, the power of I-test dropped significantly. Moreover, we observed that the correlation ratio test was completely insensitive to this type of a dependence, and the power of this test was as low as the size of the test.

4.6 An Exploratory Study for the Lissajous Curve Case

We introduced the Lissajous curve case in Section 3.7, and discussed two measures of de- pendence for this case, namely, product moment correlation and maximal correlation. The purpose of this section is to carry out an exploratory study in order to understand the be- havior of two more measures of dependence for this case. Both dependence measures are approximately calculated using simulations. Let the random variable W have uniform distribution over the interval [0, 2π]. Let X = sin aW and Y = sin bW where a and b are integers and a 6= b. In Section 3.7 we showed that the correlation between X and Y is zero, and the maximal correlation between them is 66 one. Since both dependence measures take extreme values, it is of interest to compute other quantities to measure the dependence between X and Y . The first dependence measure we consider is the correlation ratio discussed in Section

2.2.1. For given a and b values, we generate 100,000 pairs of X and Y , which take values in the interval [-1,1]. We then divide this interval into 200 grids and treat X and Y as discrete

variables. We compute the correlation ratios KX (Y ) and KY (X) numerically by using (3.35) and (3.36), respectively. This corresponds to investigating the continuous variables in grids of size 0.01, which leads to our approximation to the correlation ratios between X and Y . The second dependence measure we consider is the I-coefficient introduced by Bakirov,

Rizzo and Sz´ekely (2006). For completeness, we present here the formulas for the I-coefficient and its sample counterpart, the statistic In.

p+q Consider a general population Z = (X,Y ) ∈ R . Let f(t, s), f1(t) and f2(s) denote the characteristic functions of random variables Z, X and Y , respectively. For complex

p q functions α defined on R × R , let kα(t, s)k denote the k · k-norm in the weighted L2-space of functions on Rp+q. Then the independence coefficient I is given by

kf(t, s) − f1(t)f2(s)k I = °p °. (4.3) ° 2 2 ° ° (1 − |f1(t)| )(1 − |f2(s)| )°

p q Let Zj = (Xj,Yj), Xj ∈ R , Yj ∈ R , j = 1, ..., n, be a random sample from this

d population. Let | · |d denote the Euclidean norm in R , and let Zkl = (Xk,Yl). Then the statistic I is given by n r 2¯z − z − z I = d , (4.4) n x + y − z

where

1 Xn z = |Z − Z | , d n2 kk ll p+q k,l=1 1 Xn Xn z = |Z − Z | , n4 kl ij p+q k,l=1 i,j=1 67 1 Xn x = |X − X | , n2 k l p k,l=1 1 Xn y = |Y − Y | , n2 k l q k,l=1 1 Xn Xn z¯ = |Z − Z | . n3 kk ij p+q k=1 i,j=1

Going back to the Lissajous curve case, for given a and b values, we generate X and Y pairs, and calculate the statistic In by using the indep.e function in energy package for R. For sample sizes larger than 150, the function indep.e is not feasible with our computational power, so we generate 100 samples of size 150, and we report the average of In values as our approximation to I-coefficient.

Empirical Results

The simulation results are given in Table A.12, and the plots of corresponding Lissajous curves are given Figure A.1. For correlation ratios, we observed that the cases we considered can be divided into two groups: the cases for which the Lissajous curve crosses itself, and the cases for which it does not. For the first group, we observed that both correlation ratios are close to 0.05. For the second group, we observed that one of the correlation ratios is close to 0.05, the other is close to one. Other than this, the correlation ratios are not sensitive to the changes in constants a and b. The observation above is not valid for the I-coefficient. For this dependence measure, the most important characteristic of a case is how much the graph wiggles. For example, in the case with a = 1 and b = 3, the I-coefficient equals 0.212, but in the case with a = 1 and b = 9, the I-coefficient equals 0.095.

It is of interest to better understand the behavior of the I-coefficient with respect to the changes in integers a and b, and we will consider this as a future project. We are cautious 68 about our current numerical results for I, since we cannot increase the sample size in the simulation study, due to our limited computational power. 69

CHAPTER 5

Conclusions

In this research we presented an independence test for two-dimensional contingency tables, which is based on maximal correlation. We explored the empirical power performance of this test and compared it with well-known independence tests. We also included an introductory chapter which summarizes the commonly used techniques for the analysis of contingency tables, and well-known measures of dependence. The main reference we used in this research is a paper by R´enyi (1959), which gives the conditions such that maximal correlation can be computed. Following R´enyi’s approach we presented a method of estimating the maximal correlation between two variables, given a contingency table. Using the sample maximal correlation, we described an independence test for the case of contingency tables. For the asymptotic null distribution of the test statistic, we used a result by Sethuraman (1990). For small sample sizes or contingency tables with sparseness, we used exact inferential methods, where we employed maximal correlation as the ordering criterion. In addition, by using a dependence measure related to maximal correlation, we con- structed another independence test, which we refer to as the correlation ratio test. This test can be used for contingency tables having ordered categories with known scores. The dependence measure leading to this test is not as attractive as the maximal correlation, in 70 the sense that this dependence measure satisfies only four of the seven postulates listed in Section 3.1.1, whereas maximal correlation satisfies all of them. This reflected directly to our simulation results, and correlation ratio test turned out to have very low power in the

Lissajous curve case. We carried out a simulation study to see the empirical power performance of the maximal correlation test and compare it with Pearson chi-squared and likelihood ratio tests of inde- pendence. When the underlying continuous variables are uncorrelated but dependent, we pointed out some cases for which the maximal correlation test appears to be more powerful. We are currently in search of other examples for which the maximal correlation test must be preferred. A natural extension of this work is the application to higher dimensional contingency tables, and we will consider this as a future project. As we pointed out in Section 2.1.4, the maximal correlation test of independence we discuss is very closely related to the independence tests based on correspondence analysis, which are claimed to severely suffer from the lack of a statistical model. The maximal correlation approach to the analysis of contingency tables can be considered as assigning scores to row and column variables so that the correlation between the scores is maximized, just like in the correspondence analysis approach. This shows that the maximization criteria in correspondence analysis is not an ad hoc method, but could have a clear interpretation. In this sense, our work gives some new insight on correspondence analysis. 71

BIBLIOGRAPHY

[1] Abrahams, J. and Thomas, J.B. (1980) Properties of the maximal correlation function. J. Franklin Inst., 310, 317-323.

[2] Aitkin, M. (1982) Review of Analysis of Categorical Data: Dual Scaling and its Appli- cations by Nishisato. Journal of the Royal Statistical Society, Ser. A, 145, 513-516.

[3] Agresti, A. (1992) A Survey of Exact Inference for Contingency Tables. Statistical Sci- ence, 7, 131-153.

[4] Agresti, A. (2002) Categorical Data Analysis, second ed., Wiley, New York.

[5] Aldrovandi, R. (2001) Special Matrices of Mathematical Physics: Stochastic, Circulant, and Bell Matrices, World Scientific.

[6] Anderson, T.W. (2003) An Introduction to Multivariate Statistical Analysis, third ed., Wiley, New York.

[7] Bakirov, N.K., Rizzo, M.L. and Sz´ekely, G.J. (2006) A multivariate nonparametric test of independence. Journal of Multivariate Analysis, 97, 1742-1756.

[8] Behboodian, J. (1978) Uncorrelated dependent random variables. Mathematics Maga- zine, 51, 303-304.

[9] Bell, C.B. (1962) and Maximal correlation as measures of depen-

dence. The Annals of , 33, 587-595. 72 [10] Benz´ecri,J.-P. (1973) L’Analyse des Donn´ees: 1. La Taxonomie: 2. L’Analyse des Correspondances, Paris, Dunod (2nd ed. 1976).

[11] Bishop, Y.M.M.,Fienberg, S.E., and Holland, P.W. (1975) Discrete Muitivariate Anal- ysis, MIT Press, Cambridge, Massachusetts.

[12] Blum, J.R., Kiefer, J., and Rosenblatt, M. (1961) Distribution free tests of independence

based on the sample distribution function. Annals of Mathematical Statistics, 32, 485- 498.

[13] Breiman, L. and Friedman, J. (1985) Estimation optimal transformations for multiple

regression and correlation (with discussion). J. Amer. Statist. Assoc., 80, 580-619.

[14] Bryc, W., Dembo, A. and Kagan, A. (2005) On the maximum correlation coefficient. Theory Probab. Appl, 49, 132-138.

[15] Cochran, W.G. (1954) Some methods of strengthening the common χ2 tests. Biometrics, 10, 417-451.

[16] Cram´er, H. (1946) Mathematical Methods of Statistics, Princeton University Press, Princeton, New Jersey.

[17] Cs´aki,P. and Fischer, J. (1963) On the general notion of maximum correlation. Magyar Tudom´anyosAkad. Mat. Kutat´oInt´ezetenkK¨ozlem´enyei (publ. Math. Inst. Hungar. Acad. Sci.), 8, 27-51.

[18] Dembo, A., Kagan, A. and Shepp, L.A. (2001) Remarks on the maximum correlation coefficient. Bernoulli, 7, 343-350.

[19] Diaconis, P. and Strumfels, B. (1998) Algebraic Algorithms for Sampling from Condi- tional Distributions. The Annals of Statistics, 26, 363-397.

[20] Fisher, R.A. (1934) Statistical Methods for Research Workers. Oliver and Boyd, Edin-

burgh (14th ed. 1970). 73 [21] Gautam, S. and Kimeldorf, G. (1999) Some results on the maximal correlation in 2 × 2 contingency tables. The American , 53, 336-341.

[22] Gebelein, H. (1941) Das statistische Problem der Korrelation als Variations - und Eigen-

werthproblem und sein Zusammenhang mit der Ausgleichsrechnung. Z. Angew. Math. Mech., 21, 364-379.

[23] Goodman, L.A. (2000) The analysis of cross-classified data: Notes on a century of progress in contingency table analysis, and some comments on its prehistory and its future, In: Statistics for the 21st Century (eds. C.R. Rao and G.J. Sz´ekely), Dekker,

New York, 189-231.

[24] Goodman, L.A. and Kruskal, W.H. (1954) Measures of association for cross- classifications. Journal of the American Statistical Association, 49, 732-764.

[25] Goodman, L.A. and Kruskal, W.H. (1979) Measures of Association for Cross-

Classifications. Springer-Verlag, New York.

[26] Haberman, S.J. (1981) Tests for independence in two-way contingency tables based on canonical correlation and on linear-by-linear interaction. The Annals of Statistics, 9, 1178-1186.

[27] Hoeffding, W. (1948) A class of statistics with asymptotically normal distribution. An- nals of Mathematical Statistics, 49, 293-325.

[28] Hollander, M. and Wolfe, D.A. (1999) Nonparametric Statistical Methods, second ed., Wiley, New York.

[29] Johnstone, I.M. (2001) On the distribution of the larges eigenvalue in principal compo-

nent analysis. The Annals of Statistics, 29, 295-327.

[30] Kendall, M.G. (1945) The treatment of ties in rank problems. Biometrika, 33, 239-251. 74 [31] Koehler, K. (1986) Goodness-of-fit tests for log-linear models in sparse contingency tables. J. Amer. Statist. Assoc. 81, 483-493.

[32] Kolmogorov, A. (1933) Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergebnisse der Math. u. Grenzgebiete, Berlin.

[33] Koyak, R.A. (1987) On measuring internal dependence in a set of random variables. Ann. Statist., 15, 1215-1228.

[34] Kuriki, S. (2005) Asymptotic distribuion of inequality-restricted canonical correlation

with application to tests for independence in ordered contingency tables. J. Multivariate

Anal., 94, 420-449.

[35] Liebetrau, A.M. (2005) Measures of Association, second ed., SAGE Publications, Lon- don.

[36] Nishisato, S. (1980) Analysis of Categorical Data: Dual Scaling and its Applications. University of Toronto Press, Toronto.

[37] Novak, S.Y. (2004) On Gebelein's correlation coefficient. Statist. Probab. Lett., 69, 299-303.

[38] Patefield, W.M. (1981) Algorithm AS159. An efficient method of generating r ×c tables with given row and column totals. Applied Statistics, 30, 91-97.

[39] Pearson, K. (1896) Mathematical contributions to the theory of evolution, III. Regression, heredity, and panmixia. Philosophical Transactions of the Royal Society A, 187, 253-318.

[40] Pearson, K. (1904) Mathematical contributions to the theory of evolution, XIII. On the theory of contingency and its relation to association and normal correlation. Draper's Co. Research Memoirs, Biometric Series, 1. (Reprinted 1948 in Karl Pearson's Early Statistical Papers, ed. by E.S. Pearson, Cambridge University Press, Cambridge.)

[41] Pearson, E.S. and Hartley, H.O. (1972) Biometrika Tables for Statisticians, Vol. 2. Cambridge University Press, Cambridge.

[42] Rényi, A. (1959) On measures of dependence. Acta Math. Acad. Sci. Hungar., 10, 441-451.

[43] Sethuraman, J. (1990) The asymptotic distribution of Rényi maximal correlation. Comm. Statist. Theory Methods, 19, 4291-4298.

[44] Snee, R.D. (1974) Graphical display of two-way contingency tables. The American Statistician, 28, 9-12.

[45] Spearman, C. (1904) The proof and measurement of association between two things. American Journal of Psychology, 15, 72-101.

[46] Stuart, A. (1953) The estimation and comparison of strengths of association in contingency tables. Biometrika, 40, 105-110.

[47] Tjøstheim, D. (1996) Measures of dependence and tests of independence. Statistics, 28, 249-284.

[48] Tsuchuprov, A.A. (1919) On the mathematical expectation of the moments of frequency distributions. Biometrika, 13, 185-210.

Appendix A

Results of the Empirical Study

 I    J    α = 0.1   α = 0.05   α = 0.01
 3    3      6.998      8.599     12.057
 4    4     11.229     13.137     17.179
 5    5     15.441     17.584     21.976
 6    6     19.634     21.953     26.706
 8    8     27.872     30.359     35.636
10   10     36.122     38.875     44.710
12   12     44.233     47.215     53.330
14   14     52.506     55.686     62.133
16   16     60.583     63.892     70.359

Table A.1: Critical values of $nS_n^2$.

  n    E|Sn − S|   Var(Sn)   MSE(Sn)
 40       0.0491    0.0126    0.0150
 60       0.0336    0.0087    0.0098
 80       0.0256    0.0066    0.0072
100       0.0201    0.0055    0.0058
120       0.0179    0.0044    0.0047
140       0.0146    0.0034    0.0040
160       0.0121    0.0034    0.0035
180       0.0105    0.0031    0.0032
200       0.0117    0.0027    0.0028
250       0.0077    0.0022    0.0022
300       0.0070    0.0018    0.0018

Table A.2: Empirical mean square error of $S_n$ based on 10,000 simulations.

        α = 0.1             α = 0.05            α = 0.01
  n     P     L     M       P     L     M       P     L     M
 50   10.5  13.3   9.8    4.7   7.14  4.72    0.6   1.3   0.7
 60    9.5  11.9   9.3    4.5   6.3   4.4     0.8   1.4   0.8
 70    9.9  11.9   9.7    4.8   6.2   4.6     0.8   1.3   0.8
 80    9.7  11.6   9.9    4.7   6.0   4.6     0.9   1.4   0.8
 90   10.1  11.4  10.0    4.8   5.7   4.9     0.9   1.3   0.8
100    9.5  10.7   9.3    4.8   5.6   4.6     0.8   1.1   0.9

Table A.3: Empirical significance of Pearson chi-square (P), likelihood ratio (L), and maximal correlation (M) tests of independence.

        α = 0.1              α = 0.05             α = 0.01
  n     P     L     M        P     L     M        P     L     M
 20*   51    52    52       36    39    39       13    17    16
 30*   72    73    73       59    60    61       31    35    34
 40    85.4  88.8  85.8     74.5  80.8  75.8     45.8  60.4  48.8
 45    90.2  92.4  90.7     81.5  86.4  82.9     55.5  67.5  59.2
 50    93.3  94.9  93.8     86.8  90.4  87.8     64.0  74.3  67.2
 55    95.2  96.3  95.6     90.5  92.5  91.0     71.4  79.4  74.4
 60    96.8  97.5  97.0     93.3  94.6  93.7     76.8  83.6  79.8
 65    97.8  98.2  98.0     94.8  96.2  95.5     82.3  87.1  84.6
 70    98.6  98.9  98.7     97.0  97.8  97.3     87.3  90.8  88.9
 75    99.3  99.4  99.4     98.0  98.5  98.2     90.2  93.2  91.8
 80    99.4  99.5  99.4     98.4  98.8  98.6     92.2  94.6  93.7
 90    99.8  99.8  99.8     99.4  99.5  99.4     96.1  97.3  96.9

Table A.4: Empirical power of Pearson chi-square (P), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 1. Sample sizes marked with * indicate that exact tests are used.

        α = 0.1              α = 0.05             α = 0.01
  n     P     L     M        P     L     M        P     L     M
 70    50.1  73.8  53.2     35.9  59.8  38.6     13.9  31.5  15.1
 80    58.8  78.5  62.6     44.3  66.2  48.6     18.9  39.3  22.2
 90    65.5  83.3  71.0     50.6  72.4  58.0     25.3  45.9  30.4
100    71.3  85.9  78.1     58.2  76.5  66.4     30.9  52.7  39.6
110    79.2  90.8  89.7     66.9  83.2  74.4     39.7  61.0  50.0
120    82.8  92.0  88.7     71.8  85.8  81.0     45.3  65.5  57.8
130    86.8  94.2  92.1     77.6  88.7  85.6     51.7  71.3  65.8
140    90.2  95.9  94.1     82.4  91.5  89.5     59.0  76.1  73.1
150    92.6  96.8  96.2     85.6  93.3  92.3     64.3  80.0  78.3
160    94.0  97.2  97.2     88.8  94.7  94.6     70.6  83.7  83.2
170    96.0  98.3  98.1     91.4  96.4  96.1     75.8  86.9  87.3
180    96.8  98.8  98.8     93.4  97.1  97.5     80.5  89.7  90.4
190    97.9  99.2  99.1     95.5  98.0  98.1     83.9  92.2  93.0
200    98.5  99.5  99.5     96.2  98.5  98.8     86.8  93.2  94.5
210    99.0  99.7  99.7     97.6  99.1  99.2     90.4  95.4  96.5
220    99.3  99.7  99.8     98.1  99.3  99.5     91.9  96.2  97.4

Table A.5: Empirical power of Pearson chi-square (P ), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 2.

        α = 0.1              α = 0.05             α = 0.01
  n     P     L     M        P     L     M        P     L     M
 50    75.4  73.4  82.0     60.9  59.2  69.4     35.4  33.8  47.4
 55    77.9  77.1  83.7     65.2  64.1  73.5     39.8  38.2  51.6
 60    84.8  85.0  88.3     73.6  74.3  80.0     49.4  47.9  62.0
 65    91.3  89.6  92.5     81.3  81.3  87.3     57.1  56.7  69.1
 70    92.7  92.6  95.2     88.4  84.6  89.8     63.8  63.2  74.5
 75    93.6  94.6  95.5     86.9  88.9  91.3     68.2  69.5  77.8
 80    96.8  97.6  98.4     92.1  93.0  95.8     75.3  76.7  86.3
 85    97.1  98.2  98.6     93.9  95.0  96.0     78.1  80.5  86.7
 90    98.6  98.8  99.0     95.8  96.4  97.8     84.3  85.4  90.9
 95    99.3  99.6  99.1     97.0  97.8  98.3     84.9  88.9  91.1
100    99.2  99.3  99.4     98.2  98.5  98.7     90.5  92.1  95.5
110    99.7  99.8  99.6     99.1  99.5  99.4     94.4  95.9  97.5

Table A.6: Empirical power of Pearson chi-square (P), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 3, Case 1.

        α = 0.1              α = 0.05             α = 0.01
  n     P     L     M        P     L     M        P     L     M
200    68.0  77.2  83.5     54.4  60.5  72.5     30.2  26.5  44.2
210    70.9  82.1  86.5     57.6  67.1  76.6     32.3  32.6  50.9
220    75.1  87.0  90.5     62.7  75.1  82.8     36.5  43.5  59.7
230    77.7  90.8  93.5     65.7  81.0  88.1     39.4  50.4  66.8
240    79.2  93.4  95.0     68.2  85.0  90.4     42.6  56.8  72.5
250    83.6  95.3  96.9     73.2  89.3  93.5     47.1  65.0  78.7
260    85.6  97.0  97.9     75.9  91.7  95.3     50.6  70.5  84.1
270    87.8  97.9  98.6     78.2  94.1  96.7     54.3  76.5  88.1
280    89.0  98.2  99.0     79.8  95.7  97.5     57.2  80.7  90.7
290    90.4  99.0  99.3     82.5  96.6  98.5     60.8  83.8  93.4
300    91.7  99.5  99.7     84.1  97.7  99.2     64.0  86.9  95.3

Table A.7: Empirical power of Pearson chi-square (P ), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 3, Case 2.

        α = 0.1              α = 0.05             α = 0.01
  n     P     L     M        P     L     M        P     L     M
 50    30.5  39.9  25.2     19.4  24.8  15.4      6.8   6.9   4.56
 75    41.7  54.4  38.4     29.7  39.7  27.1     11.6  16.2  11.1
100    52.2  63.8  52.0     38.7  50.3  39.1     18.1  24.4  19.6
125    62.6  72.1  64.4     49.5  59.7  52.4     26.5  32.9  29.9
150    71.9  78.8  74.0     59.5  67.6  63.8     34.7  41.8  41.1
175    78.8  83.6  81.2     67.9  73.5  73.2     44.1  49.8  52.1
200    83.8  87.0  87.8     75.1  79.3  80.9     53.0  57.4  61.9
225    89.4  90.9  92.2     81.7  84.5  86.9     62.1  65.3  71.1
250    92.6  93.9  94.9     87.2  88.9  91.3     69.8  72.4  79.2
275    95.1  95.6  96.9     90.7  91.7  94.6     76.7  78.4  85.2
300    90.1  96.6  97.9     93.2  93.5  96.1     81.2  82.3  88.9
325    97.5  97.6  98.8     95.2  95.4  97.5     85.6  85.8  92.4
350    98.6  98.5  99.3     96.9  96.8  98.5     89.6  89.6  95.1
375    98.9  98.9  99.4     97.9  97.8  99.1     92.4  91.9  96.6
400    99.4  99.3  99.8     98.5  98.6  99.6     94.3  93.9  97.8

Table A.8: Empirical power of Pearson chi-square (P), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 4.

        α = 0.1              α = 0.05             α = 0.01
  n     P     L     M        P     L     M        P     L     M
100    42.0  55.2  40.9     29.1  40.3  28.5     11.7  16.5  11.1
125    52.4  65.1  54.0     39.3  51.2  40.6     17.6  25.5  19.5
150    60.1  71.7  62.9     46.9  58.9  50.9     24.1  33.1  27.4
175    69.0  78.3  73.3     56.5  67.4  61.5     37.8  41.2  36.9
200    76.5  83.6  81.1     65.1  73.6  71.2     39.3  49.8  47.9
225    81.5  87.9  86.2     71.7  79.2  78.4     47.5  56.7  57.9
250    85.5  90.0  89.9     76.8  82.9  83.7     54.2  62.5  64.9
275    89.5  92.9  93.1     82.1  86.9  87.9     61.3  69.4  72.5
300    92.3  94.9  95.8     86.4  90.3  92.1     68.1  74.7  79.1
325    94.9  96.3  96.8     90.3  93.1  94.3     74.9  80.5  84.9
350    96.0  97.6  98.1     92.3  94.5  96.2     79.9  83.9  88.3
375    97.4  98.3  98.9     95.0  96.5  97.6     84.4  87.9  92.0
400    98.3  98.8  99.3     96.2  97.2  98.4     87.9  90.9  94.2
425    98.9  99.3  99.6     97.7  98.3  99.0     91.6  93.7  96.0
450    99.4  99.6  99.6     98.3  98.9  99.4     93.4  95.1  97.5

Table A.9: Empirical power of Pearson chi-square (P), likelihood ratio (L), and maximal correlation (M) tests of independence for Example 5.

        α = 0.05                  α = 0.01
  n     P    L    M    I    C     P    L    M    I    C
 30    61   61   62   62   59    34   36   36   36   34
 35    68   69   70   70   67    43   48   46   48   42
 40    75   74   78   78   74    51   55   56   56   49
 45    83   83   84   84   80    62   64   65   67   58
 50    86   86   87   87   83    68   70   71   71   63
 55    90   91   92   91   89    77   77   79   78   71
 60    94   94   95   95   92    80   82   83   83   76
 65    94   94   95   95   93    85   85   86   86   80
 70    96   96   96   97   95    88   89   89   89   83

Table A.10: Empirical power of Pearson chi-square (P), likelihood ratio (L), maximal correlation (M), I-test (I), and correlation ratio (C) tests of independence for the loglinear case.

        α = 0.05                  α = 0.01
  n     P    L    M    I    C     P    L    M    I    C
 60    74   75   82   47   5     49   49   61   26   1
 65    81   82   85   52   6     61   60   73   29   1
 70    89   85   89   49   5     62   64   75   28   2
 75    88   88   90   53   4     68   70   79   30   2
 80    92   93   94   60   4     75   77   86   35   1
 85    92   93   95   57   3     77   80   86   35   1
 90    95   96   97   68   4     84   85   90   45   1
 95    97   98   98   67   3     87   91   94   43   1
100    98   98   99   70   4     90   92   96   47   1

Table A.11: Empirical power of Pearson chi-square (P), likelihood ratio (L), maximal correlation (M), I-test (I), and correlation ratio (C) tests of independence for the Lissajous curve case.

 a    b    K_X(Y)   K_Y(X)     I
 1    2     0.050    0.047   0.136
 1    3     0.989    0.048   0.212
 1    4     0.050    0.044   0.092
 1    5     0.998    0.050   0.135
 1    6     0.047    0.051   0.086
 1    7     0.998    0.054   0.111
 1    8     0.048    0.048   0.083
 1    9     0.999    0.075   0.095
 2    3     0.046    0.052   0.095
 2    5     0.047    0.044   0.089
 2    7     0.049    0.051   0.086
 2    9     0.055    0.053   0.079
 3    4     0.051    0.050   0.088
 3    5     0.051    0.049   0.105
 3    7     0.051    0.050   0.099
 3    8     0.051    0.052   0.079
 4    5     0.052    0.049   0.081
 4    7     0.053    0.049   0.089
 4    9     0.053    0.051   0.085
 4   11     0.049    0.053   0.074

Table A.12: Approximations for the I-coefficient, $K_X(Y)$, and $K_Y(X)$ for the Lissajous case.

[Figure A.1: Lissajous curves for several a and b values; panels for (a, b) = (1,2), (1,3), (1,4), (1,5), (1,6), (1,7), (1,8), (1,9), (2,3), (2,5), (2,7), (2,9), (3,4), (3,5), (3,7), (3,8), (4,5), (4,7), (4,9), (4,11).]

Appendix B

Algebraic form of Maximal Correlation

B.1 2 × 2 Contingency Tables

Consider the matrix $\hat{A}$ for a 2 × 2 contingency table, and denote the entries of the matrix by the letters K, L, M, N:
$$
\hat{A} =
\begin{pmatrix}
\dfrac{n_{11}^2}{n_{1\cdot}n_{\cdot 1}} + \dfrac{n_{12}^2}{n_{1\cdot}n_{\cdot 2}} &
\dfrac{n_{11}n_{21}}{n_{1\cdot}n_{\cdot 1}} + \dfrac{n_{12}n_{22}}{n_{1\cdot}n_{\cdot 2}} \\[2ex]
\dfrac{n_{21}n_{11}}{n_{2\cdot}n_{\cdot 1}} + \dfrac{n_{22}n_{12}}{n_{2\cdot}n_{\cdot 2}} &
\dfrac{n_{21}^2}{n_{2\cdot}n_{\cdot 1}} + \dfrac{n_{22}^2}{n_{2\cdot}n_{\cdot 2}}
\end{pmatrix}
=
\begin{pmatrix} K & L \\ M & N \end{pmatrix}.
$$

The eigenvalues of $\hat{A}$ are given by the solutions of $\det(\hat{A} - \lambda I) = 0$, which leads to the characteristic polynomial
$$
\det(\hat{A} - \lambda I) =
\det\begin{pmatrix} K - \lambda & L \\ M & N - \lambda \end{pmatrix}
= \lambda^2 - (K + N)\lambda + (KN - LM) = 0.
$$

Since 1 is always an eigenvalue of $\hat{A}$, we have

$$
(\lambda - 1)(\lambda + 1 - K - N) = 0,
$$
thus
$$
\lambda = K + N - 1
= \frac{n_{11}^2}{n_{1\cdot}n_{\cdot 1}} + \frac{n_{12}^2}{n_{1\cdot}n_{\cdot 2}} + \frac{n_{21}^2}{n_{2\cdot}n_{\cdot 1}} + \frac{n_{22}^2}{n_{2\cdot}n_{\cdot 2}} - 1
= \theta_1 - 1
$$
is the non-unity eigenvalue of $\hat{A}$, where $\theta_1$ is as defined in (3.23). So the sample maximal correlation for 2 × 2 contingency tables is

$$
S_n^2 = \theta_1 - 1. \tag{B.1}
$$

Note that the Pearson chi-squared test statistic can be written as

$$
X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{\left(n_{ij} - \frac{n_{i\cdot}n_{\cdot j}}{n}\right)^2}{\frac{n_{i\cdot}n_{\cdot j}}{n}}
= \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{n\,n_{ij}^2}{n_{i\cdot}n_{\cdot j}} - n
= n(\theta_1 - 1),
$$
therefore, for 2 × 2 contingency tables we have

$$
X^2 = nS_n^2. \tag{B.2}
$$
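As a quick numerical check of (B.2), the following short R sketch (not part of the original derivation; the 2 × 2 counts are arbitrary illustrative values) computes $\theta_1$ directly from an observed table and compares $n(\theta_1 - 1)$ with the Pearson statistic returned by chisq.test without continuity correction.

# minimal sketch illustrating (B.2) on a hypothetical 2 x 2 table
tab <- matrix(c(12, 5, 7, 16), nrow = 2)
n   <- sum(tab)
ri  <- rowSums(tab)
cj  <- colSums(tab)
theta1 <- sum(tab^2 / outer(ri, cj))        # theta_1 = sum_ij n_ij^2 / (n_i. n_.j)
nSn2   <- n * (theta1 - 1)                  # n * S_n^2 from (B.1)
X2 <- unname(chisq.test(tab, correct = FALSE)$statistic)   # Pearson X^2
all.equal(X2, nSn2)                         # TRUE: X^2 = n * S_n^2 for 2 x 2 tables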

B.2 3 × 3 Contingency Tables

Consider the matrix $\hat{A}$ for a 3 × 3 contingency table, and denote the entries of this matrix by the letters K, ..., S:
$$
\hat{A} =
\begin{pmatrix}
\displaystyle\sum_{r=1}^{3}\frac{n_{1r}^2}{n_{1\cdot}n_{\cdot r}} &
\displaystyle\sum_{r=1}^{3}\frac{n_{1r}n_{2r}}{n_{1\cdot}n_{\cdot r}} &
\displaystyle\sum_{r=1}^{3}\frac{n_{1r}n_{3r}}{n_{1\cdot}n_{\cdot r}} \\[2ex]
\displaystyle\sum_{r=1}^{3}\frac{n_{2r}n_{1r}}{n_{2\cdot}n_{\cdot r}} &
\displaystyle\sum_{r=1}^{3}\frac{n_{2r}^2}{n_{2\cdot}n_{\cdot r}} &
\displaystyle\sum_{r=1}^{3}\frac{n_{2r}n_{3r}}{n_{2\cdot}n_{\cdot r}} \\[2ex]
\displaystyle\sum_{r=1}^{3}\frac{n_{3r}n_{1r}}{n_{3\cdot}n_{\cdot r}} &
\displaystyle\sum_{r=1}^{3}\frac{n_{3r}n_{2r}}{n_{3\cdot}n_{\cdot r}} &
\displaystyle\sum_{r=1}^{3}\frac{n_{3r}^2}{n_{3\cdot}n_{\cdot r}}
\end{pmatrix}
$$

$$
= \begin{pmatrix} K & L & M \\ N & O & P \\ Q & R & S \end{pmatrix}.
$$
The eigenvalues of $\hat{A}$ are given by the solutions of $\det(\hat{A} - \lambda I) = 0$, which leads to the characteristic polynomial
$$
\det(\hat{A} - \lambda I) =
\det\begin{pmatrix} K - \lambda & L & M \\ N & O - \lambda & P \\ Q & R & S - \lambda \end{pmatrix}
$$

$$
= (K - \lambda)\big[(O - \lambda)(S - \lambda) - PR\big] - L\big[N(S - \lambda) - PQ\big] + M\big[NR - (O - \lambda)Q\big] = 0.
$$

This can be rearranged as

$$
\lambda^3 + a_2\lambda^2 + a_1\lambda + a_0 = 0,
$$

where

$$
\begin{aligned}
a_2 &= -(K + O + S), \\
a_1 &= (KO + KS + OS) - (PR + LN + MQ), \\
a_0 &= -(KOS - KPR + LPQ - LNS + MNR - MOQ).
\end{aligned}
$$

Since 1 is always an eigenvalue of $\hat{A}$, we have

$$
(\lambda - 1)\big[\lambda^2 + (a_2 + 1)\lambda + (a_1 + a_2 + 1)\big] = 0,
$$
and the larger of the two non-unity eigenvalues of $\hat{A}$ is the larger root of

$$
\lambda^2 + (a_2 + 1)\lambda + (a_1 + a_2 + 1) = 0,
$$
which is
$$
\lambda = -\frac{a_2 + 1}{2} + \frac{1}{2}\sqrt{(a_2 + 1)^2 - 4(a_1 + a_2 + 1)}.
$$

Note that $a_2 = -\theta_1$ and $a_1 = \theta_2$, where $\theta_1$ and $\theta_2$ are as given in equations (3.23) and (3.24), respectively. Then the sample maximal correlation $S_n$ is given by

$$
S_n^2 = \frac{1}{2}\left[\theta_1 - 1 + \sqrt{(\theta_1 + 1)^2 - 4(\theta_2 + 1)}\right]. \tag{B.3}
$$
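The closed form (B.3) can also be checked numerically against a direct eigendecomposition of $\hat{A}$. The R sketch below is illustrative only: the 3 × 3 counts are hypothetical, and the coefficients $a_2$ and $a_1$ are obtained here from the trace and the principal 2 × 2 minors of $\hat{A}$ rather than from (3.23) and (3.24).

# hypothetical 3 x 3 table; verify (B.3) against the second-largest eigenvalue of A-hat
tab <- matrix(c(10, 4, 6,
                 3, 12, 5,
                 7, 2, 11), nrow = 3, byrow = TRUE)
ri <- rowSums(tab)
cj <- colSums(tab)
A  <- matrix(0, 3, 3)
for (i in 1:3) for (j in 1:3)
  A[i, j] <- sum(tab[i, ] * tab[j, ] / (ri[i] * cj))   # A[i,j] = sum_r n_ir n_jr / (n_i. n_.r)
a2 <- -sum(diag(A))                                    # a_2 = -(K + O + S)
a1 <- sum(sapply(1:3, function(k) det(A[-k, -k])))     # a_1 = sum of principal 2 x 2 minors
Sn2.closed <- 0.5 * (-(a2 + 1) + sqrt((a2 + 1)^2 - 4 * (a1 + a2 + 1)))
Sn2.eigen  <- sort(Re(eigen(A, only.values = TRUE)$values))[2]   # second-largest eigenvalue
all.equal(Sn2.closed, Sn2.eigen)                       # TRUE up to rounding error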

Appendix C

Selected R Code

C.1 R Function for Maximal Correlation

# This is a function for maximal correlation
# INPUT:  a contingency table
# OUTPUT: n*S^2
mc.ns2 <- function(C) {
  # identify size
  m <- dim(C)[1]
  k <- dim(C)[2]
  # find row and column totals
  nidot <- rowSums(C)
  ndoti <- colSums(C)
  a <- matrix(0, m, m)
  for (i in 1:m) {
    for (j in 1:m) {
      for (r in 1:k) {
        a[i, j] <- a[i, j] + (C[i, r] * C[j, r]) / (ndoti[r] * nidot[i])
      }
    }
  }
  L  <- eigen(a, only.values = TRUE)
  s3 <- sort(L$values)
  s2 <- s3[m - 1]
  ns2 <- sum(C) * s2
  ns2
}
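A brief usage sketch (the 3 × 3 counts below are hypothetical, not taken from the dissertation data): the value returned by mc.ns2 is $nS_n^2$, which may be compared with the critical values in Table A.1.

# hypothetical example: compute n * S_n^2 for a 3 x 3 table
tab <- matrix(c(20, 10, 5,
                 8, 25, 7,
                 6,  9, 30), nrow = 3, byrow = TRUE)
mc.ns2(tab)   # compare with the I = J = 3 row of Table A.1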

C.2 R Function for Correlation Ratio

# This is a FUNCTION which calculates the MAXIMUM CORRELATION RATIO
# INPUT:  (a contingency table, x values in vector, y values in vector)
# OUTPUT: maximum correlation ratio statistic
mc.ratio <- function(table, alpha, beta) {
  # Estimate the cell probabilities
  pi <- table / sum(table)
  # Row sum vector and column sum vector
  pir <- rowSums(pi)   # stored as column vector
  pic <- colSums(pi)
  # Find gamma and phi
  gamma <- (1 / pir) * (pi %*% beta)
  phi   <- (1 / pic) * (t(pi) %*% alpha)
  # Find the correlation ratio of Y on X
  thetaynum   <- pir %*% (gamma * gamma) - (pir %*% gamma)^2
  thetaydenom <- pic %*% (beta * beta) - (pic %*% beta)^2
  thetay      <- thetaynum / thetaydenom
  # Find the correlation ratio of X on Y
  thetaxnum   <- pic %*% (phi * phi) - (pic %*% phi)^2
  thetaxdenom <- pir %*% (alpha * alpha) - (pir %*% alpha)^2
  thetax      <- thetaxnum / thetaxdenom
  # Now find the maximum correlation ratio
  maxrat <- max(thetay, thetax)
  maxrat
}
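A short usage sketch for mc.ratio, with hypothetical counts and the equally spaced scores used in the simulation of Section C.3:

# hypothetical example: maximum correlation ratio with scores 1, 2, 3
tab  <- matrix(c(20, 10, 5,
                  8, 25, 7,
                  6,  9, 30), nrow = 3, byrow = TRUE)
xval <- 1:3   # row scores (alpha)
yval <- 1:3   # column scores (beta)
mc.ratio(tab, xval, yval)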

C.3 R Code for Table A.10

# This is the simulation program for computing Table A.10.
# For generating tables with given marginals, we use "r2dtable" of R,
# which is the procedure due to Patefield (1981).
rm(list = ls(all = TRUE))

# necessary sources for functions
source("C:/ProgramFiles/R/R-2.4.0/programlar/myfunctions/maxcorratio.R")
source("C:/ProgramFiles/R/R-2.4.0/programlar/myfunctions/energycont.R")
source("C:/ProgramFiles/R/R-2.4.0/programlar/myfunctions/maximalcorrelation.R")

Nsim <- 10000
nsim <- 200   # this is for the tables generated

# sample size
n <- 70
ccount1 <- 0; mcount1 <- 0; lcount1 <- 0; ecount1 <- 0; mcrcount1 <- 0
ccount5 <- 0; mcount5 <- 0; lcount5 <- 0; ecount5 <- 0; mcrcount5 <- 0
alpha1 <- 0.01
alpha5 <- 0.05

# enter the parameters of the loglinear model
lamxy <- matrix(c(0.4, 0.8, -1.2, -0.2, -0.4, 0.6, -0.2, -0.4, 0.6), 3, 3)
lamx  <- matrix(c(0, 0, 0), 1, 3)
lamy  <- matrix(c(0, 0, 0), 1, 3)
xdim <- 3
ydim <- 3
xval <- c(1:3)
yval <- c(1:3)

# calculate m[i,j]
m <- matrix(0, 3, 3)
for (i in 1:3) {
  for (j in 1:3) {
    m[i, j] <- exp(lamx[i] + lamy[j] + lamxy[i, j])
  }
}

# calculate the multinomial probabilities pij[i,j]
pij <- m / sum(m)

# we must write pij as a 1 x 9 vector
probs <- matrix(c(pij[1, ], pij[2, ], pij[3, ]), 1, 9)

# big simulation starts here ++++++++++++++++++++
for (g in 1:Nsim) {
  kay    <- matrix(0, nsim, 1)
  lik    <- matrix(0, nsim, 1)
  ns2    <- matrix(0, nsim, 1)
  energy <- matrix(0, nsim, 1)
  mcr    <- matrix(0, nsim, 1)

  # ***************************************************************

  # generate a multinomial sample of size n
  multi <- sample(1:length(probs), size = n, prob = probs, replace = TRUE)

  # finding contingency table
  C <- matrix(tabulate(multi, nbins = length(probs)), xdim, ydim, byrow = TRUE)

  # calculate row and column sums for r2dtable ======
  rs <- rowSums(C)
  cs <- colSums(C)

  # calculate chisqr and LRT for ORIGINAL table ------
  L <- loglin(C, margin = list(1, 2))
  kayoriginal <- L$pearson
  likoriginal <- L$lrt

  # calculate max cor for original table
  ns2original <- mc.ns2(C)

  # energy test for the original table ------
  eoriginal <- estat.cont(C)

  # MCratio stat for the original
  mcroriginal <- mc.ratio(C, xval, yval)

  # ----- end of calculations for original table ------

  # generate tables
  s <- r2dtable(nsim, rs, cs)

  # small simulation starts here
  for (w in 1:nsim) {
    # calculate chisqr for this table ------
    L <- loglin(s[[w]], margin = list(1, 2))
    kay[w] <- L$pearson
    lik[w] <- L$lrt
    # calculate n*s^2 for this table
    ns2[w] <- mc.ns2(s[[w]])
    # find energy for simulated table ------
    energy[w] <- estat.cont(s[[w]])
    # find MCRatio stat for simulated table
    mcr[w] <- mc.ratio(s[[w]], xval, yval)
  }
  # ----- small simulation ends ------

  if (kayoriginal > quantile(kay, 0.99)) ccount1 <- ccount1 + 1
  if (ns2original > quantile(ns2, 0.99)) mcount1 <- mcount1 + 1
  if (likoriginal > quantile(lik, 0.99)) lcount1 <- lcount1 + 1
  if (eoriginal > quantile(energy, 0.99)) ecount1 <- ecount1 + 1
  if (mcroriginal > quantile(mcr, 0.99)) mcrcount1 <- mcrcount1 + 1
  if (kayoriginal > quantile(kay, 0.95)) ccount5 <- ccount5 + 1
  if (ns2original > quantile(ns2, 0.95)) mcount5 <- mcount5 + 1
  if (likoriginal > quantile(lik, 0.95)) lcount5 <- lcount5 + 1
  if (eoriginal > quantile(energy, 0.95)) ecount5 <- ecount5 + 1
  if (mcroriginal > quantile(mcr, 0.95)) mcrcount5 <- mcrcount5 + 1
}
# +++++++++ big simulation ends here ++++++++

powerchi1 <- ccount1 / Nsim
powerns21 <- mcount1 / Nsim
powerlik1 <- lcount1 / Nsim
powere1   <- ecount1 / Nsim
powermcr1 <- mcrcount1 / Nsim
powerchi5 <- ccount5 / Nsim
powerns25 <- mcount5 / Nsim
powerlik5 <- lcount5 / Nsim
powere5   <- ecount5 / Nsim
powermcr5 <- mcrcount5 / Nsim

powerchi5
powerns25
powerlik5
powere5
powermcr5
powerchi1
powerns21
powerlik1
powere1
powermcr1