Extensions to the Kruskal-Wallis Test and a Generalised Median Test
Total Page:16
File Type:pdf, Size:1020Kb
c Journal of Applied Mathematics Decision Sciences Reprints Available directly from the Editor Printed in New Zealand Extensions to the KruskalWallis Test and a Generalised Median Test with Extensions JCW RAYNER john rayneruoweduau Department of Applied Statistics University of Wol longong Northelds Avenue NSW Australia DJ BEST johnbbiomsyddfstcsiroau CSIRO Mathematical and Information Sciences PO Box North Ryde NSW Australia Abstract The data for the tests considered here may b e presented in twowaycontingency ta bles with all marginal totals xed Weshow that Pearsons test statistic X P for Pearson may P b e partitioned into useful and informative comp onents The rst detects lo cation dierences b e tween the treatments and the subsequent comp onents detect disp ersion and higher order moment dierences For KruskalWallistyp e data when there are no ties the lo cation comp onent is the KruskalWallis test The subsequent comp onents are the extensions Our approach enables us to generalise to when there are ties and to when there is a xed numb er of categories and a large numb er of observations We also prop ose a generalisation of the wellknown median test In this reduces to the usual median test statistic situation the lo cationdetecting rst comp onentofX P when there are only two categories Subsequent comp onents detect higher moment departures from the null hyp othesis of equal treatmen t eects Keywords Categorical data Comp onents Nonparametric tests Orthonormal p olynomial Twoway data Intro duction The idea of decomp osing a test into orthogonal contrasts as in the analysis of variance has long b een appreciated by statisticians as a way of making hyp othesis tests more informative In the authors smo oth go o dness of t work see Rayner and Best a similar approachis pursued Omnibus test statistics are par titioned into smo oth comp onents We dene the comp onents of a test statistic to be asymptotically pairwise indep endent with each asymptotically having the chisquared distribution and suchthat their sum gives the original test statistic ts provide powerful directional tests and permit a convenient and The comp onen informative scrutinyofthe data This approachis applied to Sp earmans test in Best and Rayner and Rayner and Best a Rayner and Best b gaveanoverview of this approach applied to several commonly used nonparametric tests including the Friedman and Durbin tests Data for a generalisation of the median test that we subsequently prop ose and for the KruskalWallis test both with and without ties may be presented in the form of twoway tables with xed marginal totals Wederive the covariance matrix JCW RAYNER AND DJ BEST of entries in such tables and then partition a multiple of X into comp onents that P detect lo cation and higher moment dierences b etween rows For KruskalWallistyp e data when there are no ties the lo cation comp onentis the KruskalWallis test Our approach enables us to generalise to when there are ties and to when there is a xed number of categories and a large number of observations We also prop ose a generalisation of the wellknown median test The lo cation detecting rst comp onentofX reduces to the usual median test statistic P when there are only two categories Using more categories allows comp onents other than this lo cation comp onent to b e calculated These additional comp onents that detect disp ersion and higher moment eects are not available when using the usual median test The structure of this pap er is as follows In the next section the mo del for two way contingency tables with xed marginal totals is given and Pearsons X is P derived as a test statistic for the null hyp othesis of like rows In section three a multiple of X is partitioned into comp onents The material in section and P will b e familiar to many readers but is necessary background for the new work In section four it is shown that when there are no ties the rst comp onent is the usual KruskalWallis statistic The nonlo cation detecting comp onents are our extensions Section ve generalises the treatment to when there are ties Section six intro duces a generalisation of the usual median X test whichisthus identied as a lo cation detecting test the extensions p ermit disp ersion and other eects to b e detected A Mo del and Pearsons X Test Supp ose wehavea twoway table of counts N with i r and j c ij ely n i r and n j c The row and column totals resp ectiv i j are known constants Under the null hyp othesis of simple random sampling the likeliho o d was given byRoy and Mitra as r c r c Y Y Y Y n n n n i j ij i j i j P P in which n n n is the grand total of the observations The mo d i j i j els for tables with just one set of marginal totals xed or only the grand total xed are quite dierent from our mo del in which all row and column totals are xed See Lancaster chapter XI section pp This likeliho o d can b e expressed as a pro duct of extended or multivariate hyp ergeometric probability functions r c Y Y n n n n j ij i C C n n ij i i j taken with resp ect to the To nd moments of the N exp ectations may be ij distribution of the second row conditional on knowledge of the column sums of the EXTENDED KRUSKALWALLIS AND MEDIAN TESTS rst tworows then conditional on the column sums of the rst three rows and so on It suces to know the moments of the extended hyp ergeometric distribution Details are given in the App endix Wend E N n n n i r and j c ij i j T T T T Write N N N i r and N N N so that N is i i ic r the vector of all the cell counts The jointcovariance matrix of N and N is for i j i j n n n n n n r r s i j diag cov N N i j n n n Write f n n j c and j j n n n n r r s R diag n n The covariance matrix of N is cov N fdiag f f f gR where is j i j the direct or Kronecker pro duct See Lancaster for details ab out direct or Kronecker sums and pro ducts Now dene the standardised cell counts q Z N E N E N i r and j c ij ij ij ij T Z Z Z Z Z c r rc I the a by a identity matrix and the a byvector with every element Then a a q cov Z fI f f gR r i j p The matrix fI f f g has r latent ro ots and one latent ro ot zero The r i j latent ro ots of R are dicult to nd in general but their asymptotic limits follow from Lancaster Chapter V Lancaster showed that the quadratic form with vector the standardised cell counts and matrix essentially R is the familiar Pearson go o dness of t statistic with asymptotic distribution Hence the c latent ro ots of R are asymptotically one c times and zero once So under the null hyp othesis of simple random sampling Z has zero mean and covariance matrix cov Z which asymptotically has r c latent ro ots one and the remaining r c latent ro ots zero In the well known and often used classical mo del r and c are xed and the total coun t n The test statistic X is given by P r c X X n n n n i j i j T Z Z N X ij P n n i j Wenow conrm that our mo del leads to this test statistic Supp ose H is orthogonal and diagonalises cov Z Asymptotically we then have JCW RAYNER AND DJ BEST T H covZ H I O r c r c T T where means direct or Kronecker sum Dene Y H Z Now Z Z T Y Y in which Y by the multivariate Central Limit Theorem is asymptotically N I under the null hyp othesis of simple random sampling rc r c r c T T It follows that under the null hyp othesis X Z Z Y Y asymptotically has P the distribution r c Partitioning Pearsons Statistic We now show that X may be partitioned into comp onents the sth of which P detects sth moment departures from the null hyp othesis of similarly distributed rows treatments rc X There is some choice in dening Y The elements Y of Y are suchthat X i i P i the Y asH is not yet fully sp ecied In doing so our aim is to nd Y that can i i e one such partition rst supp ose be easily and usefully interpreted To achiev that fg j g is the set of p olynomials orthonormal on fn n g See the App endix s j for the denitions of the rst two p olynomials and the derivation of subsequent p olynomials This approach results when there are no ties in the rst comp onent b eing the KruskalWallis test Write g for the c byvector with elements g j s s Dene G by p G G G c c in which G is the rc by r matrix s g s g s G s c and s g s c c is also rc by r G c c q T n G Z The elements of Y may be considered in blo cks of Dene Y n r the sth blo ck corresp onding to the p olynomial of order s These blo cks are T T T V in which asymptotically mutually indep endent Write Y V c T T V Y Y V Y Y r c cr cr EXTENDED KRUSKALWALLIS AND MEDIAN TESTS and V all the V are r by so that c s n n T T T T X Z Z Y Y V V V V c P c n n T n X into comp onents V V s c The V are This partitions s s s P n asymptotically mutually indep endent and asymptotically N I so that r r T the V V are asymptotically mutually indep endent Explicitly wehave for s s r s c p p c X n n T A V G Z g j Z s s ij s n n j Because V involves through g a p olynomial of order s the elements of V are s s s p olynomials of order s in the elements of N Under the null hyp othesis E Z but when this is not true E V involves moments up to order s of Z So for s T V detects sth moment departures from the null hyp othesis of s r V s s similarly distributed rows treatments Three instructors assign Instructors Example See Conover p grades in ve categories according to the following