The Modification of the Phi-Coefficient Reducing Its Dependence on The
Total Page:16
File Type:pdf, Size:1020Kb
c Metho ds of Psychological Research Online 1997, Vol.2, No.1 1998 Pabst Science Publishers Internet: http://www.pabst-publishers.de/mpr/ The Mo di cation of the Phi-co ecient Reducing its Dep endence on the Marginal Distributions Peter V. Zysno Abstract The Phi-co ecient is a well known measure of correlation for dichotomous variables. It is worthy of remark, that the extreme values 1 only o ccur in the case of consistent resp onses and symmetric marginal frequencies. Con- sequently low correlations may be due to either inconsistent data, unequal resp onse frequencies or b oth. In order to overcome this somewhat confusing situation various alternative prop osals were made, which generally, remained rather unsatisfactory. Here, rst of all a system has b een develop ed in order to evaluate these measures. Only one of the well-known co ecients satis es the underlying demands. According to the criteria, the Phi-co ecientisac- companied by a formally similar mo di cation, which is indep endent of the marginal frequency distributions. Based on actual data b oth of them can b e easily computed. If the original data are not available { as usual in publica- tions { but the intercorrelations and resp onse frequencies of the variables are, then the grades of asso ciation for assymmetric distributions can b e calculated subsequently. Keywords: Phi-co ecient, indep endent marginal distributions, dichotomous variables 1 Intro duction In the b eginning of this century the Phi-co ecientYule 1912 was develop ed as a correlational measure for dichotomous variables. Its essential features can b e quickly outlined. N elements of twovariables i and j fall into the classes "+" and "" with frequencies n , u = N n , n and u = N n , resp ectively. Consequently, i i i j j j within a pair of variables four dyadic events may o ccur: ++; +; + and . The corresp onding joint frequencies are denoted by the initial letters of the alphab et a; b; c and d. The parameters are usually presented in a fourfold table tab.1. The degree of coherence between the two variables is de ned as the quotient of two frequencies. The numerator represents the di erence of the pro ducts of Table 1: Fourfold table of the Phi-co ecient for two binary variables i and j with classes + and . j + + a b n i i c d u i n u N j j Peter V. Zysno: Mo di cation of the Phi-co ecient 42 Table 2: Phi-correlations of 6 Items with 3 levels of diculty 1 2 3 4 5 6 1 - 1.00 0.43 0.43 0.18 0.18 2 1.00 - 0.43 0.43 0.18 0.18 3 0.43 0.43 - 1.00 0.48 0.48 4 0.43 0.43 1.00 - 0.48 0.48 5 0.18 0.18 0.48 0.48 - 1.00 6 0.18 0.18 0.48 0.48 1.00 - the congruent ++; and the incongruent +; + dyads, the denominater representing the corresp onding exp ected value: ad bc = 1 p n u n u i i j j 3 Guilford 1956 p.511 has shown that this is a pro duct-moment-co ecient. There- fore, it may also b e describ ed as the relationship of covariance and total variance, i.e. 2 1=2 1=2 = cov =s s , with cov =ad bc=N , s =[n u =N ] and s =[n u =N ] . i j i i i j j j Using relative frequencies p = n =N; q = u =N; p = n =N; q = u =N; p = a=N i i i i j j j j ij 1=2 1=2 these terms can b e replaced by cov = p p p ;s =[p q ] and s =[p q ] . In ij i j i i i j j j short, this co ecient exhibits advantages such as evidence of interpretation, simple applicability, statistical testability and equivalence to Pearson's pro duct-moment- correlation. However, there is one prop erty which has rep eatedly given rise to dis- cussion. Measures of asso ciation cf. Kubinger, 1990 in general, owe their creation to empirists' desire to makecovariation more tangible and to more precisely acco- mo date some coincidental events or facts which cannot b e grasp ed adequately by graphic or linguistic means. For this reason it is not only an ob jecti cation function that is desired, but also a descriptive one. The measure should allow substantial interpretation. For this purp ose the degree of asso ciation is usually limited to the interval [1; +1], one denoting p erfect asso ciation and zero indicating the absence there of. The Phi-co ecient satis es these claims. However, the extreme values can only b e reached within symmetric distributions. This fact may lead to p otentially signi cant misjudgment. Exempli cation of this can b e found in the resp onses to six intelligence items answered consistently by 90 sub jects; the analysis of the data mayhave resulted in the frequencies n = 80, n = 80, n = 50, n = 50, n = 20, 1 2 3 4 5 n = 20. Because of the high conformity of the answers to the mo del, the matrix of 6 intercorrelations is exp ected to show predominantly ones, except for small declines in the case of di ering levels of diculty. In fact, most of the values range from .48 to .18 tab.2. Only in the case of symmetric frequencies, i.e. n = n if ad > bc i j or n = N n , if ad < bc are the entries equal to 1:00. A factor analysis will i j showasmany factors as there are levels of dicultyFerguson, 1941. Obviously the Phi-co ecient is considerably diminished when di erences be- tween the diculty levels increase. With real data this technical e ect meshes with actual resp onse inconsistencies of the sub jects. The observer is usually unable to decide, to what degree a value b elow one is a ected by the distribution asymme- try of the variables or by some incontigency within the observed events. Hence, generally the comprehension of this measure of asso ciation is complicated in that values b elow one may b e caused by resp onse inconsistencies, levels of dicultyor b oth. These problems are not restricted to graduations of theoretical test con- structs. A symmetric partition of two sets of events will also b e the exception in natural binary variables. For instance, for mammals there is a complete biological c MPR{online 1997, Vol.2, No.1 1998 Pabst Science Publishers Peter V. Zysno: Mo di cation of the Phi-co ecient 43 asso ciation b etween sex and capabilityofchildb earing. Nevertheless there will not be a Phi-correlation of one. A p ossible explanation for this might be that some of the males showchild b earing capability,too. In reality the reduced correlation originates from the fact that some females are unable to b ear children b ecause of sex-unsp eci c reasons illness, accident. Actually, the twovariables sex and child- b earing capability do not generate equivalent classes. In this case the reasons for the diminished correlation will b e readily elucidated as the result { contradictory to exp ectation { will call on inquiries. However, such previous knowledge will not often b e available as a protective instance. In contrary, for the most part, measures of asso ciation supp ort a primarily explorative pro cedure which allows for the acqui- sition of relational knowledge only after data analysis. As an example, a ctitious examination of the asso ciation b etween sex and subscription of a woman's journal can b e used, where by a correlation of = :50 mayhave b een found. It mightbe concluded that, although this magazine is more often read bywomen, it is also read by men. Without access to the original data there is little reason to distrust this interpretation. However a more precise insp ection of the data would reveal that the sample of 1000 individuals was made up on the one hand of 500 female and 500 male sub jects, and on the other hand of 800 non-subscrib ers and 200 exclusively female subscrib ers. The cells of the fourfold table might contain the following frequencies: a=200, b=300, c=0, d=500. Actually a very strong asso ciation b etween sex and magazine subscription is given: The subscrib ers are exclusively female, male sub- jects do not take the journal. This fact should b e expressed bya = 1, but the Phi-co ecient fails to do so. Is it p ossible to formulate an alternative, p erhaps complementary measure of asso ciation? Logic o ers an access. A p erfect Phi-correlation contains an equiv- alence prop osition: pa pb. If a p erson p has a feature a then he also has feature b and vice versa; if he solves task a then he will solve b as well, and if he do es not solve a, then he will be unable to solve b. Here it is presupp osed that the twovariables are dichotomized symmetrically, i.e. on b oth sets of events exist equivalent binary partitions. What, however, can b e done if the real conditions are not in accordance with these presumptions or with the test requirements? The sets of p ositive resp onses are now not in an equal, but in a partial set relation and the following implication holds: pb = pa und :pa =:pb.