REPORT RESUMES

ED 016 229  EA 001 061
ON DEPARTURES FROM INDEPENDENCE IN CROSS-CLASSIFICATIONS. BY: CASE, C. MARSTON. NATIONAL CENTER FOR EDUCATIONAL STATISTICS (DHEW). REPORT NUMBER TN-10. PUB DATE 18 NOV 66. EDRS PRICE MF-$0.25 HC-$1.52. 36P.

DESCRIPTORS: *CLASSIFICATION, *STATISTICAL DATA, *PROBABILITY, *STATISTICAL ANALYSIS, *STATISTICS, MODELS, CHARTS, DISTRICT OF COLUMBIA, Z-MEASURE

THIS NOTE IS CONCERNED WITH IDEAS AND PROBLEMS INVOLVED IN CROSS-CLASSIFICATION OF OBSERVATIONS ON A GIVEN POPULATION, ESPECIALLY TWO-DIMENSIONAL CROSS-CLASSIFICATIONS. MAIN OBJECTIVES OF THE NOTE INCLUDE (1) ESTABLISHMENT OF A CONCEPTUAL FRAMEWORK FOR CHARACTERIZATION AND COMPARISON OF CROSS-CLASSIFICATIONS, (2) DISCUSSION OF EXISTING METHODS FOR CHARACTERIZING CROSS-CLASSIFICATIONS, (3) PROPOSAL OF A NEW APPROACH TO AND A NEW METHOD FOR CHARACTERIZING AND MAKING INFERENCES FROM CROSS-CLASSIFICATIONS, AND (4) INDICATION OF HOW MARKOV PROCESSES CAN BE TREATED AS CROSS-CLASSIFICATIONS. THREE KINDS OF PROBABILITIES (UNCONDITIONAL, CONDITIONAL, AND JOINT) ARE DISCUSSED IN TERMS OF STATISTICAL INDEPENDENCE. MEASURES OF ASSOCIATION, ESPECIALLY THE Z-MEASURE OF ASSOCIATION AND ITS RELATION TO THE PHI COEFFICIENT IN VARIOUS SITUATIONS, ARE DISCUSSED AS MEANS OF INVESTIGATING NON-INDEPENDENCE. AN EXAMPLE OF THE Z-MEASURE APPLICATION IS PRESENTED. THE PROCESS OF DERIVING JOINT PROBABILITIES FROM Z-MEASURES AND THE INTERPRETATION OF MARKOV PROCESSES AS CROSS-CLASSIFICATIONS ARE ILLUSTRATED. (HW)

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY.

NATIONAL CENTER FOR EDUCATIONAL STATISTICS
Division of Operations Analysis

ON DEPARTURES FROM INDEPENDENCE IN CROSS-CLASSIFICATIONS

by

C. Marston Case

Technical Note Number 10

November 18, 1966

OFFICE OF EDUCATION / U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
NATIONAL CENTER FOR EDUCATIONAL STATISTICS
Alexander M. Mood, Assistant Commissioner

DIVISION OF OPERATIONS ANALYSIS
David S. Stoller, Director

TABLE OF CONTENTS

I. Introduction
II. Terminology
III. Statistical Independence
IV. Measures of Association
V. The Z-measure of Association
VI. An Application of the Z-measure
VII. The Reverse Inference: Joint Probabilities from Z-measures
VIII. The Interpretation of Markov Processes as Cross-Classifications
References

I. Introduction

This note* is concerned with ideas and problems involved in cross-classification of observations on a given population. Most of the note will be confined to the discussion of two-dimensional cross-classifications. An example is the two-dimensional cross-classification of a portion of a deck (population) of playing cards resulting from the classification of the cards according to suit and according to whether or not the card is a face card. Such a cross-classification would consist of a tabulation of the numbers in each of the (4 x 2 = 8) possible combinations of suit and face-or-no-face characteristics.

The main objectives of this note are:
1) to establish a conceptual framework for characterization and comparison of cross-classifications;
2) to discuss existing methods for characterization of cross-classifications;
3) to propose a new approach and a new method for characterizing and making inferences from cross-classifications;
4) to indicate how Markov processes can be treated as cross-classifications.

* The author wishes to thank Stephen Clark, George Mayeske, Richard O'Brien, and Frederic Weinfeld for suggestions made during the writing of this note.

II. Terminology

The word "event" is the conventional probabilistic term used to indicate what is observed; we will speak of observations on a given population with the assumption that one or more events are observed with each observation. The generic term "event type" will be used to specify the characteristic being classified in one dimension of a cross-classification. When we observe two or more events in a single observation we are observing the joint occurrence of the given events. The number of (joint) events so observed is the dimension of the cross-classification and is also the number of event types in the cross-classification. The suit of a card classifies it according to one event type while the face-or-no-face characteristic classifies it according to another event type. Within each event type there are two or more "event classes"; these classes are mutually exclusive and exhaustive, i.e., each observation belongs to exactly one event class within each event type. Within the suit event type the four event classes are club, diamond, heart, and spade. See Figure 1.

[Figure 1: The two event types of the card example. The suit event type has the event classes club, diamond, heart, and spade; the face-or-no-face event type has the event classes face and no face.]

What are called "event classes within event types" here are called "subcategories within attributes" by Guttman (1), p. 258, "classifications within criteria" by Mood (2), p. 274, "classes within polytomies" by Goodman and Kruskal (3), and "categories within variables" by Kendall and Stuart (6), p. 256. An extensive treatment of cross-classifications is given by Kendall and Stuart (6) in a 55-page chapter entitled "Categorized Data." The present paper deals exclusively with categorized data, which are observations which identify events (event classes within event types) in a qualitative, non-numerical, non-ordered manner. For instance, the suits and colors of playing cards are event types which are based on categorized data.

Three kinds of probabilities are distinguished in this note:
a) unconditional (or marginal) probabilities

b) conditional probabilities
c) joint probabilities.

We will be assuming that we are observing elements of a well-defined basic population and that at each observation two or more events occur. The unconditional probability of an event is the probability of the occurrence of that event without regard to the occurrence of any other event. It is the fraction of the observations in which that event has occurred if observations have been taken on the entire basic population.

The idea of joint probability is the same as unconditional probability except that we are concerned with two (or more) events occurring in a single observation instead of just one. The joint probability of two events is the fraction of the observations in which both events would occur were the whole basic population observed. The ampersand will be used to indicate joint occurrences in this note; "A&B" means the joint occurrence of events A and B and "Pr(A&B)" means the joint probability of their occurrence.

The idea of conditional probability also involves at least two events, say A and B. The conditional probability that A occurs given the condition that B also occurs means that we are evaluating a probability within a population restricted by some condition which was not part of the original definition of the basic population. The occurrence of event A given that event B also occurs is symbolized by "A|B" and "Pr(A|B)" means the conditional probability of event A given event B. The conditional probability of A given B is often defined as the ratio of their joint probability to the unconditional probability of B:

    Pr(A|B) = Pr(A&B) / Pr(B)

This means: of all the times that B occurs, Pr(A|B) is that fraction of the times wherein A also occurs.
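To make the three kinds of probability concrete, here is a minimal sketch in modern notation (ours, not the note's; the function and variable names are hypothetical) using the playing-card population of the Introduction:

    # Unconditional, joint, and conditional probabilities for a 52-card
    # deck classified by suit and by face-or-no-face.
    SUITS = ["club", "diamond", "heart", "spade"]

    # 13 cards per suit; ranks 11, 12, 13 (J, Q, K) are the face cards.
    deck = [(suit, rank > 10) for suit in SUITS for rank in range(1, 14)]

    def pr(event):
        """Unconditional probability: the fraction of the population
        in which the event occurs."""
        return sum(1 for card in deck if event(card)) / len(deck)

    is_heart = lambda card: card[0] == "heart"
    is_face = lambda card: card[1]

    pr_a = pr(is_heart)                               # Pr(A)   = 13/52
    pr_b = pr(is_face)                                # Pr(B)   = 12/52
    pr_ab = pr(lambda c: is_heart(c) and is_face(c))  # Pr(A&B) = 3/52
    pr_a_given_b = pr_ab / pr_b                       # Pr(A|B) = 3/12 = .25

    # Here Pr(A&B) = Pr(A) x Pr(B): suit and face-ness are independent,
    # anticipating the definition in Section III.
    assert abs(pr_ab - pr_a * pr_b) < 1e-12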

When we speak of the occurrence of event A, we imply the idea of the nonoccurrence of A. In other words, in the back of our minds we are considering an A-type event which includes two event classes: class A and class non-A. Not-A will be symbolized by "Ā" in this note. This A-type event is also known as a "binary variable,"* a variable which can take on two values, A and Ā. "Event type" is equivalent to the word "variable" here, and the language of this note could have been based on "qualitative variables" instead of "categorized event types."

* Other words which are used for this kind of variable are: counting, indicator, dichotomous, two-state, two-point, and zero-one variable or distribution.

III. Statistical Independence

The standard definition of the statistical independence of two events, A and B, is that the probability of their joint occurrence is the product of their probabilities:

    Pr(A&B) = Pr(A) x Pr(B)

Using this joint probability definition of independence and the definition of conditional probability we obtain the definition of independence in terms of conditional probability: events A and B are independent if and only if

    Pr(A|B) = Pr(A&B) / Pr(B) = Pr(A) x Pr(B) / Pr(B) = Pr(A).

The Venn diagram which is Figure 2 can be used to show the (probability domain) relationship of the two events, A and B. We see that the domains of the two events must overlap a certain amount in order to be independent. The amount required for their independence is the product of the probabilities of the events involved; e.g., if Pr(A) = .6 and Pr(B) = .8 then in order for A and B to be independent the intersection of their probability domains must be Pr(A) x Pr(B), i.e., .6 times .8 or .48.

[Figure 2: Venn diagram of two overlapping probability domains with Pr(A) = .6 and Pr(B) = .8; the intersection is Pr(A&B) = .48 for independence; the area external to both A and B is Ā&B̄.]

If the amount of overlap is less than the product required for independence, the occurrence of either event has a tendency to preclude the occurrence of the other. If the two probability domains do not overlap, then each event completely precludes the occurrence of the other and they are mutually exclusive. On the other hand, if the amount of overlap is greater than the amount required for independence then the occurrence of either event enhances the probability of the occurrence of the other. If there is complete overlap, with one probability domain completely covering that of the other, then, of course, the occurrence of the second is completely dependent on the occurrence of the first.

To put the Venn diagram of Figure 2 into tabular form, a 2 by 2 joint probability table with marginal (unconditional) probabilities is drawn (Figure 3). In the body of the table a, b, c, and d are joint probabilities and the sums of the joint probabilities in rows and columns are marginal probabilities.

          B      B̄
    A     a      b      .6 = Pr(A)    <- marginal probabilities
    Ā     c      d      .4 = Pr(Ā)
         .8     .2
       Pr(B)  Pr(B̄)

    Figure 3

The probability domains (a, b, c, d) of the Venn diagram correspond to those in the tabulated form, Figure 3. These are, respectively, the joint probabilities Pr(A&B), Pr(A&B̄), Pr(Ā&B), and Pr(Ā&B̄). Note that area d is the area external to both the A and B domains. Given two marginal probabilities in Figure 3 (one of the A type and one of the B type) and an entry in any of the four cells of the body of the table (a, b, c, or d), the remaining cells are easily deduced. For instance, if a is .5 then, since a+b = .6, b = .1; since a+c = .8, c = .3; and finally, since c+d = .4, d = .1. Note also that if a is .5, A and B are not independent; they enhance each other's occurrence.
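The deduction just described is mechanical; a small sketch (ours, with hypothetical names) that completes the Figure 3 table from its two marginals and the single cell a:

    def complete_2x2(a, pr_a, pr_b):
        """Fill in a 2x2 joint probability table given the cell
        a = Pr(A&B), the row marginal Pr(A), and the column marginal Pr(B)."""
        b = pr_a - a       # row A must sum to Pr(A)
        c = pr_b - a       # column B must sum to Pr(B)
        d = 1 - a - b - c  # the whole table sums to one
        return a, b, c, d

    # The example in the text: Pr(A) = .6, Pr(B) = .8, a = .5.
    cells = complete_2x2(.5, .6, .8)
    print([round(v, 3) for v in cells])   # -> [0.5, 0.1, 0.3, 0.1]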

The tabular representation has the advantages of explicitly identifying Ā and B̄ and showing A and Ā in a uniform manner. The tabular form also lends itself more easily to event types which are classified into several (not just two) classes. Suppose that the type A event has three classes A1, A2, and Ao, where Ao comprises events which belong to neither A1 nor A2. Similarly, suppose the type B events are categorized into classes B1, B2, and Bo. The Venn diagram of this, with some probabilities put in as an example, is Figure 4.

[Figure 4: Venn diagram of the two three-class event types, with Ao&Bo = .3 among the probabilities shown.]

The same joint probability information in tabular form appears in Figure 5.

               B1     B2     Bo    marginals
       A1     .03    .06    .11      .2
       A2     .02    .09    .19      .3
       Ao     .05    .15    .30      .5
    marginals .1     .3     .6

    Figure 5

Consider the independence-dependence characteristics of the nine joint events whose probabilities are in the body of the table. We make the following four observations:
1) Ao is independent of all the Bi
2) B2 is independent of all the Ai
3) A1 enhances the probability of B1 while diminishing that of Bo
4) A2 enhances the probability of Bo while diminishing that of B1.

This example shows that instances of dependence can be scattered about among various joint events when the event types each include several classes.

In this example with three classes in each of two event types, there are (3-1)(3-1) = 4 degrees of freedom in the determination of the probabilities. In other words, given the marginal probabilities, knowledge of certain sets of four of the nine joint probabilities in the body of the table determines all of them. These sets of probabilities are sets of four such that (after entering the set) at least one cell in each row and column remains empty and none of the remaining probabilities in the body of the table can be determined immediately from both its row and its column. Another way of stating the second requirement is that if an entry can be determined from the other entries in its row, it must not be possible to determine it from the other entries in its column. This can be generalized to m event types with n1, n2, ..., nm classes, respectively. Then, given the marginal probabilities, there are (n1-1)(n2-1)...(nm-1) degrees of freedom in determining the joint probabilities.

Testing for independence in a contingency table is a classical statistical problem discussed in many statistical books (see, for instance, Mood (2), pp. 273-81). A contingency table tabulates a set of observations according to two event types (criteria). Consider, for example, the classification of a group of people according to the two event types of vision and weight. The vision of each person in the group belongs to one of the three classes: near-sighted, normal sighted, and far-sighted; and the weight of each belongs to one of the three classes: underweight, normal weight, and overweight. The contingency table for these two event types would be a 3 x 3 table indicating the number of observations in each of the nine possible combinations of vision and weight.

Testing the independence of the two event types in a contingency table can be done in the following three steps.
1) Change the number in each of the cells to the fraction it is of the total number of observations, i.e., divide each cell number by the total number.
2) Enter the row and column sums of these fractions as marginals. (The ij-th cell now is an estimate of the probability that the i-th row event and the j-th column event occur together (jointly). If the two events are independent, another estimate of their joint probability is the product of the i-th row marginal and the j-th column marginal.)

3) Let O_ij be the observed fraction of the total in the ij-th cell and E_ij be the expected fraction assuming independence, i.e., the product of the i-th row and j-th column marginals; then, with N the total number of observations, compute the test statistic for the (n x m) contingency table,

    N Σ_i Σ_j (O_ij - E_ij)² / E_ij,    i = 1..n, j = 1..m.

Under the assumption of independence this statistic is approximately chi-square distributed with (m-1)(n-1) degrees of freedom. The approximation to chi-square improves with larger numbers of observations in the cells.

IV. Measures of Association

The test for independence of two types of events can be a basis for deciding whether or not the two event types are independent; if the test leads one to reject the independence hypothesis, one may want a more complete explanation of the non-independence. One way to investigate non-independence is by employing a "measure of association" which, in the statistical literature, has meant a measure of the direction and amount of departure from independence of a given cross-classification. Goodman and Kruskal have written three extensive (32, 40, and 54 pages) papers on measures of association (3), (4), (5). They favor a measure originally proposed by Guttman in (1), but they feel that the uses of measures of association are varied enough so that there should be several from which to choose. None of the studies seen by the author, however, considers analysis of departures from independence in detail within a cross-classification (i.e., in the body of a joint probability table). This is to say that the measures of association thus far developed are meant to show (dependence) relationships between event types and not those between the event classes of one event type and those of another event type.

If we were working with numerical variables instead of categorized event types and we assumed that the variables were linearly related, we would probably consider the correlation coefficient as the first candidate for measuring their association. A set of three characteristics of the correlation coefficient traditionally has been sought in the appraisal of measures of association for categorized data:
1) the measure is zero when the event types are independent;
2) the measure is minus one when they have the maximum disassociation;
3) the measure is plus one when they have the maximum association.

We will consider a measure of association which, in its immediate application, will be applied only to a 2 x 2 cross-classification, although in its ultimate application it will appear in the assessment of general n x m cross-classifications. In addition to the above set of characteristics we shall require our measure to have another set of characteristics related to the correlation coefficient. This set has to do with the symmetry of the measure between events (classes) and their complements. Specifically, if Z is the value of a measure of the association between events A and B, then it must also be the value between Ā and B̄. Also, the value of the measure between A and B̄ and between Ā and B must be -Z.

V. The Z-measure of Association

A method will now be shown that develops a measure of association conforming to the requirements specified above and applying to individual cells, allowing them to be compared with one another. Since each cell in the given (n x m) joint probability table is to be treated individually, we treat each cell as the upper left corner entry in a 2 by 2 table as in Figure 6. This is done by letting

    a = the cell being studied,
    b = sum of the remainder of the row entries,
    c = sum of the remainder of the column entries, and
    d = sum of the remainder of the table (= 1-a-b-c);

x and y are marginal probabilities in both the n x m and the 2 x 2 tables; x = a+c and y = a+b.

          B      B̄
    A     a      b       y
    Ā     c      d      1-y
          x     1-x

    Figure 6

All the information about the 2 x 2 table is contained in the three quantities a, x, and y.

A measure of association between two binary variables is their correlation coefficient. The definition of the correlation coefficient for two variables A and B is:

    rho(A,B) = [E(AB) - E(A)E(B)] / sqrt[V(A)V(B)]

where E(A) is the expected value of A and V(A) is the variance of A, which is E(A²) - [E(A)]². We are concerned here with binary variables, specifically the joint distribution of A and B, a bivariate binary probability distribution. In this case we can define the distribution of the A-or-not-A type of event on the integers zero and one:

Pr(A-or-not-A event type = 0) = Pr(Ā) = 1 - Pr(A) and Pr(A-or-not-A event type = 1) = Pr(A). A similar definition of the binary distribution of B applies. Then the following equations hold:

    E(AB) = Pr(A&B) = a
    E(A) = E(A²) = Pr(A) = y
    E(B) = E(B²) = Pr(B) = x
    V(A) = y(1-y)
    V(B) = x(1-x).

This leads to a formula for the correlation coefficient associated with the 2 x 2 table in Figure 6,

    Phi(a,x,y) = (a - xy) / sqrt[x(1-x)y(1-y)]

"Phi coefficient" is the usual name for this statistic, especially in psychological statistics.* It is zero when the events are independent (a = xy). It takes on the value plus one only when a is at its maximum and when x = y. The maximum of a is the lesser of x and y. The phi coefficient takes on the value minus one only when a is at its minimum and when x + y = 1. The minimum of a is zero when x + y ≤ 1 and x + y - 1 when x + y ≥ 1.

We want a measure of association which always takes on the value minus one when the least possible association prevails and always takes on the value plus one when the greatest possible association prevails. Such a measure is obtained by dividing the phi coefficient by its maximum possible value when positive association prevails and by its minimum possible value when negative association prevails.

* For a discussion of the phi coefficient see Guilford (7), pp. 333-36.

We now define such a Z-measure of association for four possible situations which cover all the possibilities of a 2 x 2 table.

Situation I. a > xy, x ≥ y: the binary variables are positively associated. Z is the ratio of the phi coefficient to its maximum, which is attained at a = y, the lesser marginal:

    Z(a,x,y) = [(a - xy)/sqrt(x(1-x)y(1-y))] / [(y - xy)/sqrt(x(1-x)y(1-y))]
             = (a - xy) / y(1-x)

(When y > x the roles of x and y interchange, so that in general the denominator is min(x,y) - xy.)

Situation II. a < xy, x + y ≤ 1: the binary variables are negatively associated and the minimum of a is zero. Z is minus the ratio of the phi coefficient to its minimum:

    Z(a,x,y) = (-1)[(xy - a)/sqrt(x(1-x)y(1-y))] / [xy/sqrt(x(1-x)y(1-y))]
             = (a - xy) / xy

Situation III. a < xy, x + y ≥ 1: the binary variables are negatively associated and the minimum of a is x + y - 1:

    Z(a,x,y) = (-1)[(xy - a)/sqrt(x(1-x)y(1-y))] / [(1-x-y+xy)/sqrt(x(1-x)y(1-y))]
             = (a - xy) / (1-x)(1-y)

Situation IV. a = xy: the binary variables are independent; Z = 0.
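The four situations translate directly into a short routine; this is our sketch of the definition just given (min(x, y) covers both orderings of the marginals in Situation I, and a small tolerance guards the floating-point comparison with xy):

    def z_measure(a, x, y, eps=1e-12):
        """Z-measure of association for a 2x2 table summarized by the joint
        probability a = Pr(A&B), column marginal x = Pr(B), and row
        marginal y = Pr(A)."""
        dev = a - x * y
        if abs(dev) < eps:               # Situation IV: independence
            return 0.0
        if dev > 0:                      # Situation I: positive association
            return dev / (min(x, y) - x * y)
        if x + y <= 1:                   # Situation II: minimum of a is 0
            return dev / (x * y)
        return dev / ((1 - x) * (1 - y))  # Situation III: minimum is x+y-1

    # The Figure 3 example (a = .5, x = .8, y = .6) gives Z = 1/6.
    print(round(z_measure(.5, .8, .6), 3))   # -> 0.167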

In Situations II and III, the sign is made negative to indicate disassociation. Horst (8), pp. 238-39, points out that this Z-measure is the ratio of two covariances, an observed covariance over a maximum (Situation I) or a minimum (Situations II and III) covariance. He uses it as a measure of the homogeneity of test items. This is an interesting case in which the event classes are not mutually exclusive.

The 2 x 2 table which is Figure 6 can be written solely in terms of a, x, and y. Such a table equivalence is shown in Figure 7.

          B      B̄                       B         B̄
    A     a      b       y         A     a        y-a        y
    Ā     c      d      1-y   =    Ā    x-a    1+a-x-y      1-y
          x     1-x                      x        1-x

    Figure 7

Assuming a > xy and x ≥ y, we now find the Z-measure for each of the four cells in the body of the table. An arrow (→) is used to indicate implication.

For the upper left cell itself,

    Z(a,x,y) = (a - xy) / y(1-x).

For the lower right cell, d,

    a > xy → 1+a-x-y > 1+xy-x-y = (1-x)(1-y).

Thus, since d = 1+a-x-y exceeds the product of its marginals (1-x)(1-y), the computation for d is done using the Situation I formula (the lesser of its marginals is 1-x):

    Z(d, 1-x, 1-y) = Z(1+a-x-y, 1-x, 1-y)
                   = [(1+a-x-y) - (1-x)(1-y)] / (1-x)[1-(1-y)]
                   = (a - xy) / y(1-x).

For the upper right cell, b,

    a > xy → b = y-a < y - xy = y(1-x),

and since x ≥ y implies (1-x) + y ≤ 1, Situation II applies:

    Z(b, 1-x, y) = Z(y-a, 1-x, y) = [(y-a) - y(1-x)] / y(1-x) = -(a - xy) / y(1-x).

For the lower left cell, c,

    a > xy → c = x-a < x - xy = x(1-y),

and since x ≥ y implies x + (1-y) ≥ 1, Situation III applies:

    Z(c, x, 1-y) = Z(x-a, x, 1-y) = [(x-a) - x(1-y)] / (1-x)[1-(1-y)] = -(a - xy) / y(1-x).

Thus

    Z(a, x, y) = Z(d, 1-x, 1-y) = -Z(b, 1-x, y) = -Z(c, x, 1-y),

or in terms of the binary variables A and B,

    Z(A,B) = Z(Ā,B̄) = -Z(A,B̄) = -Z(Ā,B).

Thus we have a measure of association for a pair of binary variables which has the desired characteristics:

1. zero when the variables are independent (Situation IV)
2. minus one when the variables are as completely disassociated as possible (in Situations II and III)
3. plus one when the variables are as completely associated as possible (in Situation I)
4. Z(A,B) = Z(Ā,B̄) = -Z(Ā,B) = -Z(A,B̄).
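Characteristic 4 can be checked numerically from the Figure 7 parametrization; a self-contained sketch (ours):

    def z(a, x, y, eps=1e-12):
        # Z-measure, Situations I-IV of this section.
        dev = a - x * y
        if abs(dev) < eps:
            return 0.0
        if dev > 0:
            return dev / (min(x, y) - x * y)
        return dev / (x * y if x + y <= 1 else (1 - x) * (1 - y))

    a, x, y = .5, .8, .6                    # any permissible values
    b, c, d = y - a, x - a, 1 + a - x - y   # the Figure 7 cells

    assert abs(z(a, x, y) - z(d, 1 - x, 1 - y)) < 1e-9
    assert abs(z(a, x, y) + z(b, 1 - x, y)) < 1e-9
    assert abs(z(a, x, y) + z(c, x, 1 - y)) < 1e-9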

Z(a,x,y) is a function which maps points in a three-dimensional space to points in a one-dimensional space. Figure 8 shows one aspect of this mapping: Z as a function of a with x and y fixed.

[Figure 8: Z and phi plotted as functions of the joint probability a, with the marginals x and y fixed. The phi(a) domain extends down to a = 0 in Situation II but only to a = x+y-1 in Situation III; Z runs from -1 to +1 as a runs from its minimum to its maximum, with Z = 0 at a = xy.]

The bivariate binary correlation coefficient, phi(a), is plotted as a dotted line. It has the same domain (set of arguments) as Z but is linear throughout its domain.

One should keep in mind that, although the Z-measure is related to the correlation coefficient, it differs from the correlation coefficient in fundamental ways. The Z-measure is not linear at Z = 0 except when one or both of the marginals is .5. It is a two-part linear function of the correlation coefficient such that the coefficients of the function are non-linear functions of x and y. The Z-measure should be judged, however, in its own right, and not as a replacement for the correlation coefficient, which fails to behave for binary variables as it does for linearly related continuous numerical variables. It is hard to find a common situation with less favorable conditions for establishing linear relations than that of the bivariate binary distribution with point-masses of probabilities at the four corners of a square. The perfect correlation condition there is when all the mass is on diagonally opposite corners. As we saw earlier, this can occur only when x = y (for perfect correlation) or when x + y = 1 (for perfect negative correlation). In all other cases the correlation coefficient cannot attain 1. The advantage of Z over the correlation coefficient is that it has the range of values minus one to plus one for all x and y combinations. This allows one to compare Z-measures based on different x and y combinations and say that the amount of association between two event classes is greater or less than that between another pair.

Additional insight into the nature of the Z-measure is gained when considering it from the set-theoretic point of view using the Venn diagram approach of Section III. The Z-measure essentially shows the relationship between the amount of observed overlap and the maximum or minimum potential overlap of two probability domains. It may be thought of as the attraction or repulsion force existing between the probability domains of two events. In this sense it is considered to be measuring a symmetric force such as gravity or magnetic forces.

There have been several discussions of the Z-measure in psychological statistics. The Z-measure is usually called "phi/phi max," with little mention of "phi/phi min." Carroll (9), pp. 363-64, discusses this measure in terms of 2 x 2 tables derived from the division of continuous bivariate distributions into four parts; observations are classified as being above or below a point on the scale of one marginal distribution and above or below a point on the scale of the other marginal distribution. Carroll particularly refers to the "tetrachoric correlation coefficient" (which assumes a bivariate normal underlying distribution) as the ideal one and finds the Z-measure does not approximate the tetrachoric correlation coefficient very well. He rather emphatically dismisses the Z-measure on this basis. Guilford (7), pp. 337-38, also warns against the use of the Z-measure as an "indicator of intrinsic correlation." In general, the psychometricians discount the use of the Z-measure in lieu of a (Pearson) correlation coefficient. The mathematical treatment of the Z-measure, however, needs more development than is given in the psychometric literature in order to understand enough about the measure to make proper use of it. Cureton (10), p. 89, for instance, states that Z can be +1 only when x = y = .5 and Z can be -1 only when x = 1-y = .5; whereas in fact Z = +1 whenever a attains its maximum, min(x,y), and Z = -1 whenever a attains its minimum, zero or x+y-1.

Comparing the joint probability (a) with the Z-measure for a given joint event, and generalizing over the four situations, we see that a = xy + cZ, where

    c = min(x,y) - xy      if Z ≥ 0
    c = xy                 if Z < 0 and x + y ≤ 1
    c = (1-x)(1-y)         if Z < 0 and x + y ≥ 1.

Z is thus seen to show the relationship between the joint probability and the associated marginal probabilities, x and y.
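In code, the relation a = xy + cZ inverts the Z-measure, recovering a joint probability from its marginals and its Z; a sketch (ours) anticipating the reverse inference of Section VII:

    def c_coefficient(x, y, z):
        """The coefficient c above: the maximum possible deviation of a
        from xy in the direction indicated by the sign of Z."""
        if z >= 0:
            return min(x, y) - x * y
        if x + y <= 1:
            return x * y
        return (1 - x) * (1 - y)

    def joint_from_z(x, y, z):
        """Recover the joint probability: a = xy + cZ."""
        return x * y + c_coefficient(x, y, z) * z

    # Recovers a = .5 from the Figure 3 marginals and Z = 1/6.
    print(round(joint_from_z(.8, .6, 1 / 6), 3))   # -> 0.5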

The nature of c is further elaborated in Section VII.

There are also two conditional probabilities associated with a joint event, and these show, respectively, the relationship between one marginal and the joint probability and the other marginal and the joint probability. Conditional probabilities are used to improve one's ability to predict the occurrence or non-occurrence of an event by obtaining information about the occurrence or non-occurrence of another event. Such a prediction operation sometimes tends to lead to the imputation of causality between the two events. The Z-measure is symmetric in the marginals and thus avoids such an imputation. A conditional probability without the appropriate unconditional probability does not tell one whether the event being conditioned is more or less likely when it occurs jointly with the conditioning event. Thus the conditional probability by itself is not a measure of association whereas the Z-measure is.

VI. An Application of the Z-measure

A contingency table which has often appeared as an example in textbooks and articles involves the cross-classification of 6800 males according to their hair and eye color.* Four classes of hair color were observed: fair, brown, black, and red. Three classes of eye color were observed: blue, hazel or green, and brown. Figure 9 shows the original observations (contingency table); Figure 10 shows the corresponding joint probabilities table; Figure 11 shows the corresponding values of the Z-measure of association; Figure 12 shows conditional probabilities of hair color given eye color and eye color given hair color.

This cross-classification fails the classical chi-square test of independence at the .000001 level. Some of the details brought out by the Z-measure are that red hair is essentially independent of eye color for this population while a general correlation of eye and hair pigment holds. The disassociations of fair hair and brown eyes (-.678) and of black hair and blue eyes (-.626) are, however, much more pronounced than the associations of fair hair and blue eyes (+.365) and of black hair and brown eyes (+.190). The black hair and brown eye association is, surprisingly, weaker than both that of black hair and hazel eyes (.277) and that of brown hair and brown eyes (.201).

* E.g., see Goodman and Kruskal (3).

                        HAIR COLOR
               fair   brown   black   red
         blue  1768     807     189    47   2811
    EYE hazel   946    1387     746    53   3132
  COLOR brown   115     438     288    16    857
               2829    2632    1223   116   6800

    Figure 9

JOINT PROBABILITY TABLE
               fair   brown   black    red
        blue   .26    .119    .028   .0069   .4134
       hazel   .139   .204    .110   .0078   .4606
       brown   .017   .064    .042   .0024   .1260
               .416   .387    .1799  .0171

    Figure 10

Z-MEASURE OF ASSOCIATION
               fair   brown   black    red
        blue   .365   -.258   -.626   -.02
       hazel  -.274    .084    .277   -.008
       brown  -.678    .201    .190    .014

    Figure 11

CONDITIONAL PROBABILITY TABLES

EYE GIVEN HAIR                             HAIR GIVEN EYE
         fair  brown  black   red               fair  brown  black    red
  blue   .625   .307   .154  .405       blue   .629   .287  .0672  .0167   1.0
 hazel   .334   .527   .610  .457      hazel   .302   .443  .238   .0169   1.0
 brown   .041   .166   .235  .138      brown   .134   .511  .336   .0187   1.0
         1.0    1.0    1.0   1.0

    Figure 12
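Figure 11 can be reproduced from the raw counts of Figure 9 by applying the Z-measure cell by cell; a sketch (ours):

    counts = [[1768,  807, 189, 47],   # blue eyes
              [ 946, 1387, 746, 53],   # hazel
              [ 115,  438, 288, 16]]   # brown

    n = sum(map(sum, counts))                   # 6800 observations
    rows = [sum(r) / n for r in counts]         # eye-color marginals y_i
    cols = [sum(c) / n for c in zip(*counts)]   # hair-color marginals x_j

    def z(a, x, y, eps=1e-12):
        # Z-measure, Situations I-IV of Section V.
        dev = a - x * y
        if abs(dev) < eps:
            return 0.0
        if dev > 0:
            return dev / (min(x, y) - x * y)
        return dev / (x * y if x + y <= 1 else (1 - x) * (1 - y))

    for i, row in enumerate(counts):
        print([round(z(v / n, cols[j], rows[i]), 3) for j, v in enumerate(row)])
    # First row: [0.365, -0.258, -0.626, -0.02], matching Figure 11.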

VII. The Reverse Inference: Joint Probabilities from Z-measures

Heretofore we have been considering the derivation of the Z-measure from a joint probability table. We now consider the derivation of a joint probability table given a set of Z-measures and marginal probabilities. We may be given the marginal distributions of a cross-classification, for instance, and (perhaps subjective) estimates of some of the associations between various states. We can specify no more Z-measures than the degrees of freedom involved. The joint probability table and the Z-measure table of a three by three cross-classification are shown in Figure 13.

    a11  a12  a13 | y1        z11  z12  z13
    a21  a22  a23 | y2        z21  z22  z23
    a31  a32  a33 | y3        z31  z32  z33
    x1   x2   x3  | 1

    Figure 13

We derive definitions of the a_ij from the specifications for Z in Section V:

    a_ij = x_j y_i + c_ij Z_ij

where

    c_ij = min(x_j, y_i) - x_j y_i     if Z_ij ≥ 0
         = x_j y_i                     if Z_ij < 0 and x_j + y_i ≤ 1
         = (1-x_j)(1-y_i)              if Z_ij < 0 and x_j + y_i ≥ 1.

Theorem: c_ij ≤ .25. Each c_ij is the product of two nonnegative factors, say f1 and f2, whose sum is at most one, so at least one of the factors, say f1, is no greater than .5. Suppose (at worst) f1 = .5-u; then f2 ≤ .5+u and f1 f2 ≤ (.5-u)(.5+u) = .25-u² ≤ .25.

c_ij is the maximum possible difference between a_ij and x_j y_i, given the information about whether a_ij is larger or smaller than x_j y_i; Z_ij indicates how much of this maximum deviation is attained by a_ij. One possibility is that a_ij = x_j y_i; then for each row and each column, Σ a_ij = Σ x_j y_i. However, the row and column sums are the same (i.e., the marginal probabilities are fixed) for all permissible sets of a_ij. Thus for any permissible configuration of the a_ij, for each row and column, Σ a_ij = Σ x_j y_i and a_ij = x_j y_i + c_ij Z_ij. Therefore, for each row and column, Σ c_ij Z_ij = 0. For convenience, let d_ij = c_ij Z_ij. We now have the following set of six equations defining the relations among the d_ij for a 3 x 3 joint probability table:

    1) d11 + d12 + d13 = 0
    2) d21 + d22 + d23 = 0     (row sums)
    3) d31 + d32 + d33 = 0

    4) d11 + d21 + d31 = 0
    5) d12 + d22 + d32 = 0     (column sums)
    6) d13 + d23 + d33 = 0

Earlier (Section III) the number of degrees of freedom associated with a cross-classification table was discussed. If we specify certain sets of four d_ij, the remainder of a 3 x 3 table of d_ij's is determined. There are 126 ways of selecting four cells from among nine; only 81 of these ways conform to the degrees-of-freedom requirements stated earlier. Each of those 81 provides us with a different configuration of four cell choices in a 3 x 3 table. If we specify one of these configurations, the d_ij for the remaining five cells can be stated in terms of our four initial d_ij. For instance, suppose we are given d11, d22, d33, and d31 (see Figure 14). Using the row and column sum equations we obtain the remaining d_ij in terms of these four:

    d32 = -d31 - d33
    d21 = -d11 - d31
    d12 = -d22 + d31 + d33
    d13 = -d11 + d22 - d31 - d33
    d23 = d11 - d22 + d31

    Figure 14
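A sketch (ours) of the reverse inference for the Figure 14 configuration: the five remaining d's follow from the row and column sum equations, and then a_ij = x_j y_i + d_ij:

    def joint_from_four_d(x, y, d11, d22, d33, d31):
        """Reconstruct a 3x3 joint probability table from column marginals
        x = (x1, x2, x3), row marginals y = (y1, y2, y3), and the four
        d_ij of Figure 14."""
        d = {(1, 1): d11, (2, 2): d22, (3, 3): d33, (3, 1): d31}
        d[3, 2] = -d31 - d33
        d[2, 1] = -d11 - d31
        d[1, 2] = -d22 + d31 + d33
        d[1, 3] = -d11 + d22 - d31 - d33
        d[2, 3] = d11 - d22 + d31
        return [[x[j - 1] * y[i - 1] + d[i, j] for j in (1, 2, 3)]
                for i in (1, 2, 3)]

    # With the Figure 5 marginals and the d's read off that table
    # (d11 = .03 - .02 = .01, the other three zero), this reproduces
    # the Figure 5 joint probabilities exactly.
    table = joint_from_four_d((.1, .3, .6), (.2, .3, .5), .01, 0, 0, 0)
    print([[round(v, 2) for v in row] for row in table])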

VIII. The Interpretation of Markov Processes as Cross-Classifications

A Markov chain (or process) is generally defined by its transition probability matrix and its initial distribution. Figure 15 is an example of a transition matrix of a 4-state Markov chain; it shows the probability of the occurrence of event Ej at time k+1 given that event Ei occurred at time k (i and j are the row and column indices respectively).

                     Time k+1
                  E1   E2   E3   E4
            E1    .6   .4    0    0
     Time   E2     0   .6   .4    0
      k     E3     0    0   .6   .4
            E4    .2    0    0   .8

    Figure 15

This transition matrix is a table similar to that associated with a cross-classification. It is, however, a table of conditional probabilities, whereas for the complete specification of a cross-classification a joint probability table is needed. A conditional probability table can be deduced from an underlying joint probability table, but a joint probability table cannot be deduced from either or both of its corresponding conditional probability tables. A joint probability table can, however, be deduced from one of its conditional tables and one of its marginal probability distributions. In a Markov chain, an initial probability distribution vector (hereafter PDV) is given; this PDV is one of the two marginal probability sets in the underlying joint probability table. The initial PDV thus completes the specifications needed for a cross-classification interpretation of a Markov chain. We will often subscript the PDV's with their associated time index, e.g., the initial PDV will be PDV_k.

Continuing with the example in Figure 15, suppose we are given a PDV of (.1, .2, .3, .4) for events E1 to E4. Then the joint probability table becomes that shown in Figure 16.

                        Time k+1
                  E1    E2    E3    E4
            E1   .06   .04    0     0    .1
     Time   E2    0    .12   .08    0    .2
      k     E3    0     0    .18   .12   .3
            E4   .08    0     0    .32   .4
                 .14   .16   .26   .44

    Figure 16

The joint probabilities in the body of the table are obtained from the conditional probabilities in Figure 15 by multiplying the probabilities in each row by the corresponding probabilities of PDV_k. The event types are events observed at the two times, k and k+1, while the event classes are E1 to E4 for each type.

The present discussion tends to be in terms of the transformation of PDV's, in contrast to the usual emphasis on probabilities of going from one state to another in a certain number of transitions (e.g., from E1 to E4 during time k to k+m). Thus the usual approach is based on powers of the transition matrix whereas this one is based on sequences of PDV's.

A large portion of the problems couched in terms of Markov chain theory are problems which assume that the process is ergodic.

A Markov process is ergodic* if it converges to a steady state as k gets larger. A Markov process is in a steady state at time k only if PDV_k = PDV_{k+1}. For practical purposes we could say that a process is in the neighborhood of a steady state at time k+1 if PDV_k differs from PDV_{k+1} by less than a specified small amount. This neighborhood might be defined in terms of the accuracy of the estimates of the probabilities in the PDV's.

The bottom marginal (or PDV) in Figure 16 could be obtained by multiplying together the right marginal probabilities and the transition matrix of Figure 15 thus:

                        .6  .4   0   0
    (.1, .2, .3, .4)     0  .6  .4   0    =  (.14, .16, .26, .44)
                         0   0  .6  .4
                        .2   0   0  .8

Such a multiplication indicates the transformation of the PDV between time k and k+1. We can obtain PDV_{k+2} by multiplying the result by the same transition matrix. In repeated multiplication of the previous result, the difference between successive PDV's is seen to diminish; the process is approaching the steady state. The process is in a steady state if the input PDV is the same as the output PDV, i.e.

                        .6  .4   0   0
    (x1, x2, x3, x4)     0  .6  .4   0    =  (x1, x2, x3, x4)
                         0   0  .6  .4
                        .2   0   0  .8

* This is Feller's (11) definition; a different definition is given by Kemeny and Snell (12), p. 99; they include periodic or cyclic Markov chains in ergodic chains, Feller does not.
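The PDV-sequence view is easy to simulate; a sketch (ours) using the transition matrix of Figure 15:

    P = [[.6, .4, 0, 0],
         [0, .6, .4, 0],
         [0, 0, .6, .4],
         [.2, 0, 0, .8]]

    def step(pdv):
        """One transition: PDV_{k+1} = PDV_k * P."""
        return [sum(pdv[i] * P[i][j] for i in range(4)) for j in range(4)]

    def joint_table(pdv):
        """Underlying joint probability table (as in Figure 16): row i of
        P scaled by the i-th entry of PDV_k."""
        return [[pdv[i] * P[i][j] for j in range(4)] for i in range(4)]

    pdv = [.1, .2, .3, .4]
    print([round(p, 2) for p in step(pdv)])   # -> [0.14, 0.16, 0.26, 0.44]
    for _ in range(100):                      # iterate toward the steady state
        pdv = step(pdv)
    print([round(p, 3) for p in pdv])         # -> [0.2, 0.2, 0.2, 0.4]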

Figure 17 is the joint probability table of the steady state of the process, the transition matrix of which is shown in Figure 15.

             E1    E2    E3    E4
      E1    .12   .08    0     0    .2
      E2     0    .12   .08    0    .2
      E3     0     0    .12   .08   .2
      E4    .08    0     0    .32   .4
            .2    .2    .2    .4

    Figure 17

Figure 18 is the corresponding Z-measure table.

     .5    .25    -1     -1
    -1     .5     .25    -1
    -1    -1      .5      0
     0    -1     -1      .666

    Figure 18
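Figure 18 follows from Figure 17 by applying the Z-measure of Section V cell by cell; a self-contained check (ours):

    steady = [[.12, .08, 0, 0],
              [0, .12, .08, 0],
              [0, 0, .12, .08],
              [.08, 0, 0, .32]]
    rows = [sum(r) for r in steady]         # (.2, .2, .2, .4)
    cols = [sum(c) for c in zip(*steady)]   # (.2, .2, .2, .4)

    def z(a, x, y, eps=1e-12):
        # Z-measure, Situations I-IV of Section V.
        dev = a - x * y
        if abs(dev) < eps:
            return 0.0
        if dev > 0:
            return dev / (min(x, y) - x * y)
        return dev / (x * y if x + y <= 1 else (1 - x) * (1 - y))

    for i, row in enumerate(steady):
        print([round(z(a, cols[j], rows[i]), 3) for j, a in enumerate(row)])
    # Reproduces Figure 18, including Z = 0 for the two independent cells.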

As an ergodic Markov process progresses toward the steady state, the transition matrix remains invariant. The following items generally change as the time index (k) changes:
1. Both sets of marginal probabilities (the PDV's)
2. The joint probability table
3. The Z-measure table
4. The conditional probability table which reverses the effect of the transition matrix, i.e., that which would take PDV_{k+1} to PDV_k.

The table in (4) is obtained by dividing the joint probabilities associated with the k to k+1 transition by the marginal probabilities in the margin below. The transformation from PDV_{k+1} to PDV_k could also be accomplished with the inverse of the transition matrix if the transition matrix is non-singular; such an inverse is invariant with respect to k.

A special kind of Markov chain which is an exception to the above characterization is that in which the transition to each state is independent of the state from which the transition began. Then the following conditions prevail:
1. All rows of the transition matrix are the same
2. The joint probabilities are products of their marginals
3. All Z-measures are zero
4. Given any initial PDV, the convergence to the steady state is immediate and complete at time k+1, and all further transitions do not change the PDV's.

This special kind of Markov process suggests a potential basis for the comparison or characterization of Markov processes: the rapidity of convergence to a neighborhood of the steady state. Apparently the farther the Z-measures of the joint probability table of the steady state are from zero, the slower is the convergence. Notice, however, that zeroes and ones in the transition matrix indicate cells whose Z-measures are invariant during convergence, so that such cells should be treated differently (perhaps excluded) from the cells which change during convergence. The theory needs further development in this area.

The steady state joint probability table and Z-measures are fixed for a given transition matrix and are independent of the initial PDV. Thus they are obvious choices for characterizing a Markov process. An infinite number of transition matrices converge to each steady state PDV; they all have different joint probability tables and different Z-measures. If one is altering a Markov process in order to have it converge to a target steady state, one method would be the comparison of the Z-measure matrix of the process at present with that of the target steady state. This will show which transitions must become more likely and which ones must become less likely in order to attain the target. It may well be, however, that the transition matrix which maintains the steady state PDV does not matter, i.e., only the PDV itself matters. Then many alternative Z-measure configurations could be compared to see which would be the preferred target, based perhaps on time, cost, and other criteria.

The Markov process defined in Figures 15 to 18 has a 4 x 4 joint probability table, so it has 3 x 3 = 9 degrees of freedom. There are eight zeroes in the joint table which remain there throughout all transitions; these lead to corresponding fixed Z-measures of minus one. By choosing any one of the non-negative Z's one can determine the entire process. This is a special (flow process) case and illustrates the method of reverse inference discussed in the previous section.


REFERENCES

1. Guttman, Louis. "An Outline of the Statistical Theory of Prediction," Supplementary Study B-1 (pp. 253-311) in Paul Horst and others, The Prediction of Personal Adjustment. New York: Social Science Research Council, 1941. Bulletin 48.

2. Mood, A.M. Introduction to the Theory of Statistics. New York: McGraw-Hill, 1950.

3. Goodman, L.A. and W.H. Kruskal. "Measures of Association for Cross Classifications," Journal of the American Statistical Association: 49 (1954), pp. 732-64.

4. Ibid. Vol. 54 (1959), pp. 123-63.

5. Ibid. Vol. 58 (1963), pp. 310-64.

6. Kendall, M.G. and A. Stuart. The Advanced Theory of Statistics. Vol. II. New York: Hafner, 1961.

7. Guilford, J.P. Fundamental Statistics in Psychology and Education. 4th ed. New York: McGraw-Hill, 1965.

8. Horst, Paul. Psychological Measurement and Prediction. Belmont, Calif.: Wadsworth Publishing Co., 1966.

9. Carroll, John B. "The Nature of the Data, or How to Choose a Correlation Coefficient," Psychometrika: 26 (Dec. 1961), pp. 347-72.

10. Cureton, Edward E. "Note on phi/phi max," Psychometrika: 24 (March 1959), pp. 89-91.

11. Feller, William. An Introduction to Probability Theory and Its Applications. New York: John Wiley & Sons.

12. Kemeny, J.G. and J.L. Snell. Finite Markov Chains. Princeton: Van Nostrand, 1960.