REPORT RESUMES

ED 016 229  EA 001 061
ON DEPARTURES FROM INDEPENDENCE IN CROSS-CLASSIFICATIONS. BY: CASE, C. MARSTON. NATIONAL CENTER FOR EDUCATIONAL STATISTICS (DHEW). REPORT NUMBER TN-10. PUB DATE 18 NOV 66. EDRS PRICE MF-$0.25 HC-$1.52. 36P.

DESCRIPTORS: *CLASSIFICATION, *STATISTICAL DATA, *PROBABILITY, *STATISTICAL ANALYSIS, *STATISTICS, MODELS, CHARTS, DISTRICT OF COLUMBIA, Z-MEASURE

THIS NOTE IS CONCERNED WITH IDEAS AND PROBLEMS INVOLVED IN CROSS-CLASSIFICATION OF OBSERVATIONS ON A GIVEN POPULATION, ESPECIALLY TWO-DIMENSIONAL CROSS-CLASSIFICATIONS. MAIN OBJECTIVES OF THE NOTE INCLUDE (1) ESTABLISHMENT OF A CONCEPTUAL FRAMEWORK FOR CHARACTERIZATION AND COMPARISON OF CROSS-CLASSIFICATIONS, (2) DISCUSSION OF EXISTING METHODS FOR CHARACTERIZING CROSS-CLASSIFICATIONS, (3) PROPOSAL OF A NEW APPROACH TO AND A NEW METHOD FOR CHARACTERIZING AND MAKING INFERENCES FROM CROSS-CLASSIFICATIONS, AND (4) INDICATION OF HOW MARKOV PROCESSES CAN BE TREATED AS CROSS-CLASSIFICATIONS. THREE KINDS OF PROBABILITIES (UNCONDITIONAL, CONDITIONAL, AND JOINT) ARE DISCUSSED IN TERMS OF STATISTICAL INDEPENDENCE. MEASURES OF ASSOCIATION, ESPECIALLY THE Z-MEASURE OF ASSOCIATION AND ITS RELATION TO THE PHI COEFFICIENT IN VARIOUS SITUATIONS, ARE DISCUSSED AS MEANS OF INVESTIGATING NON-INDEPENDENCE. AN EXAMPLE OF THE Z-MEASURE APPLICATION IS PRESENTED. THE PROCESS OF DERIVING JOINT PROBABILITIES FROM Z-MEASURES AND THE INTERPRETATION OF MARKOV PROCESSES AS CROSS-CLASSIFICATIONS ARE ILLUSTRATED. (HW)

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY.

NATIONAL CENTER FOR EDUCATIONAL STATISTICS
Division of Operations Analysis

ON DEPARTURES FROM INDEPENDENCE IN CROSS-CLASSIFICATIONS

by

C. Marston Case

Technical Note Number 10

November 18, 1966

OFFICE OF EDUCATION / U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
NATIONAL CENTER FOR EDUCATIONAL STATISTICS
Alexander M. Mood, Assistant Commissioner

DIVISION OF OPERATIONS ANALYSIS
David S. Stoller, Director

TABLE OF CONTENTS

I. Introduction
II. Terminology
III. Statistical Independence
IV. Measures of Association
V. The Z-measure of Association
VI. An Application of the Z-measure
VII. The Reverse Inference: Joint Probabilities from Z-measures
VIII. The Interpretation of Markov Processes as Cross-Classifications
References

I. Introduction

This note* is concerned with ideas and problems involved in cross-classification of observations on a given population. Most of the note will be confined to the discussion of two-dimensional cross-classifications. An example is the two-dimensional cross-classification of a portion of a deck (population) of playing cards resulting from the classification of the cards according to suit and according to whether or not the card is a face card. Such a cross-classification would consist of a tabulation of the numbers in each of the (4 x 2 = 8) possible combinations of suit and face-or-no-face characteristics.

The main objectives of this note are:
1) to establish a conceptual framework for characterization and comparison of cross-classifications;
2) to discuss existing methods for characterization of cross-classifications;
3) to propose a new approach and a new method for characterizing and making inferences from cross-classifications;
4) to indicate how Markov processes can be treated as cross-classifications.

* The author wishes to thank Stephen Clark, George Mayeske, Richard O'Brien, and Frederic Weinfeld for suggestions made during the writing of this note.

II. Terminology

The word "event" is the conventional probabilistic term used to indicate what is observed; we will speak of observations on a given population with the assumption that one or more events are observed with each observation. The generic term "event type" will be used to specify the characteristic being classified in one dimension of a cross-classification. When we observe two or more events in a single observation we are observing the joint occurrence of the given events. The number of (joint) events so observed is the dimension of the cross-classification and is also the number of event types in the cross-classification. The suit of a card classifies it according to one event type while the face-or-no-face characteristic classifies it according to another event type. Within each event type there are two or more "event classes"; these classes are mutually exclusive and exhaustive, i.e., each observation belongs to exactly one event class within each event type. Within the suit event type the four event classes are club, diamond, heart, and spade. See Figure 1.

[Figure 1: The two event types of the card example. The suit event type has the event classes club, diamond, heart, and spade; the face-or-no-face event type has the event classes face and no face.]

What are called "event classes within event types" here are called "subcategories within attributes" by Guttman (1), p. 258, "classifications within criteria" by Mood (2), p. 274, "classes within polytomies" by Goodman and Kruskal (3), and "categories within variables" by Kendall and Stuart (6), p. 256. An extensive treatment of cross-classifications is given by Kendall and Stuart (6) in a 55-page chapter entitled "Categorized Data." The present paper deals exclusively with categorized data, which are observations which identify events (event classes within event types) in a qualitative, non-numerical, non-ordered manner. For instance, the suits and colors of playing cards are event types which are based on categorized data.

Three kinds of probabilities are distinguished in this note:
a) unconditional (or marginal) probabilities

b) conditional probabilities
c) joint probabilities.

We will be assuming that we are observing elements of a well-defined basic population and that at each observation two or more events occur. The unconditional probability of an event is the probability of the occurrence of that event without regard to the occurrence of any other event. It is the fraction of the observations in which that event has occurred if observations have been taken on the entire basic population.

The idea of joint probability is the same as unconditional probability except that we are concerned with two (or more) events occurring in a single observation instead of just one. The joint probability of two events is the fraction of the observations in which both events would occur were the whole basic population observed. The ampersand will be used to indicate joint occurrences in this note; "A&B" means the joint occurrence of events A and B and "Pr(A&B)" means the joint probability of their occurrence.

The idea of conditional probability also involves at least two events, say A and B. The conditional probability that A occurs given the condition that B also occurs means that we are evaluating a probability within a population restricted by some condition which was not part of the original definition of the basic population. The occurrence of event A given that event B also occurs is symbolized by "A|B" and "Pr(A|B)" means the conditional probability of event A given event B. The conditional probability of A given B is often defined as the ratio of their joint probability to the unconditional probability of B:

    Pr(A|B) = Pr(A&B) / Pr(B)

This means: of all the times that B occurs, Pr(A|B) is that fraction of the times wherein A also occurs.
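To make the three kinds of probability concrete, here is a minimal sketch in modern notation (ours, not the note's; the function and variable names are hypothetical) using the playing-card population of the Introduction:

    # Unconditional, joint, and conditional probabilities for a 52-card
    # deck classified by suit and by face-or-no-face.
    SUITS = ["club", "diamond", "heart", "spade"]

    # 13 cards per suit; ranks 11, 12, 13 (J, Q, K) are the face cards.
    deck = [(suit, rank > 10) for suit in SUITS for rank in range(1, 14)]

    def pr(event):
        """Unconditional probability: the fraction of the population
        in which the event occurs."""
        return sum(1 for card in deck if event(card)) / len(deck)

    is_heart = lambda card: card[0] == "heart"
    is_face = lambda card: card[1]

    pr_a = pr(is_heart)                               # Pr(A)   = 13/52
    pr_b = pr(is_face)                                # Pr(B)   = 12/52
    pr_ab = pr(lambda c: is_heart(c) and is_face(c))  # Pr(A&B) = 3/52
    pr_a_given_b = pr_ab / pr_b                       # Pr(A|B) = 3/12 = .25

    # Here Pr(A&B) = Pr(A) x Pr(B): suit and face-ness are independent,
    # anticipating the definition in Section III.
    assert abs(pr_ab - pr_a * pr_b) < 1e-12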

When we speak of the occurrence of event A, we imply the idea of the nonoccurrence of A. In other words, in the back of our minds we are considering an A-type event which includes two event classes: class A and class non-A. Not-A will be symbolized by "Ā" in this note. This A-type event is also known as a "binary variable,"* a variable which can take on two values, A and Ā. "Event type" is equivalent to the word "variable" here, and the language of this note could have been based on "qualitative variables" instead of "categorized event types."

* Other words which are used for this kind of variable are: counting, indicator, dichotomous, two-state, two-point, and zero-one variable or distribution.

III. Statistical Independence

The standard definition of the statistical independence of two events, A and B, is that the probability of their joint occurrence is the product of their probabilities:

    Pr(A&B) = Pr(A) x Pr(B)

Using this joint probability definition of independence and the definition of conditional probability we obtain the definition of independence in terms of conditional probability: events A and B are independent if and only if

    Pr(A|B) = Pr(A&B) / Pr(B) = Pr(A) x Pr(B) / Pr(B) = Pr(A).

The Venn diagram which is Figure 2 can be used to show the (probability domain) relationship of the two events, A and B. We see that the domains of the two events must overlap a certain amount in order to be independent. The amount required for their independence is the product of the probabilities of the events involved; e.g., if Pr(A) = .6 and Pr(B) = .8 then in order for A and B to be independent the intersection of their probability domains must be Pr(A) x Pr(B), i.e., .6 times .8 or .48.

[Figure 2: Venn diagram of two overlapping probability domains with Pr(A) = .6 and Pr(B) = .8; the intersection is Pr(A&B) = .48 for independence; the area external to both A and B is Ā&B̄.]

If the amount of overlap is less than the product required for independence, the occurrence of either event has a tendency to preclude the occurrence of the other. If the two probability domains do not overlap, then each event completely precludes the occurrence of the other and they are mutually exclusive. On the other hand, if the amount of overlap is greater than the amount required for independence then the occurrence of either event enhances the probability of the occurrence of the other. If there is complete overlap, with one probability domain completely covering that of the other, then, of course, the occurrence of the second is completely dependent on the occurrence of the first.

To put the Venn diagram of Figure 2 into tabular form, a 2 by 2 joint probability table with marginal (unconditional) probabilities is drawn (Figure 3). In the body of the table a, b, c, and d are joint probabilities and the sums of the joint probabilities in rows and columns are marginal probabilities.

          B      B̄
    A     a      b      .6 = Pr(A)    <- marginal probabilities
    Ā     c      d      .4 = Pr(Ā)
         .8     .2
       Pr(B)  Pr(B̄)

    Figure 3

The probability domains (a, b, c, d) of the Venn diagram correspond to those in the tabulated form, Figure 3. These are, respectively, the joint probabilities Pr(A&B), Pr(A&B̄), Pr(Ā&B), and Pr(Ā&B̄). Note that area d is the area external to both the A and B domains. Given two marginal probabilities in Figure 3 (one of the A type and one of the B type) and an entry in any of the four cells of the body of the table (a, b, c, or d), the remaining cells are easily deduced. For instance, if a is .5 then, since a+b = .6, b = .1; since a+c = .8, c = .3; and finally, since c+d = .4, d = .1. Note also that if a is .5, A and B are not independent; they enhance each other's occurrence.
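The deduction just described is mechanical; a small sketch (ours, with hypothetical names) that completes the Figure 3 table from its two marginals and the single cell a:

    def complete_2x2(a, pr_a, pr_b):
        """Fill in a 2x2 joint probability table given the cell
        a = Pr(A&B), the row marginal Pr(A), and the column marginal Pr(B)."""
        b = pr_a - a       # row A must sum to Pr(A)
        c = pr_b - a       # column B must sum to Pr(B)
        d = 1 - a - b - c  # the whole table sums to one
        return a, b, c, d

    # The example in the text: Pr(A) = .6, Pr(B) = .8, a = .5.
    cells = complete_2x2(.5, .6, .8)
    print([round(v, 3) for v in cells])   # -> [0.5, 0.1, 0.3, 0.1]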

The tabular representation has the advantages of explicitly identifying Ā and B̄ and showing A and Ā in a uniform manner. The tabular form also lends itself more easily to event types which are classified into several (not just two) classes. Suppose that the type A event has three classes A1, A2, and Ao, where Ao comprises events which belong to neither A1 nor A2. Similarly, suppose the type B events are categorized into classes B1, B2, and Bo. The Venn diagram of this, with some probabilities put in as an example, is Figure 4.

[Figure 4: Venn diagram of the two three-class event types, with Ao&Bo = .3 among the probabilities shown.]

The same joint probability information in tabular form appears in Figure 5.

               B1     B2     Bo    marginals
       A1     .03    .06    .11      .2
       A2     .02    .09    .19      .3
       Ao     .05    .15    .30      .5
    marginals .1     .3     .6

    Figure 5

Consider the independence-dependence characteristics of the nine joint events whose probabilities are in the body of the table. We make the following four observations:
1) Ao is independent of all the Bi
2) B2 is independent of all the Ai
3) A1 enhances the probability of B1 while diminishing that of Bo
4) A2 enhances the probability of Bo while diminishing that of B1.

This example shows that instances of dependence can be scattered about among various joint events when the event types each include several classes.

In this example with three classes in each of two event types, there are (3-1)(3-1) = 4 degrees of freedom in the determination of the probabilities. In other words, given the marginal probabilities, knowledge of certain sets of four of the nine joint probabilities in the body of the table determines all of them. These sets of probabilities are sets of four such that (after entering the set) at least one cell in each row and column remains empty and none of the remaining probabilities in the body of the table can be determined immediately from both its row and its column. Another way of stating the second requirement is that if an entry can be determined from the other entries in its row, it must not be possible to determine it from the other entries in its column. This can be generalized to m event types with n1, n2, ..., nm classes, respectively. Then, given the marginal probabilities, there are (n1-1)(n2-1)...(nm-1) degrees of freedom in determining the joint probabilities.

Testing for independence in a contingency table is a classical statistical problem discussed in many statistical books (see, for instance, Mood (2), pp. 273-81). A contingency table tabulates a set of observations according to two event types (criteria). Consider, for example, the classification of a group of people according to the two event types of vision and weight. The vision of each person in the group belongs to one of the three classes: near-sighted, normal sighted, and far-sighted; and the weight of each belongs to one of the three classes: underweight, normal weight, and overweight. The contingency table for these two event types would be a 3 x 3 table indicating the number of observations in each of the nine possible combinations of vision and weight.

Testing the independence of the two event types in a contingency table can be done in the following three steps.
1) Change the number in each of the cells to the fraction it is of the total number of observations, i.e., divide each cell number by the total number.
2) Enter the row and column sums of these fractions as marginals. (The ij-th cell now is an estimate of the probability that the i-th row event and the j-th column event occur together (jointly). If the two events are independent, another estimate of their joint probability is the product of the i-th row marginal and the j-th column marginal.)

3) Let O_ij be the observed fraction of the total in the ij-th cell and E_ij be the expected fraction assuming independence, i.e., the product of the i-th row and j-th column marginals; then, with N the total number of observations, compute the test statistic for the (n x m) contingency table,

    N Σ_i Σ_j (O_ij - E_ij)² / E_ij,    i = 1..n, j = 1..m.

Under the assumption of independence this statistic is approximately chi-square distributed with (m-1)(n-1) degrees of freedom. The approximation to chi-square improves with larger numbers of observations in the cells.

IV. Measures of Association

The test for independence of two types of events can be a basis for deciding whether or not the two event types are independent; if the test leads one to reject the independence hypothesis, one may want a more complete explanation of the non-independence. One way to investigate non-independence is by employing a "measure of association" which, in the statistical literature, has meant a measure of the direction and amount of departure from independence of a given cross-classification. Goodman and Kruskal have written three extensive (32, 40, and 54 pages) papers on measures of association (3), (4), (5). They favor a measure originally proposed by Guttman in (1), but they feel that the uses of measures of association are varied enough so that there should be several from which to choose. None of the studies seen by the author, however, considers analysis of departures from independence in detail within a cross-classification (i.e., in the body of a joint probability table). This is to say that the measures of association thus far developed are meant to show (dependence) relationships between event types and not those between the event classes of one event type and those of another event type.

If we were working with numerical variables instead of categorized event types and we assumed that the variables were linearly related, we would probably consider the correlation coefficient as the first candidate for measuring their association. A set of three characteristics of the correlation coefficient traditionally has been sought in the appraisal of measures of association for categorized data:
1) the measure is zero when the event types are independent;
2) the measure is minus one when they have the maximum disassociation;
3) the measure is plus one when they have the maximum association.

We will consider a measure of association which, in its immediate application, will be applied only to a 2 x 2 cross-classification, although in its ultimate application it will appear in the assessment of general n x m cross-classifications. In addition to the above set of characteristics we shall require our measure to have another set of characteristics related to the correlation coefficient. This set has to do with the symmetry of the measure between events (classes) and their complements. Specifically, if Z is the value of a measure of the association between events A and B, then it must also be the value between Ā and B̄. Also, the value of the measure between A and B̄ and between Ā and B must be -Z.

V. The Z-measure of Association

A method will now be shown that develops a measure of association conforming to the requirements specified above and applying to individual cells, allowing them to be compared with one another. Since each cell in the given (n x m) joint probability table is to be treated individually, we treat each cell as the upper left corner entry in a 2 by 2 table as in Figure 6. This is done by letting

    a = the cell being studied,
    b = sum of the remainder of the row entries,
    c = sum of the remainder of the column entries, and
    d = sum of the remainder of the table (= 1-a-b-c);

x and y are marginal probabilities in both the n x m and the 2 x 2 tables; x = a+c and y = a+b.

          B      B̄
    A     a      b       y
    Ā     c      d      1-y
          x     1-x

    Figure 6

All the information about the 2 x 2 table is contained in the three quantities a, x, and y.

A measure of association between two binary variables is their correlation coefficient. The definition of the correlation coefficient for two variables A and B is:

    rho(A,B) = [E(AB) - E(A)E(B)] / sqrt[V(A)V(B)]

where E(A) is the expected value of A and V(A) is the variance of A, which is E(A²) - [E(A)]². We are concerned here with binary variables, specifically the joint distribution of A and B, a bivariate binary probability distribution. In this case we can define the distribution of the A-or-not-A type of event on the integers zero and one:

Pr(A-or-not-A event type = 0) = Pr(Ā) = 1 - Pr(A) and Pr(A-or-not-A event type = 1) = Pr(A). A similar definition of the binary distribution of B applies. Then the following equations hold:

    E(AB) = Pr(A&B) = a
    E(A) = E(A²) = Pr(A) = y
    E(B) = E(B²) = Pr(B) = x
    V(A) = y(1-y)
    V(B) = x(1-x).

This leads to a formula for the correlation coefficient associated with the 2 x 2 table in Figure 6,

    Phi(a,x,y) = (a - xy) / sqrt[x(1-x)y(1-y)]

"Phi coefficient" is the usual name for this statistic, especially in psychological statistics.* It is zero when the events are independent (a = xy). It takes on the value plus one only when a is at its maximum and when x = y. The maximum of a is the lesser of x and y. The phi coefficient takes on the value minus one only when a is at its minimum and when x + y = 1. The minimum of a is zero when x + y ≤ 1 and x + y - 1 when x + y ≥ 1.

We want a measure of association which always takes on the value minus one when the least possible association prevails and always takes on the value plus one when the greatest possible association prevails. Such a measure is obtained by dividing the phi coefficient by its maximum possible value when positive association prevails and by its minimum possible value when negative association prevails.

* For a discussion of the phi coefficient see Guilford (7), pp. 333-36.

We now define such a Z-measure of association for four possible situations which cover all the possibilities of a 2 x 2 table.

Situation I. a > xy, x ≥ y: the binary variables are positively associated. Z is the ratio of the phi coefficient to its maximum, which is attained at a = y, the lesser marginal:

    Z(a,x,y) = [(a - xy)/sqrt(x(1-x)y(1-y))] / [(y - xy)/sqrt(x(1-x)y(1-y))]
             = (a - xy) / y(1-x)

(When y > x the roles of x and y interchange, so that in general the denominator is min(x,y) - xy.)

Situation II. a < xy, x + y ≤ 1: the binary variables are negatively associated and the minimum of a is zero. Z is minus the ratio of the phi coefficient to its minimum:

    Z(a,x,y) = (-1)[(xy - a)/sqrt(x(1-x)y(1-y))] / [xy/sqrt(x(1-x)y(1-y))]
             = (a - xy) / xy

Situation III. a < xy, x + y ≥ 1: the binary variables are negatively associated and the minimum of a is x + y - 1:

    Z(a,x,y) = (-1)[(xy - a)/sqrt(x(1-x)y(1-y))] / [(1-x-y+xy)/sqrt(x(1-x)y(1-y))]
             = (a - xy) / (1-x)(1-y)

Situation IV. a = xy: the binary variables are independent; Z = 0.
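The four situations translate directly into a short routine; this is our sketch of the definition just given (min(x, y) covers both orderings of the marginals in Situation I, and a small tolerance guards the floating-point comparison with xy):

    def z_measure(a, x, y, eps=1e-12):
        """Z-measure of association for a 2x2 table summarized by the joint
        probability a = Pr(A&B), column marginal x = Pr(B), and row
        marginal y = Pr(A)."""
        dev = a - x * y
        if abs(dev) < eps:               # Situation IV: independence
            return 0.0
        if dev > 0:                      # Situation I: positive association
            return dev / (min(x, y) - x * y)
        if x + y <= 1:                   # Situation II: minimum of a is 0
            return dev / (x * y)
        return dev / ((1 - x) * (1 - y))  # Situation III: minimum is x+y-1

    # The Figure 3 example (a = .5, x = .8, y = .6) gives Z = 1/6.
    print(round(z_measure(.5, .8, .6), 3))   # -> 0.167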

In Situations II and III, the sign is made negative to indicate disassociation. Horst (8), pp. 238-39, points out that this Z-measure is the ratio of two covariances, an observed covariance over a maximum (Situation I) or a minimum (Situations II and III) covariance. He uses it as a measure of the homogeneity of test items. This is an interesting case in which the event classes are not mutually exclusive.

The 2 x 2 table which is Figure 6 can be written solely in terms of a, x, and y. Such a table equivalence is shown in Figure 7.

          B      B̄                       B         B̄
    A     a      b       y         A     a        y-a        y
    Ā     c      d      1-y   =    Ā    x-a    1+a-x-y      1-y
          x     1-x                      x        1-x

    Figure 7

Assuming a > xy and x ≥ y, we now find the Z-measure for each of the four cells in the body of the table. An arrow (→) is used to indicate implication.

For the upper left cell itself,

    Z(a,x,y) = (a - xy) / y(1-x).

For the lower right cell, d,

    a > xy → 1+a-x-y > 1+xy-x-y = (1-x)(1-y).

Thus, since d = 1+a-x-y exceeds the product of its marginals (1-x)(1-y), the computation for d is done using the Situation I formula (the lesser of its marginals is 1-x):

    Z(d, 1-x, 1-y) = Z(1+a-x-y, 1-x, 1-y)
                   = [(1+a-x-y) - (1-x)(1-y)] / (1-x)[1-(1-y)]
                   = (a - xy) / y(1-x).

For the upper right cell, b,

    a > xy → b = y-a < y - xy = y(1-x),

and since x ≥ y implies (1-x) + y ≤ 1, Situation II applies:

    Z(b, 1-x, y) = Z(y-a, 1-x, y) = [(y-a) - y(1-x)] / y(1-x) = -(a - xy) / y(1-x).

For the lower left cell, c,

    a > xy → c = x-a < x - xy = x(1-y),

and since x ≥ y implies x + (1-y) ≥ 1, Situation III applies:

    Z(c, x, 1-y) = Z(x-a, x, 1-y) = [(x-a) - x(1-y)] / (1-x)[1-(1-y)] = -(a - xy) / y(1-x).

Thus

    Z(a, x, y) = Z(d, 1-x, 1-y) = -Z(b, 1-x, y) = -Z(c, x, 1-y),

or in terms of the binary variables A and B,

    Z(A,B) = Z(Ā,B̄) = -Z(A,B̄) = -Z(Ā,B).

Thus we have a measure of association for a pair of binary variables which has the desired characteristics:

1. zero when the variables are independent (Situation IV)
2. minus one when the variables are as completely disassociated as possible (in Situations II and III)
3. plus one when the variables are as completely associated as possible (in Situation I)
4. Z(A,B) = Z(Ā,B̄) = -Z(Ā,B) = -Z(A,B̄).
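Characteristic 4 can be checked numerically from the Figure 7 parametrization; a self-contained sketch (ours):

    def z(a, x, y, eps=1e-12):
        # Z-measure, Situations I-IV of this section.
        dev = a - x * y
        if abs(dev) < eps:
            return 0.0
        if dev > 0:
            return dev / (min(x, y) - x * y)
        return dev / (x * y if x + y <= 1 else (1 - x) * (1 - y))

    a, x, y = .5, .8, .6                    # any permissible values
    b, c, d = y - a, x - a, 1 + a - x - y   # the Figure 7 cells

    assert abs(z(a, x, y) - z(d, 1 - x, 1 - y)) < 1e-9
    assert abs(z(a, x, y) + z(b, 1 - x, y)) < 1e-9
    assert abs(z(a, x, y) + z(c, x, 1 - y)) < 1e-9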

Z(a,x,y) is a function which maps points in a three-dimensional space to points in a one-dimensional space. Figure 8 shows one aspect of this mapping: Z as a function of a with x and y fixed.

[Figure 8: Z and phi plotted as functions of the joint probability a, with the marginals x and y fixed. The phi(a) domain extends down to a = 0 in Situation II but only to a = x+y-1 in Situation III; Z runs from -1 to +1 as a runs from its minimum to its maximum, with Z = 0 at a = xy.]

The bivariate binary correlation coefficient, phi(a), is plotted as a dotted line. It has the same domain (set of arguments) as Z but is linear throughout its domain.

One should keep in mind that, although the Z-measure is related to the correlation coefficient, it differs from the correlation coefficient in fundamental ways. The Z-measure is not linear at Z = 0 except when one or both of the marginals is .5. It is a two-part linear function of the correlation coefficient such that the coefficients of the function are non-linear functions of x and y. The Z-measure should be judged, however, in its own right, and not as a replacement for the correlation coefficient, which fails to behave for binary variables as it does for linearly related continuous numerical variables. It is hard to find a common situation with less favorable conditions for establishing linear relations than that of the bivariate binary distribution with point-masses of probabilities at the four corners of a square. The perfect correlation condition there is when all the mass is on diagonally opposite corners. As we saw earlier, this can occur only when x = y (for perfect correlation) or when x + y = 1 (for perfect negative correlation). In all other cases the correlation coefficient cannot attain 1. The advantage of Z over the correlation coefficient is that it has the range of values minus one to plus one for all x and y combinations. This allows one to compare Z-measures based on different x and y combinations and say that the amount of association between two event classes is greater or less than that between another pair.

Additional insight into the nature of the Z-measure is gained when considering it from the set-theoretic point of view using the Venn diagram approach of Section III. The Z-measure essentially shows the relationship between the amount of observed overlap and the maximum or minimum potential overlap of two probability domains. It may be thought of as the attraction or repulsion force existing between the probability domains of two events. In this sense it is considered to be measuring a symmetric force such as gravity or magnetic forces.

There have been several discussions of the Z-measure in psychological statistics. The Z-measure is usually called "phi/phi max," with little mention of "phi/phi min." Carroll (9), pp. 363-64, discusses this measure in terms of 2 x 2 tables derived from the division of continuous bivariate distributions into four parts; observations are classified as being above or below a point on the scale of one marginal distribution and above or below a point on the scale of the other marginal distribution. Carroll particularly refers to the "tetrachoric correlation coefficient" (which assumes a bivariate normal underlying distribution) as the ideal one and finds the Z-measure does not approximate the tetrachoric correlation coefficient very well. He rather emphatically dismisses the Z-measure on this basis. Guilford (7), pp. 337-38, also warns against the use of the Z-measure as an "indicator of intrinsic correlation." In general, the psychometricians discount the use of the Z-measure in lieu of a (Pearson) correlation coefficient. The mathematical treatment of the Z-measure, however, needs more development than is given in the psychometric literature in order to understand enough about the measure to make proper use of it. Cureton (10), p. 89, for instance, states that Z can be +1 only when x = y = .5 and Z can be -1 only when x = 1-y = .5; whereas in fact Z = +1 whenever a attains its maximum, min(x,y), and Z = -1 whenever a attains its minimum, zero or x+y-1.

Comparing the joint probability (a) with the Z-measure for a given joint event, and generalizing over the four situations, we see that a = xy + cZ, where

    c = min(x,y) - xy      if Z ≥ 0
    c = xy                 if Z < 0 and x + y ≤ 1
    c = (1-x)(1-y)         if Z < 0 and x + y ≥ 1.

Z is thus seen to show the relationship between the joint probability and the associated marginal probabilities, x and y.
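In code, the relation a = xy + cZ inverts the Z-measure, recovering a joint probability from its marginals and its Z; a sketch (ours) anticipating the reverse inference of Section VII:

    def c_coefficient(x, y, z):
        """The coefficient c above: the maximum possible deviation of a
        from xy in the direction indicated by the sign of Z."""
        if z >= 0:
            return min(x, y) - x * y
        if x + y <= 1:
            return x * y
        return (1 - x) * (1 - y)

    def joint_from_z(x, y, z):
        """Recover the joint probability: a = xy + cZ."""
        return x * y + c_coefficient(x, y, z) * z

    # Recovers a = .5 from the Figure 3 marginals and Z = 1/6.
    print(round(joint_from_z(.8, .6, 1 / 6), 3))   # -> 0.5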

The nature of c is further elaborated in Section VII.

There are also two conditional probabilities associated with a joint event, and these show, respectively, the relationship between one marginal and the joint probability and the other marginal and the joint probability. Conditional probabilities are used to improve one's ability to predict the occurrence or non-occurrence of an event by obtaining information about the occurrence or non-occurrence of another event. Such a prediction operation sometimes tends to lead to the imputation of causality between the two events. The Z-measure is symmetric in the marginals and thus avoids such an imputation. A conditional probability without the appropriate unconditional probability does not tell one whether the event being conditioned is more or less likely when it occurs jointly with the conditioning event. Thus the conditional probability by itself is not a measure of association whereas the Z-measure is.

VI. An Application of the Z-measure

A contingency table which has often appeared as an example in textbooks and articles involves the cross-classification of 6800 males according to their hair and eye color.* Four classes of hair color were observed: fair, brown, black, and red. Three classes of eye color were observed: blue, hazel or green, and brown. Figure 9 shows the original observations (contingency table); Figure 10 shows the corresponding joint probabilities table; Figure 11 shows the corresponding values of the Z-measure of association; Figure 12 shows conditional probabilities of hair color given eye color and eye color given hair color.

This cross-classification fails the classical chi-square test of independence at the .000001 level. Some of the details brought out by the Z-measure are that red hair is essentially independent of eye color for this population while a general correlation of eye and hair pigment holds. The disassociations of fair hair and brown eyes (-.678) and of black hair and blue eyes (-.626) are, however, much more pronounced than the associations of fair hair and blue eyes (+.365) and of black hair and brown eyes (+.190). The black hair and brown eye association is, surprisingly, weaker than both that of black hair and hazel eyes (.277) and that of brown hair and brown eyes (.201).

* E.g., see Goodman and Kruskal (3).

                        HAIR COLOR
               fair   brown   black   red
         blue  1768     807     189    47   2811
    EYE hazel   946    1387     746    53   3132
  COLOR brown   115     438     288    16    857
               2829    2632    1223   116   6800

    Figure 9

JOINT PROBABILITY TABLE
               fair   brown   black    red
        blue   .26    .119    .028   .0069   .4134
       hazel   .139   .204    .110   .0078   .4606
       brown   .017   .064    .042   .0024   .1260
               .416   .387    .1799  .0171

    Figure 10

Z-MEASURE OF ASSOCIATION
               fair   brown   black    red
        blue   .365   -.258   -.626   -.02
       hazel  -.274    .084    .277   -.008
       brown  -.678    .201    .190    .014

    Figure 11

CONDITIONAL PROBABILITY TABLES

EYE GIVEN HAIR                             HAIR GIVEN EYE
         fair  brown  black   red               fair  brown  black    red
  blue   .625   .307   .154  .405       blue   .629   .287  .0672  .0167   1.0
 hazel   .334   .527   .610  .457      hazel   .302   .443  .238   .0169   1.0
 brown   .041   .166   .235  .138      brown   .134   .511  .336   .0187   1.0
         1.0    1.0    1.0   1.0

    Figure 12
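Figure 11 can be reproduced from the raw counts of Figure 9 by applying the Z-measure cell by cell; a sketch (ours):

    counts = [[1768,  807, 189, 47],   # blue eyes
              [ 946, 1387, 746, 53],   # hazel
              [ 115,  438, 288, 16]]   # brown

    n = sum(map(sum, counts))                   # 6800 observations
    rows = [sum(r) / n for r in counts]         # eye-color marginals y_i
    cols = [sum(c) / n for c in zip(*counts)]   # hair-color marginals x_j

    def z(a, x, y, eps=1e-12):
        # Z-measure, Situations I-IV of Section V.
        dev = a - x * y
        if abs(dev) < eps:
            return 0.0
        if dev > 0:
            return dev / (min(x, y) - x * y)
        return dev / (x * y if x + y <= 1 else (1 - x) * (1 - y))

    for i, row in enumerate(counts):
        print([round(z(v / n, cols[j], rows[i]), 3) for j, v in enumerate(row)])
    # First row: [0.365, -0.258, -0.626, -0.02], matching Figure 11.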

VII. The Reverse Inference: Joint Probabilities from Z-measures

Heretofore we have been considering the derivation of the Z-measure from a joint probability table. We now consider the derivation of a joint probability table given a set of Z-measures and marginal probabilities. We may be given the marginal distributions of a cross-classification, for instance, and (perhaps subjective) estimates of some of the associations between various states. We can specify no more Z-measures than the degrees of freedom involved. The joint probability table and the Z-measure table of a three by three cross-classification are shown in Figure 13.

    a11  a12  a13 | y1        z11  z12  z13
    a21  a22  a23 | y2        z21  z22  z23
    a31  a32  a33 | y3        z31  z32  z33
    x1   x2   x3  | 1

    Figure 13

We derive definitions of the a_ij from the specifications for Z in Section V:

    a_ij = x_j y_i + c_ij Z_ij

where

    c_ij = min(x_j, y_i) - x_j y_i     if Z_ij ≥ 0
         = x_j y_i                     if Z_ij < 0 and x_j + y_i ≤ 1
         = (1-x_j)(1-y_i)              if Z_ij < 0 and x_j + y_i ≥ 1.

Theorem: c_ij ≤ .25. Each c_ij is the product of two nonnegative factors, say f1 and f2, whose sum is at most one, so at least one of the factors, say f1, is no greater than .5. Suppose (at worst) f1 = .5-u; then f2 ≤ .5+u and f1 f2 ≤ (.5-u)(.5+u) = .25-u² ≤ .25.

c_ij is the maximum possible difference between a_ij and x_j y_i, given the information about whether a_ij is larger or smaller than x_j y_i; Z_ij indicates how much of this maximum deviation is attained by a_ij. One possibility is that a_ij = x_j y_i; then for each row and each column, Σ a_ij = Σ x_j y_i. However, the row and column sums are the same (i.e., the marginal probabilities are fixed) for all permissible sets of a_ij. Thus for any permissible configuration of the a_ij, for each row and column, Σ a_ij = Σ x_j y_i and a_ij = x_j y_i + c_ij Z_ij. Therefore, for each row and column, Σ c_ij Z_ij = 0. For convenience, let d_ij = c_ij Z_ij. We now have the following set of six equations defining the relations among the d_ij for a 3 x 3 joint probability table:

    1) d11 + d12 + d13 = 0
    2) d21 + d22 + d23 = 0     (row sums)
    3) d31 + d32 + d33 = 0

    4) d11 + d21 + d31 = 0
    5) d12 + d22 + d32 = 0     (column sums)
    6) d13 + d23 + d33 = 0

Earlier (Section III) the number of degrees of freedom associated with a cross-classification table was discussed. If we specify certain sets of four d_ij, the remainder of a 3 x 3 table of d_ij's is determined. There are 126 ways of selecting four cells from among nine; only 81 of these ways conform to the degrees-of-freedom requirements stated earlier. Each of those 81 provides us with a different configuration of four cell choices in a 3 x 3 table. If we specify one of these configurations, the d_ij for the remaining five cells can be stated in terms of our four initial d_ij. For instance, suppose we are given d11, d22, d33, and d31 (see Figure 14). Using the row and column sum equations we obtain the remaining d_ij in terms of these four:

    d32 = -d31 - d33
    d21 = -d11 - d31
    d12 = -d22 + d31 + d33
    d13 = -d11 + d22 - d31 - d33
    d23 = d11 - d22 + d31

    Figure 14
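A sketch (ours) of the reverse inference for the Figure 14 configuration: the five remaining d's follow from the row and column sum equations, and then a_ij = x_j y_i + d_ij:

    def joint_from_four_d(x, y, d11, d22, d33, d31):
        """Reconstruct a 3x3 joint probability table from column marginals
        x = (x1, x2, x3), row marginals y = (y1, y2, y3), and the four
        d_ij of Figure 14."""
        d = {(1, 1): d11, (2, 2): d22, (3, 3): d33, (3, 1): d31}
        d[3, 2] = -d31 - d33
        d[2, 1] = -d11 - d31
        d[1, 2] = -d22 + d31 + d33
        d[1, 3] = -d11 + d22 - d31 - d33
        d[2, 3] = d11 - d22 + d31
        return [[x[j - 1] * y[i - 1] + d[i, j] for j in (1, 2, 3)]
                for i in (1, 2, 3)]

    # With the Figure 5 marginals and the d's read off that table
    # (d11 = .03 - .02 = .01, the other three zero), this reproduces
    # the Figure 5 joint probabilities exactly.
    table = joint_from_four_d((.1, .3, .6), (.2, .3, .5), .01, 0, 0, 0)
    print([[round(v, 2) for v in row] for row in table])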

VIII. The Interpretation of Markov Processes as Cross-Classifications

A Markov chain (or process) is generally defined by its transition probability matrix and its initial distribution. Figure 15 is an example of a transition matrix of a 4-state Markov chain; it shows the probability of the occurrence of event Ej at time k+1 given that event Ei occurred at time k (i and j are the row and column indices respectively).

                     Time k+1
                  E1   E2   E3   E4
            E1    .6   .4    0    0
     Time   E2     0   .6   .4    0
      k     E3     0    0   .6   .4
            E4    .2    0    0   .8

    Figure 15

This transition matrix is a table similar to that associated with a cross-classification. It is, however, a table of conditional probabilities, whereas for the complete specification of a cross-classification a joint probability table is needed. A conditional probability table can be deduced from an underlying joint probability table, but a joint probability table cannot be deduced from either or both of its corresponding conditional probability tables. A joint probability table can, however, be deduced from one of its conditional tables and one of its marginal probability distributions. In a Markov chain, an initial probability distribution vector (hereafter PDV) is given; this PDV is one of the two marginal probability sets in the underlying joint probability table. The initial PDV thus completes the specifications needed for a cross-classification interpretation of a Markov chain. We will often subscript the PDV's with their associated time index, e.g., the initial PDV will be PDV_k.

Continuing with the example in Figure 15, suppose we are given a PDV of (.1, .2, .3, .4) for events E1 to E4. Then the joint probability table becomes that shown in Figure 16.

                        Time k+1
                  E1    E2    E3    E4
            E1   .06   .04    0     0    .1
     Time   E2    0    .12   .08    0    .2
      k     E3    0     0    .18   .12   .3
            E4   .08    0     0    .32   .4
                 .14   .16   .26   .44

    Figure 16

The joint probabilities in the body of the table are obtained from the conditional probabilities in Figure 15 by multiplying the probabilities in each row by the corresponding probabilities of PDV_k. The event types are events observed at the two times, k and k+1, while the event classes are E1 to E4 for each type.

The present discussion tends to be in terms of the transformation of PDV's, in contrast to the usual emphasis on probabilities of going from one state to another in a certain number of transitions (e.g., from E1 to E4 during time k to k+m). Thus the usual approach is based on powers of the transition matrix whereas this one is based on sequences of PDV's.

A large portion of the problems couched in terms of Markov chain theory are problems which assume that the process is ergodic.

A Markov process is ergodic* if it converges to a steady state as k gets larger. A Markov process is in a steady state at time k only if PDV_k = PDV_{k+1}. For practical purposes we could say that a process is in the neighborhood of a steady state at time k+1 if PDV_k differs from PDV_{k+1} by less than a specified small amount. This neighborhood might be defined in terms of the accuracy of the estimates of the probabilities in the PDV's.

The bottom marginal (or PDV) in Figure 16 could be obtained by multiplying together the right marginal probabilities and the transition matrix of Figure 15 thus:

                        .6  .4   0   0
    (.1, .2, .3, .4)     0  .6  .4   0    =  (.14, .16, .26, .44)
                         0   0  .6  .4
                        .2   0   0  .8

Such a multiplication indicates the transformation of the PDV between time k and k+1. We can obtain PDV_{k+2} by multiplying the result by the same transition matrix. In repeated multiplication of the previous result, the difference between successive PDV's is seen to diminish; the process is approaching the steady state. The process is in a steady state if the input PDV is the same as the output PDV, i.e.

                        .6  .4   0   0
    (x1, x2, x3, x4)     0  .6  .4   0    =  (x1, x2, x3, x4)
                         0   0  .6  .4
                        .2   0   0  .8

* This is Feller's (11) definition; a different definition is given by Kemeny and Snell (12), p. 99; they include periodic or cyclic Markov chains in ergodic chains, Feller does not.
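The PDV-sequence view is easy to simulate; a sketch (ours) using the transition matrix of Figure 15:

    P = [[.6, .4, 0, 0],
         [0, .6, .4, 0],
         [0, 0, .6, .4],
         [.2, 0, 0, .8]]

    def step(pdv):
        """One transition: PDV_{k+1} = PDV_k * P."""
        return [sum(pdv[i] * P[i][j] for i in range(4)) for j in range(4)]

    def joint_table(pdv):
        """Underlying joint probability table (as in Figure 16): row i of
        P scaled by the i-th entry of PDV_k."""
        return [[pdv[i] * P[i][j] for j in range(4)] for i in range(4)]

    pdv = [.1, .2, .3, .4]
    print([round(p, 2) for p in step(pdv)])   # -> [0.14, 0.16, 0.26, 0.44]
    for _ in range(100):                      # iterate toward the steady state
        pdv = step(pdv)
    print([round(p, 3) for p in pdv])         # -> [0.2, 0.2, 0.2, 0.4]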

Figure 17 is the joint probability table of the steady state of the process, the transition matrix of which is shown in Figure 15.

             E1    E2    E3    E4
      E1    .12   .08    0     0    .2
      E2     0    .12   .08    0    .2
      E3     0     0    .12   .08   .2
      E4    .08    0     0    .32   .4
            .2    .2    .2    .4

    Figure 17

Figure 18 is the corresponding Z-measure table.

     .5    .25    -1     -1
    -1     .5     .25    -1
    -1    -1      .5      0
     0    -1     -1      .666

    Figure 18
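Figure 18 follows from Figure 17 by applying the Z-measure of Section V cell by cell; a self-contained check (ours):

    steady = [[.12, .08, 0, 0],
              [0, .12, .08, 0],
              [0, 0, .12, .08],
              [.08, 0, 0, .32]]
    rows = [sum(r) for r in steady]         # (.2, .2, .2, .4)
    cols = [sum(c) for c in zip(*steady)]   # (.2, .2, .2, .4)

    def z(a, x, y, eps=1e-12):
        # Z-measure, Situations I-IV of Section V.
        dev = a - x * y
        if abs(dev) < eps:
            return 0.0
        if dev > 0:
            return dev / (min(x, y) - x * y)
        return dev / (x * y if x + y <= 1 else (1 - x) * (1 - y))

    for i, row in enumerate(steady):
        print([round(z(a, cols[j], rows[i]), 3) for j, a in enumerate(row)])
    # Reproduces Figure 18, including Z = 0 for the two independent cells.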

As an ergodic Markov process progresses toward the steady state, the transition matrix remains invariant. The following items generally change as the time index (k) changes:
1. Both sets of marginal probabilities (the PDV's)
2. The joint probability table
3. The Z-measure table
4. The conditional probability table which reverses the effect of the transition matrix, i.e., that which would take PDV_{k+1} to PDV_k.

The table in (4) is obtained by dividing the joint probabilities associated with the k to k+1 transition by the marginal probabilities in the margin below. The transformation from PDV_{k+1} to PDV_k could also be accomplished with the inverse of the transition matrix if the transition matrix is non-singular; such an inverse is invariant with respect to k.

A special kind of Markov chain which is an exception to the above characterization is that in which the transition to each state is independent of the state from which the transition began. Then the following conditions prevail:
1. All rows of the transition matrix are the same
2. The joint probabilities are products of their marginals
3. All Z-measures are zero
4. Given any initial PDV, the convergence to the steady state is immediate and complete at time k+1, and all further transitions do not change the PDV's.

This special kind of Markov process suggests a potential basis for the comparison or characterization of Markov processes: the rapidity of convergence to a neighborhood of the steady state. Apparently the farther the Z-measures of the joint probability table of the steady state are from zero, the slower is the convergence. Notice, however, that zeroes and ones in the transition matrix indicate cells whose Z-measures are invariant during convergence, so that such cells should be treated differently (perhaps excluded) from the cells which change during convergence. The theory needs further development in this area.

The steady state joint probability table and Z-measures are fixed for a given transition matrix and are independent of the initial PDV. Thus they are obvious choices for characterizing a Markov process. An infinite number of transition matrices converge to each steady state PDV; they all have different joint probability tables and different Z-measures. If one is altering a Markov process in order to have it converge to a target steady state, one method would be the comparison of the Z-measure matrix of the process at present with that of the target steady state. This will show which transitions must become more likely and which ones must become less likely in order to attain the target. It may well be, however, that the transition matrix which maintains the steady state PDV does not matter, i.e., only the PDV itself matters. Then many alternative Z-measure configurations could be compared to see which would be the preferred target, based perhaps on time, cost, and other criteria.

The Markov process defined in Figures 15 to 18 has a 4 x 4 joint probability table, so it has 3 x 3 = 9 degrees of freedom. There are eight zeroes in the joint table which remain there throughout all transitions; these lead to corresponding fixed Z-measures of minus one. By choosing any one of the non-negative Z's one can determine the entire process. This is a special (flow process) case and illustrates the method of reverse inference discussed in the previous section.


REFERENCES

1. Guttman, Louis. "An Outline of the Statistical Theory of Prediction," Supplementary Study B-1 (pp. 253-311) in Paul Horst and others, The Prediction of Personal Adjustment. New York: Social Science Research Council, 1941. Bulletin 48.

2. Mood, A.M. Introduction to the Theory of Statistics. New York: McGraw-Hill, 1950.

3. Goodman, L.A. and W.H. Kruskal. "Measures of Association for Cross Classifications," Journal of the American Statistical Association: 49 (1954), pp. 732-64.

4. Ibid. Vol. 54 (1959), pp. 123-63.

5. Ibid. Vol. 58 (1963), pp. 310-64.

6. Kendall, M.G. and A. Stuart. The Advanced Theory of Statistics. Vol. II. New York: Hafner, 1961.

7. Guilford, J.P. Fundamental Statistics in Psychology and Education. 4th ed. New York: McGraw-Hill, 1965.

8. Horst, Paul. Psychological Measurement and Prediction. Belmont, Calif.: Wadsworth Publishing Co., 1966.

9. Carroll, John B. "The Nature of the Data, or How to Choose a Correlation Coefficient," Psychometrika: 26 (Dec. 1961), pp. 347-72.

10. Cureton, Edward E. "Note on phi/phi max," Psychometrika: 24 (March 1959), pp. 89-91.

11. Feller, William. An Introduction to Probability Theory and Its Applications. New York: John Wiley & Sons.

12. Kemeny, J.G. and J.L. Snell. Finite Markov Chains. Princeton: Van Nostrand, 1960.