
PSYCHOLOGICAL BULLETIN
VOL. 54, No. 4, 1957

FACTORS RELEVANT TO THE VALIDITY OF EXPERIMENTS IN SOCIAL SETTINGS¹

DONALD T. CAMPBELL
Northwestern University

¹ A dittoed version of this paper was privately distributed in 1953 under the title "Designs for Social Science Experiments." The author has had the opportunity to benefit from the careful reading and suggestions of L. S. Burwen, J. W. Cotton, C. P. Duncan, D. W. Fiske, C. I. Hovland, L. V. Jones, E. S. Marks, D. C. Pelz, and B. J. Underwood, among others, and wishes to express his appreciation. They have not had the opportunity of seeing the paper in its present form, and bear no responsibility for it. The author also wishes to thank S. A. Stouffer (33) and B. J. Underwood (36) for their public encouragement.

What do we seek to control in experimental designs? What extraneous variables which would otherwise confound our interpretation of the experiment do we wish to rule out? The present paper attempts a specification of the major categories of such extraneous variables and employs these categories in evaluating the validity of standard designs for experimentation in the social sciences.

Validity will be evaluated in terms of two major criteria. First, and as a basic minimum, is what can be called internal validity: did in fact the experimental stimulus make some significant difference in this specific instance? The second criterion is that of external validity, representativeness, or generalizability: to what populations, settings, and variables can this effect be generalized? Both criteria are obviously important although it turns out that they are to some extent incompatible, in that the controls required for internal validity often tend to jeopardize representativeness.

The extraneous variables affecting internal validity will be introduced in the process of analyzing three pre-experimental designs. In the subsequent evaluation of the applicability of three true experimental designs, factors leading to external invalidity will be introduced. The effects of these extraneous variables will be considered at two levels: as simple or main effects, they occur independently of or in addition to the effects of the experimental variable; as interactions, the effects appear in conjunction with the experimental variable. The main effects typically turn out to be relevant to internal validity, the interaction effects to external validity or representativeness.

The following designation for experimental designs will be used: X will represent the exposure of a group to the experimental variable or event, the effects of which are to be measured; O will refer to the process of observation or measurement, which can include watching what people do, listening, recording, interviewing, administering tests, counting lever depressions, etc. The Xs and Os in a given row are applied to the same specific persons. The left to right dimension indicates temporal order. Parallel rows represent equivalent samples of persons unless otherwise specified. The designs will be numbered and named for cross-reference purposes.

THREE PRE-EXPERIMENTAL DESIGNS AND THEIR CONFOUNDED EXTRANEOUS VARIABLES

The One-Shot Case Study. As Stouffer (32) has pointed out, much social science research still uses Design 1, in which a single individual or group is studied in detail only once, and in which the observations are attributed to exposure to some prior situation.

    X  O          1. One-Shot Case Study

This design does not merit the title of experiment, and is introduced only to provide a reference point. The very minimum of useful scientific information involves at least one formal comparison and therefore at least two careful observations (2).

The One-Group Pretest-Posttest Design. This design does provide for one formal comparison of two observations, and is still widely used.

    O1  X  O2     2. One-Group Pretest-Posttest Design

However, in it there are four or five categories of extraneous variables left uncontrolled which thus become rival explanations of any difference between O1 and O2, confounded with the possible effect of X.

The first of these is the main effect of history. During the time span between O1 and O2 many events have occurred in addition to X, and the results might be attributed to these. Thus in Collier's (8) experiment, while his respondents² were reading Nazi propaganda materials, France fell, and the obtained attitude changes seemed more likely a result of this event than of the propaganda.³ By history is meant the specific event series other than X, i.e., the extra-experimental uncontrolled stimuli. Relevant to this variable is the concept of experimental isolation, the employment of experimental settings in which all extraneous stimuli are eliminated. The approximation of such control in much physical and biological research has permitted the satisfactory employment of Design 2. But in social psychology and the other social sciences, if history is confounded with X the results are generally uninterpretable.

² In line with the central focus on social psychology and the social sciences, the term respondent is employed in place of the terms subject, patient, or client.

³ Collier actually used a more adequate design than this, an approximation to Design 4.

The second class of variables confounded with X in Design 2 is here designated as maturation. This covers those effects which are systematic with the passage of time, and not, like history, a function of the specific events involved. Thus between O1 and O2 the respondents may have grown older, hungrier, tireder, etc., and these may have produced the difference between O1 and O2, independently of X. While in the typical brief experiment in the psychology laboratory, maturation is unlikely to be a source of change, it has been a problem in research in child development and can be so in extended experiments in social psychology and education. In the form of "spontaneous remission" and the general processes of healing it becomes an important variable to control in medical research, psychotherapy, and social remediation.

There is a third source of variance that could explain the difference between O1 and O2 without a recourse to the effect of X. This is the effect of testing itself. It is often true that persons taking a test for the second time make scores systematically different from those taking the test for the first time. This is indeed the case for intelligence tests, where a second mean may be expected to run as much as five IQ points higher than the first one. This possibility makes important a distinction between reactive measures and nonreactive measures. A reactive measure is one which modifies the phenomenon under study, which changes the very thing that one is trying to measure. In general, any measurement procedure which makes the subject self-conscious or aware of the fact of the experiment can be suspected of being a reactive measurement. Whenever the measurement process is not a part of the normal environment it is probably reactive. Whenever measurement exercises the process under study, it is almost certainly reactive. Measurement of a person's height is relatively nonreactive. However, measurement of weight, introduced into an experimental design involving adult American women, would turn out to be reactive in that the process of measuring would stimulate weight reduction. A photograph of a crowd taken in secret from a second story window would be nonreactive, but a news photograph of the same scene might very well be reactive, in that the presence of the photographer would modify the behavior of people seeing themselves being photographed. In a factory, production records introduced for the purpose of an experiment would be reactive, but if such records were a regular part of the operating environment they would be nonreactive. An English anthropologist may be nonreactive as a participant-observer at an English wedding, but might be a highly reactive measuring instrument at a Dobu nuptials. Some measures are so extremely reactive that their use in a pretest-posttest design is not usually considered. [...] techniques must be rated as reactive, as shown, for example, by Crespi's (9) evidence.

Even within the personality and attitude test domain, it may be found that tests differ in the degree to which they are reactive. For some purposes, tests involving voluntary self-description may turn out to be more reactive (especially at the interaction level to be discussed below) than are devices which focus the respondent upon describing the external world, or give him less latitude in describing himself (e.g., 5). It seems likely that, apart from considerations of validity, the Rorschach test is less reactive than the TAT or MMPI. Where the reactive nature of the testing process results from the focusing of attention on the experimental variable, it may be reduced by imbedding the relevant content in a comprehensive array of topics, as has regularly been done in Hovland's attitude change studies (14). It seems likely that with attention to the problem, observational and measurement techniques can be developed which are much less reactive than those now in use.

Instrument decay provides a fourth uncontrolled source of variance which could produce an O1-O2 difference that might be mistaken for the effect of X. This variable can be exemplified by the fatiguing of a spring scales, or the condensation of water vapor in a cloud chamber. For psychology and the social sciences it becomes a particularly acute problem
