Methods of Psychological Research Online 1997, Vol. 2, No. 2
© 1998 Pabst Science Publishers

Internet: http://www.pabst-publishers.de/mpr/

What exactly is random about random effects?

Matthias Siemer

Abstract

The central question of the paper is: When should stimuli be treated as random effects in random- or mixed-effects ANOVA? The conceptual starting point is the view of significance testing in experimental research as following the logic of randomization tests. Three designs are discussed: (1) stimuli nested under treatment conditions, (2) stimuli and treatment conditions crossed, and (3) stimuli and treatment conditions counterbalanced. The treatment of stimuli as random effects is shown to be inadequate in the first two designs but necessary in the third.

Moreover, the analysis has implications for at least two additional topics: (1) the conceptualization of the significance test as a randomization test when no randomization takes place, e.g., in quasi-experiments and correlational studies, and (2) the validity of aggregating over subjects and stimuli, referred to as aggregation validity. Aggregation validity has to be distinguished from internal validity and is shown to be one aspect of the concept formerly known as external validity, albeit without appeal to some underlying population.

The present paper starts with the following methodological question: Under which circumstances is it reasonable to consider stimuli in experimental studies to be random samples drawn from an underlying population of stimuli? By implication, these stimuli would have to be treated as random factors in a mixed- or random-effects ANOVA model. The answer to this question has to consider the following basic question: What is the exact nature of the supposed randomness of random effects? The discussion focuses on the methodological implications and presuppositions of this assumption of randomness rather than on technical problems of the random-effects model, like the distributions of Quasi-F statistics.

1 Arguments for the treatment of stimuli as random effects

It is an established methodological standard in cognitive psychology to treat stimuli (e.g., words) as a random factor, just like subjects. Indeed, cognitive psychologists often carry out two distinct statistical analyses, one with subjects as random effects and stimuli treated as fixed effects, and the other with stimuli as random effects and subjects treated as fixed effects. The rationale for this approach goes back to Coleman (1964) and Clark (1973). For instance, Coleman stated that

    many studies of verbal behavior have little scientific point if their conclusions have to be restricted to the specific language materials that were used in the experiment. It has not been customary, however, to perform significance tests that permit generalization beyond the specific materials, and thus there is little evidence that such studies could be successfully replicated if a different sample of language materials were used (1964, p. 216, italics added).

This methodological requirement has since been extended to other research areas (e.g., Fontenelle, Phillips, & Lane, 1985; Hopkins, 1984; Kenny & Smith, 1980; Richter & Seay, 1987; Santa, Miller, & Shaw, 1979; Wickens & Keppel, 1983) and to the statistical treatment of different experimenters (Bonge, Schuldt, & Harper, 1992; Scheirer & Geller, 1979). The arguments in favor of this methodological requirement are generally not substantially different from those of Coleman (1964). The following paragraph from Fontenelle, Phillips, and Lane (1985) may be taken as representing this view:

    In conclusion, generalizing research findings is a major aim of all experimentation. Although experimenters as a rule are very careful to make sure that their results generalize to the population of subjects, the problem of generalizing to the population of stimuli had been neglected (p. 106, italics added).

Before the validity of these arguments can be discussed further, it is necessary to briefly present the widely held conceptualization of inferential statistics in experimental psychology as following the logic of randomization tests rather than parametric or population-based statistics.

2 The significance test as randomization test

Recent interpretations of the significance test do not treat it as a means of inferring to an underlying population, at least not in experimental contexts. Rather, the significance test is seen as following the logic of randomization tests (cf. Bredenkamp, 1980; Edgington, 1995; Erdfelder & Bredenkamp, 1994; Gadenne, 1984; Hager & Westermann, 1983). In randomization tests, the critical random process is not to draw a random sample from an underlying population distribution, but rather to randomly assign subjects or stimuli to experimental conditions. This is required to secure stochastic independence of treatment and potentially confounding variables (PCVs). PCVs designate all other factors that have an effect on the dependent variable(s) in the sense of being parts of sufficient conditions for causal influence on the dependent variable(s). The latter reflects the fact that psychological hypotheses and explanations are essentially incomplete: They entail only one or a few causes as parts of a complete causal network. Moreover, these causes are neither necessary nor sufficient in themselves but have to be supplemented by other (background) conditions to be effective (see Siemer, 1993, for a more detailed discussion of this property of psychological explanations). PCVs become actually confounding variables (ACVs) inasmuch as the stochastic independence of PCV and treatment does not hold. The randomization procedure is therefore required to secure the internal or ceteris-paribus ("other things equal") validity of an experiment (Hager & Westermann, 1983). If the treatment procedure itself induces confounding variables, randomization does not ensure internal validity, since the confounding takes place after randomization (e.g., Cook & Campbell, 1986).

According to this rationale, a randomization test informs about the probability of the empirical results, in terms of differences in central tendencies, under the assumption that they are exclusively the result of the random assignment procedure, that is, chance. Most importantly, the distribution of the test statistic is generated solely on the basis of the sample data.
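The logic just described can be made concrete with a small sketch. The following Python function is an illustration only, not a procedure taken from the paper: it enumerates every possible re-assignment of the observed scores to two groups of the observed sizes and reports the proportion of re-assignments whose absolute mean difference is at least as large as the observed one. The function name `randomization_test` and the example scores are hypothetical.

```python
from itertools import combinations
from statistics import mean


def randomization_test(group_a, group_b):
    """Exact two-sided randomization test for a difference in means.

    The reference distribution is built from the sample data alone:
    every split of the pooled scores into groups of the observed sizes
    is treated as an equally likely outcome of random assignment.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical scores for a treatment and a control group.
p_value = randomization_test([12, 15, 14, 16], [9, 10, 11, 8])
```

No population distribution enters the computation; the only random process assumed is the assignment of units to conditions.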

To summarize, randomization is necessary to secure the internal validity of an experiment that tests causal hypotheses. Therefore, the randomization test answers the question of how likely the results are under the assumption that the treatment has no effect ($H_0$) and all effects are exclusively a consequence of the randomization procedure. Consequently, the randomization test with an appropriate $\alpha$-level protects from falsely accepting a causal hypothesis, and consequently increases the internal validity of an experiment. In particular, inferential statistics do not provide the inductive generalization from a sample to a population, that is, the "external validity" or expected probability to replicate the results of an experiment.

Therefore, the question of interest can be rephrased as: Under which circumstances does the treatment of stimuli as random effects raise the internal validity of a study? Note that in the present context ANOVA is treated as an approximation to randomization tests. This is possible because Monte Carlo studies have shown that randomization tests and their parametric equivalents in most cases lead to very similar results (summarized in, e.g., Bredenkamp, 1980). The main reason for this approach is that ANOVA is a much more common method for the evaluation of experimental designs than randomization tests, at least for the designs considered. Therefore, from a statistical point of view, I will discuss random-effects or mixed ANOVA models, whereas from a conceptual or methodological point of view, these ANOVAs will be treated as approximations to randomization tests.

3 Subjects as random factors

To illustrate this approach I will discuss a design that requires the treatment of subjects as random effects, both at a statistical and a conceptual level: a single-factor repeated measurements design with two treatment levels. It is assumed that the sequence of the factor levels and the subjects are randomly combined. The respective randomization test informs about the probability of the empirical differences, given the $H_0$ that these differences are exclusively the result of the random combination of the sequence of the factor levels and the subjects, that is, the probability that the effect is the result of chance fluctuations in the reactions of the subjects across the two levels. The notion of "chance fluctuation" subsumes the variation of the influences of all PCVs at the two points of measurement. Consequently, the distribution of the test statistic is generated by the permutation of the combinations of reactions and treatment levels within subjects (e.g., Edgington, 1995).
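For two treatment levels, permuting the combination of reactions and treatment levels within a subject amounts to swapping that subject's two scores, which flips the sign of the difference. The sketch below is a minimal illustration under that reading; the function name `paired_randomization_test` and the data are hypothetical and not taken from the paper.

```python
from itertools import product


def paired_randomization_test(level_1, level_2):
    """Exact two-sided within-subject randomization test.

    level_1[m] and level_2[m] are the scores of subject m under the two
    treatment levels. Every pattern of within-subject swaps is
    enumerated, which corresponds to every pattern of sign flips of the
    difference scores.
    """
    diffs = [a - b for a, b in zip(level_1, level_2)]
    observed = abs(sum(diffs))

    extreme = 0
    count = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical scores of five subjects under two treatment levels.
p_paired = paired_randomization_test([5, 7, 6, 8, 9], [3, 4, 5, 5, 6])
```

With five subjects there are only 32 swap patterns, so the smallest attainable two-sided value is 2/32; the exact test makes this limit of small samples explicit.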

In the ANOVA evaluation of this design, subjects are treated as random effects in a mixed-model approach. This is reflected by the fact that, in the F-ratio, the mean square of treatment is compared to the mean square of the interaction of treatment by subjects, whose expected value consists of the components $\sigma^2_e$ and $\sigma^2_{T \times Su}$ (see Table 1). These variance components (which are confounded in case of only one observation per stimulus-subject combination) can be considered to be estimations of the influence of pure chance fluctuations across the points of measurement, since these fluctuations will cause differential treatment effects. Note, however, that the variance component $\sigma^2_{T \times Su}$ also entails "true" or manifest differential effectiveness of the treatment. As a consequence, the expected mean square will overestimate the amount of pure chance fluctuations across the points of measurement. Moreover, the hypothesis tested usually does not predict that the treatment has numerically the same effect on all subjects and therefore allows for ordinal treatment-by-subject interactions. In the present design, however, chance variation and ordinal treatment-by-subject interactions are not separable. As a result, the mixed-model ANOVA resembles the randomization test account of chance fluctuations by comparing treatment variance to an upper bound estimation of pure chance fluctuations across the points of measurement (including error variance $\sigma^2_e$) in the critical F-value.
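The comparison of the treatment mean square with the treatment-by-subjects mean square can be sketched in a few lines of Python. This is an illustrative computation for a complete subjects-by-treatments table with one score per cell; the function name `repeated_measures_f` and the data are hypothetical.

```python
def repeated_measures_f(scores):
    """F = MS_T / MS_TxSu for a single-factor repeated measurements design.

    scores[m][i] is the score of subject m under treatment level i.
    """
    n = len(scores)       # number of subjects
    p = len(scores[0])    # number of treatment levels
    grand = sum(sum(row) for row in scores) / (n * p)
    t_means = [sum(row[i] for row in scores) / n for i in range(p)]
    s_means = [sum(row) / p for row in scores]

    # Treatment sum of squares.
    ss_t = n * sum((tm - grand) ** 2 for tm in t_means)
    # Treatment-by-subjects interaction (the residual with one score per cell).
    ss_txsu = sum(
        (scores[m][i] - t_means[i] - s_means[m] + grand) ** 2
        for m in range(n)
        for i in range(p)
    )
    ms_t = ss_t / (p - 1)
    ms_txsu = ss_txsu / ((n - 1) * (p - 1))
    return ms_t / ms_txsu


# Hypothetical data: five subjects, two treatment levels.
f_value = repeated_measures_f([[5, 3], [7, 4], [6, 5], [8, 5], [9, 6]])
```

With two treatment levels this F equals the squared paired t statistic, so the error term is driven entirely by how much the treatment difference varies from subject to subject.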

Table 1: Sources of Variance and Expected Mean Squares; Repeated Measurements Design

Source               df                EMS
T                    p - 1             $\sigma^2_e + \sigma^2_{T \times Su} + n\sigma^2_T$
Su                   n - 1             $\sigma^2_e + p\sigma^2_{Su}$
T x Su (Residual)    (n - 1)(p - 1)    $\sigma^2_e + \sigma^2_{T \times Su}$

Note. T (p) = Treatment; Su (n) = Subjects.

Based on these arguments, the revised question now is: Under which circumstances does the treatment of stimuli as random effects serve the same purpose as the treatment of subjects in the present design does? Three different designs are distinguished.

4 Stimuli nested under treatment conditions

In this common design, stimuli are nested under treatment conditions. As a result, both factors are confounded. Examples are studies in which different word classes (e.g., nouns vs. adjectives) serve as treatments. The linear model of an individual score $x_{ijm}$ of subject m on stimulus j in treatment condition i is given by Equation 1.

$$x_{ijm} = \mu + \alpha_i + s_{j(i)} + p_m + (\alpha p)_{im} + (sp)_{mj(i)} + e_{m(ij)} \tag{1}$$

The terms of Equation 1 specify the potential sources of variability in the (quasi-)experiment. The term $\mu$ represents the overall mean, and $\alpha_i$, $s_{j(i)}$, and $p_m$ are the treatment, stimulus, and subject effects, respectively. The grand mean and the treatment effect are designated by Greek letters to indicate that they are assumed fixed; the others are potentially random. The treatment-by-subject interaction is expressed as $(\alpha p)_{im}$. The quantity $(sp)_{mj(i)}$ is a particular stimulus-by-subject interaction, while $e_{m(ij)}$ is the random error associated with that particular subject-stimulus combination in that particular experiment. Table 2 shows the respective expected mean squares of the ANOVA models for the random- vs. fixed-effects model. In the random-effects model, the mean square of the treatment effect has an expected value that entails the stimulus variance component $\sigma^2_{St(T)}$. Therefore, the appropriate error term to test the treatment effect is one that includes this source of error, resulting in a Quasi-F ratio (e.g., Clark, 1973). In contrast, in the fixed-effects model, variability originating from stimulus variability is not considered.
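A worked check, under the assumption that the expected mean squares are exactly those listed in Table 2 (with r subjects and q stimuli per treatment), shows why a ratio of sums of mean squares is needed here. No single mean square has the expectation of the treatment mean square minus the treatment term, but a sum of two does. The pairing below is one standard way to build such a Quasi-F ratio; it is offered as a sketch, and the symbol $F'$ is used for illustration only.

```latex
\begin{aligned}
E(MS_T) + E(MS_{St(T) \times Su})
  &= 2\sigma^2_e + 2\sigma^2_{St(T) \times Su} + q\sigma^2_{T \times Su}
     + r\sigma^2_{St(T)} + qr\sigma^2_T \\
E(MS_{T \times Su}) + E(MS_{St(T)})
  &= 2\sigma^2_e + 2\sigma^2_{St(T) \times Su} + q\sigma^2_{T \times Su}
     + r\sigma^2_{St(T)} \\
F' &= \frac{MS_T + MS_{St(T) \times Su}}{MS_{T \times Su} + MS_{St(T)}}
\end{aligned}
```

Numerator and denominator have the same expectation except for the term $qr\sigma^2_T$, so the ratio isolates the treatment effect when stimuli are treated as random.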

In order to find the appropriate model one first has to analyze whether the hypotheses about the stimuli have the same nature as those previously discussed, that is: Do the critical hypotheses concern the populations of stimuli or are they essentially causal? Only if the stimulus hypotheses are actually hypotheses about populations is it necessary and reasonable to seek statistical generalization by means of random sampling of material.

Table 2: Sources of Variance and Expected Mean Squares; Repeated Measurements Design with Stimuli Nested Under Treatment Conditions.

Source (a)     df                  EMS (b)
T              p - 1               $\sigma^2_e + \sigma^2_{St(T) \times Su} + q\sigma^2_{T \times Su} + r\sigma^2_{St(T)} + qr\sigma^2_T$
St/T           p(q - 1)            $\sigma^2_e + \sigma^2_{St(T) \times Su} + r\sigma^2_{St(T)}$
Su             r - 1               $\sigma^2_e + \sigma^2_{St(T) \times Su} + pq\sigma^2_{Su}$
T x Su         (p - 1)(r - 1)      $\sigma^2_e + \sigma^2_{St(T) \times Su} + q\sigma^2_{T \times Su}$
St/T x Su      p(q - 1)(r - 1)     $\sigma^2_e + \sigma^2_{St(T) \times Su}$ (Residual) (c)

a. T = Treatment (p), Su = Subjects (r), St = Stimuli (q).
b. Variance components typed in bold letters are parts of the random- but not the fixed-effects model.
c. $\sigma^2_e$ and $\sigma^2_{St(T) \times Su}$ cannot be estimated independently with only one observation per stimulus-subject combination.

Even a look at the research questions stated by the authors arguing in favor of the treatment of stimuli as random effects (e.g., Clark, 1973) makes clear, however, that the critical hypotheses are far from being hypotheses about central tendencies in clearly defined and closed populations of stimuli. Instead, the hypotheses concern the causal properties of certain features of the stimuli. The appeal to populations serves a completely different purpose. Consider, for instance, the following explanation of what Clark (1973) sees as the purpose of a population hypothesis about aggregates, in the context of a comparison of homographs vs. nonhomographs:

    ... homographs take longer to recognize than nonhomographs all other things being equal. Since it is impossible to find single homograph/nonhomograph pairs identical in all other possible factors ([...], meaning, word length, spelling difficulty, and other undetermined factors), it is only possible to test the hypotheses by looking at the central tendencies (for example, the means) of homographs versus nonhomographs. (Clark, 1973, p. 352, italics added)

This paragraph makes perfectly clear that the hypothesis of interest does not concern aggregates of stimuli in well-defined populations. Rather, the very property of homography itself is assumed to causally influence identification times. This is exactly the reason why all other properties of the stimuli have to be controlled in order to realize the ceteris-paribus condition and secure internal validity. More precisely, it is only necessary to control PCVs and not all other properties of the stimuli, because some of these properties have no causal effect. Why the ceteris-paribus condition is assumed to be true at the level of central tendencies in populations, rather than in samples, remains entirely unexplained, however. Indeed, this is true only if a very restrictive additional assumption is met.

First of all, how is a conceptualization of the significance test as a randomization test in the present design possible at all? Obviously, the stimuli are not randomly assigned to conditions to begin with. One could argue, however, that the test of significance informs about the probability of the empirically obtained differences in central tendencies under the assumption that the stimuli come from the same population. If this assumption is met, random sampling has the same effects as random assignment to conditions. The assumption that the stimuli are drawn from the same population ($H_0$) can in turn be reduced to two underlying assumptions:

1. The null hypothesis is true, that is, the stimuli come from the same population with respect to the treatment.

2. The stimuli come from the same population with respect to all PCVs.

The second assumption implies nothing less than the assumption of the validity of the ceteris-paribus condition at a population level. This assumption becomes necessary because the ceteris-paribus condition is not guaranteed by a randomization procedure (at least with respect to the distributions of the PCVs; e.g., Steyer, 1992). It thus has to be secured in a different way. However, this is a very strong a priori assumption. As a result, the significance test informs only about the probability of the data given the conjunction of the $H_0$ and the assumption that the ceteris-paribus condition is met at the population level. The internal validity of the inference is therefore secured if and only if there exist no systematic differences in terms of PCVs between the kinds of stimuli in the population, that is, if stimulus category and PCVs are stochastically independent. For a recent approach to falsify this assumption of unconfoundedness in the population with respect to single PCVs (referred to as potential confounders), see Steyer, Gabler, and Rucai (1995).

The same line of argument holds, mutatis mutandis, for other quasi-experimental designs or correlational studies in which randomization is not possible, at least as far as causal and not "true population hypotheses" are concerned (see the Conclusions). True population hypotheses, however, are commonly of limited theoretical value, since the major aim of basic science lies in the investigation of general causal mechanisms rather than in the description of incidental and local phenomena.

Usually, one tries to increase the ceteris-paribus validity in these cases by quasi-experimental control techniques like matching or the inclusion of PCVs either in the experimental design or in the regression or path analysis. However, matching with respect to stimuli seems to be more feasible than with respect to subjects, because the number of PCVs is more restricted. Indeed, if it were possible to reach a "perfect matching" (i.e., with respect to all PCVs), or to include all PCVs in the analysis, thus fulfilling the so-called "closedness" condition, it would be possible to interpret a regression coefficient or a difference in central tendencies unambiguously in a causal manner.

For the present design this means that the stimulus variance that is not evoked by the treatment, $\sigma^2_{St(T)}$, is reduced to zero. As this variance component is exactly the term that separates the random-effects from the fixed-effects model, this means that differences between the two models are reduced inasmuch as a matching of the stimuli is successful. Furthermore, as blocking factors (e.g., word length) are usually considered in the process of designing a study but not in the subsequent statistical analysis, the application of the random-effects model results in a smaller actual $\alpha$-level than the nominal one (even if the statistical assumptions of the random-effects model, like true random sampling, are actually met; cf. Wickens & Keppel, 1983). Elimination of extreme material (i.e., truncation of the stimulus distributions) has the same effect (e.g., Cohen, 1976).

5 Stimuli and treatment conditions crossed

In this design stimuli and treatment are completely crossed, with subjects nested under treatment levels. In other words, the design is within stimuli and between subjects. The linear model for a single score $x_{ijm}$ of subject m on stimulus j in treatment condition i is given by Equation 2:

$$x_{ijm} = \mu + \alpha_i + s_j + p_{m(i)} + (\alpha s)_{ij} + (sp)_{jm(i)} + e_{ijm} \tag{2}$$

In contrast to Equation 1, subjects instead of stimuli are nested under the treatment conditions. The resulting variance components of the models with stimuli treated as fixed versus random effects are presented in Table 3. The main difference between the random- and the fixed-effects model consists in the variance component $\sigma^2_{T \times St}$, that is, the interaction between stimuli and treatment. This interaction is included in the expected mean square of the treatment effect in the random- but not in the fixed-effects model. Consequently, the appropriate Quasi-F ratio has to include the respective variance component in the error term (e.g., Hopkins, 1984). In contrast, the fixed-effects model mathematically resembles a one-way ANOVA for the aggregated scores of the stimuli, thus ignoring differential treatment influences on the stimuli.

Table 3: Sources of Variance and Expected Mean Squares; Between-Subjects Design with Stimuli Crossed with Treatment Conditions.

Source (a)     df                  EMS (b)
T              p - 1               $\sigma^2_e + \sigma^2_{St \times Su/T} + n\sigma^2_{T \times St} + q\sigma^2_{Su/T} + nq\sigma^2_T$
St             q - 1               $\sigma^2_e + \sigma^2_{St \times Su/T} + np\sigma^2_{St}$
Su/T           p(n - 1)            $\sigma^2_e + \sigma^2_{St \times Su/T} + q\sigma^2_{Su/T}$
T x St         (p - 1)(q - 1)      $\sigma^2_e + \sigma^2_{St \times Su/T} + n\sigma^2_{T \times St}$
St x Su/T      p(q - 1)(n - 1)     $\sigma^2_e + \sigma^2_{St \times Su/T}$ (Residual) (c)

a. T = Treatment (p), Su = Subjects (n), St = Stimuli (q).
b. Variance components typed in bold letters are parts of the random- but not the fixed-effects model.
c. $\sigma^2_e$ and $\sigma^2_{St \times Su/T}$ cannot be estimated independently with only one observation per stimulus-subject combination.
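The remark about the fixed-effects model can be illustrated with a short sketch. It reads "aggregated scores" as each subject's mean over all stimuli, which is an interpretation rather than something the text spells out, and then runs an ordinary one-way between-groups ANOVA on those means. The function name `aggregated_oneway_f` and the data are hypothetical.

```python
def aggregated_oneway_f(groups):
    """One-way ANOVA F on subject means aggregated over stimuli.

    groups[i][m][j] is the score of subject m (nested in treatment i)
    on stimulus j. Differences between stimuli play no role once the
    scores are averaged within each subject.
    """
    # One aggregated score per subject.
    subj_means = [[sum(s) / len(s) for s in group] for group in groups]
    all_means = [x for group in subj_means for x in group]
    grand = sum(all_means) / len(all_means)
    group_means = [sum(group) / len(group) for group in subj_means]

    ss_between = sum(
        len(group) * (gm - grand) ** 2
        for group, gm in zip(subj_means, group_means)
    )
    ss_within = sum(
        (x - gm) ** 2
        for group, gm in zip(subj_means, group_means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_means) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical data: two treatments, three subjects each, two stimuli.
data = [
    [[4, 6], [5, 7], [6, 8]],
    [[1, 3], [2, 4], [3, 5]],
]
f_aggregated = aggregated_oneway_f(data)
```

Because the stimuli are averaged out before the test, a treatment that helps some stimuli and hurts others can go unnoticed, which is the point taken up in the discussion of stimulus-treatment interactions.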

The present design closely resembles the repeated measurements design with subjects as random effects. In this previously discussed design the treatment-by-subjects interaction variance component ($\sigma^2_{T \times Su}$) serves as an estimation of the amount of chance fluctuations in the reactions of the subjects, together with the true or manifest treatment-by-subjects interaction caused by "real" differential effectiveness of the treatment.

In contrast, with respect to stimuli this conclusion is not valid. Stimuli, as opposed to subjects (which are essentially "open systems"), are physically identical at different points of measurement. As a consequence, it is not possible to have a reasonable notion of chance fluctuations with respect to stimuli in the same way as it is for subjects. Stated differently: The treatment-by-stimulus interaction ($\sigma^2_{T \times St}$) is completely manifest. Consequently, $\sigma^2_{T \times St}$ is an exclusive indicator of whether the treatment influences all stimuli in the same way and to the same extent. As a result, this measure has no implications for the question of internal validity. To answer the question of whether an observed effect can be causally related to the treatment variation at all or whether it can also be explained by chance, it is irrelevant whether this effect is identical across all stimuli. The latter is a question of external or, in analogy to the concept of population validity (Hager & Westermann, 1983), stimulus validity. To accomplish a high level of stimulus validity one first of all has to ensure that the stimuli investigated are actually a sample from the population of stimuli for which the tested hypothesis claims validity. This assumption is trivially met if the theory tested does not restrict the population of stimuli to some subpopulation. The second aspect of stimulus validity is the question of whether the treatment causally influences all stimuli in the same direction, that is, whether disordinal interactions exist between treatment and (subgroups of) stimuli. In analogy to the notion of "aptitude-treatment interaction", one could speak of stimulus-treatment interactions. Only with respect to this question does $\sigma^2_{T \times St}$ become important. Therefore, $\sigma^2_{T \times St}$ should at least be reported descriptively. Additionally, one should investigate whether partitioning of the stimuli (according to some control variables) is possible and accounts for relevant proportions of $\sigma^2_{T \times St}$. Moreover, it might be helpful to report and compare different measures of effect size on the basis of intraclass correlations (also denoted as "generalizability" coefficients; e.g., Cronbach, Gleser, Nanda, & Rajaratnam, 1972). In effect, what is considered here is whether the treatment has the same effect on all stimuli, that is, whether an aggregation across different stimuli leads to valid conclusions. If disordinal stimulus-treatment interactions exist, aggregation across subgroups of stimuli is not a valid approach (see Iseler, 1996a, 1996b, for a discussion of the deductive relation of single-case and statistical aggregate hypotheses and the methodological and statistical implications). In order to allow for a more differentiated terminology concerning questions of validity in experimental research, this second aspect of stimulus validity (or population validity, for that matter) might be referred to as aggregation validity. This kind of validity, however, is entirely different from the question of "pure" internal validity, that is, whether the treatment has causal effects at all.
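A descriptive look at aggregation validity can start from the treatment effect computed separately for each stimulus. The sketch below is a hypothetical illustration, not a procedure proposed in the paper: it computes per-stimulus mean differences between two conditions and flags whether their signs disagree. The names `stimulus_effects` and `has_sign_reversal` and the scores are assumptions made for the example.

```python
from statistics import mean


def stimulus_effects(treated, control):
    """Per-stimulus treatment effect as a difference in condition means.

    treated[j] and control[j] hold the scores collected for stimulus j
    under the two treatment conditions.
    """
    return [mean(t) - mean(c) for t, c in zip(treated, control)]


def has_sign_reversal(effects):
    """True if some stimuli show a positive and others a negative effect."""
    return any(e > 0 for e in effects) and any(e < 0 for e in effects)


# Hypothetical scores for three stimuli, two subjects per cell.
treated = [[10, 12], [9, 11], [4, 6]]
control = [[7, 9], [8, 8], [8, 10]]

effects = stimulus_effects(treated, control)
reversal = has_sign_reversal(effects)
```

A sign reversal in observed means is only a descriptive hint, since it can also arise from chance variation between subjects; it marks the stimuli for which aggregating across the whole set deserves a closer look.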

6 Stimuli and treatment conditions counterbalanced

This design results if stimuli and treatment levels can be combined freely, but each stimulus can be presented only once. In this design stimuli and treatment conditions are balanced by building two sets of stimuli and two sets of subjects (in the case of two treatment levels) and crossing them. The treatment effect equals the interaction of the dummy-coded subject (A) and stimulus (B) sets (cf. Kenny & Smith, 1980). With respect to the two dummy variables, subjects and stimuli are nested. Equation 3 shows the linear model for a single score $x_{ijkm}$.

$$x_{ijkm} = \mu + \alpha_i + \beta_j + s_{k(j)} + p_{m(i)} + (\alpha\beta)_{ij} + (\alpha s)_{ik(j)} + (\beta p)_{mj(i)} + (sp)_{k(j)m(i)} + e_{m(ijk)} \tag{3}$$

In the counterbalanced design both subjects and stimuli are randomly assigned (in the present case) to two groups, that is, stimuli and subjects are treated symmetrically. Consequently, the number of subjects and the number of stimuli should be equal. If stimuli are treated as random effects, the treatment effect entails the variance component $\sigma^2_{A \times St}$, in just the same way as the treatment of subjects as random effects leads to the inclusion of the variance component $\sigma^2_{B \times Su}$ (see Table 4). In both cases this variance component reflects an estimation of the amount of variance that can be attributed to the randomization of stimuli and subjects, respectively. This reflects the ANOVA approximation of a between-subjects randomization test. As a consequence, from the perspective of the subjects as well as from that of the stimuli, an ANOVA in which both subjects and stimuli are treated as random effects is a valid conceptual approximation of a randomization test in which the distribution of the test statistic is calculated by permuting both stimuli and subjects. Therefore, inasmuch as tests of significance, in the interpretation of being randomization tests, raise the internal validity of an experiment, this is also true for the treatment of stimuli as random effects in the counterbalanced design.

One of the disadvantages of the random-effects model has not yet been discussed because the application of the random-effects model did not prove to be advisable in the designs considered so far: The underlying variance components model is not as robust against violations of its statistical assumptions as the fixed-effects model, which is analyzable in the context of the General Linear Model. For instance, the random-effects model is less robust against violations of the normality assumption, since the central limit theorem is not applicable to variance components, as opposed to means (e.g., Scheffé, 1963). Moreover, the distributions of the Quasi-F statistics, the basis for tests of significance, cannot be derived analytically and have to be analyzed by means of Monte Carlo studies (e.g., Maxwell & Bray, 1986; Santa, Miller, & Shaw, 1979). For example, in the present case, the resulting Quasi-F ratio is given by Equation 4 (cf. Kenny & Smith, 1980).

Table 4: Sources of Variance and Expected Mean Squares; Counterbalanced Design with Two Stimulus and Subject Groups.

Source (a)       df                      EMS (b)
A                1                       $\sigma^2_e + \sigma^2_{St \times Su} + n\sigma^2_{A \times St} + qr\sigma^2_{Su} + qrn\sigma^2_A$
B                1                       $\sigma^2_e + \sigma^2_{St \times Su} + r\sigma^2_{B \times Su} + pn\sigma^2_{St} + prn\sigma^2_B$
St/B             2(r/2 - 1)              $\sigma^2_e + \sigma^2_{St \times Su} + pn\sigma^2_{St}$
Su/A             2(n/2 - 1)              $\sigma^2_e + \sigma^2_{St \times Su} + qr\sigma^2_{Su}$
A x B (= T)      1                       $\sigma^2_e + \sigma^2_{St \times Su} + n\sigma^2_{A \times St} + r\sigma^2_{B \times Su} + rn\sigma^2_{A \times B}$
A x St/B         2(r/2 - 1)              $\sigma^2_e + \sigma^2_{St \times Su} + n\sigma^2_{A \times St}$
B x Su/A         2(n/2 - 1)              $\sigma^2_e + \sigma^2_{St \times Su} + r\sigma^2_{B \times Su}$
St/B x Su/A      4(n/2 - 1)(r/2 - 1)     $\sigma^2_e + \sigma^2_{St \times Su}$ (c)

a. A = Subject groups (p = 2), B = Stimulus sets (q = 2), St = Stimuli (r), Su = Subjects (n).
b. Variance components typed in bold letters are parts of the random- but not the fixed-effects model.
c. $\sigma^2_e$ and $\sigma^2_{St \times Su}$ cannot be estimated independently with only one observation per stimulus-subject combination.

$$F' = \frac{MS_{A \times B} + MS_{Su(A) \times St(B)}}{MS_{B \times Su(A)} + MS_{A \times St(B)}} \tag{4}$$
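As a plausibility check, and assuming the expected mean squares are exactly those listed in Table 4, the expectations of the numerator and the denominator of Equation 4 can be written out term by term. The derivation below is a sketch added for illustration; it uses only the rows of Table 4.

```latex
\begin{aligned}
E(MS_{A \times B}) + E(MS_{Su(A) \times St(B)})
  &= 2\sigma^2_e + 2\sigma^2_{St \times Su} + n\sigma^2_{A \times St}
     + r\sigma^2_{B \times Su} + rn\sigma^2_{A \times B} \\
E(MS_{B \times Su(A)}) + E(MS_{A \times St(B)})
  &= 2\sigma^2_e + 2\sigma^2_{St \times Su} + r\sigma^2_{B \times Su}
     + n\sigma^2_{A \times St} \\
\text{difference} &= rn\sigma^2_{A \times B}
\end{aligned}
```

The two sums differ only by the term $rn\sigma^2_{A \times B}$, which carries the treatment effect, so under the null hypothesis numerator and denominator have the same expectation.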

One straightforward way to avoid such statistical problems is to confound stimuli and the subjects factor. Under these circumstances, the fixed-effects model can be applied. In the present case, however, in order to confound both factors it is not necessary to draw a new random sample of stimuli for each subject (e.g., Keppel, 1976; Coleman, 1979; Richter & Seay, 1987). Instead, it is sufficient to randomly assign stimuli to conditions individually for each subject. That is, the distribution of the test statistic is not generated post hoc by means of permutations, as in the randomization test, but the permutation is actually performed individually for each subject.
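The individual randomization described above can be sketched as follows. This is a minimal illustration assuming two treatment conditions of equal size; the function name `assign_stimuli`, the condition labels, and the stimulus and subject identifiers are hypothetical.

```python
import random


def assign_stimuli(stimuli, subjects, seed=0):
    """Randomly split the stimuli into two conditions, anew for each subject.

    Every subject sees every stimulus exactly once, but which stimuli
    fall into which condition is decided by a separate shuffle per
    subject, so no fixed stimulus set is tied to a condition.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    half = len(stimuli) // 2
    plan = {}
    for subject in subjects:
        order = list(stimuli)
        rng.shuffle(order)
        plan[subject] = {
            "treatment": order[:half],
            "control": order[half:],
        }
    return plan


# Hypothetical stimulus and subject labels.
words = ["w1", "w2", "w3", "w4", "w5", "w6"]
plan = assign_stimuli(words, ["s1", "s2", "s3"], seed=42)
```

In practice the seed would be left unset or recorded per session; the fixed value here only keeps the example repeatable.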

7 Summary and Conclusions

I will divide the conclusions into two parts. First, I will discuss the methodological and statistical conclusions with respect to the treatment of stimuli in experimental and quasi-experimental studies. Second, I will discuss general implications of the present analysis for the question of what exactly a sound conceptualization of the randomness of random variables in psychological research could be.

7.1 Methodological Conclusions

The metho dological conclusions for the treatment of stimuli in the three designs

considered are straightforward:

1. If stimuli and treatment are confounded, there is no p oint in treating stimuli

as random e ects, b ecause this approach usually will not have any p ositive

e ect on the internal validity of the study { at least if one do es not have

reasons to b elieve in the unlikely a-priori assumption that treatment and PCVs

c

MPR{online 1997, Vol.2, No.2 1998 Pabst Science Publishers

Random E ects 148

are sto chastically indep endent in the p opulation. Internal validity should b e

soughtby common quasi- exp erimental control techniques, like blo cking and

prop er selection of material. Internal validity can not b e reached by simply

treating stimuli as random e ects. However, the control of PCVs is p ossible

to a higher degree with resp ect to stimuli than with resp ect to sub jects.

2. If stimuli and treatment conditions are crossed (i.e., the design is "within-
stimuli"), there is no sense in treating stimuli as random, because there is no
reasonable notion of "chance fluctuations" with respect to stimuli. Therefore,
stimulus variability need not be controlled by inferential statistics in order
to secure internal validity.

3. If stimuli and treatment conditions are counterbalanced, there is a symmetri-
cal random partitioning of stimuli and subjects. Consequently, both stimuli
and subjects should be treated as random effects in order to account for
any effects caused by the randomization procedure itself. To avoid technical
problems with the handling of Quasi-F statistics, it is advisable to confound
stimuli and treatment conditions by randomizing the stimuli individually for
each subject. In this case the fixed-effects model can be applied.

7.2 General Conclusions

The conceptual starting point of the present analysis was the idea of taking seriously
the interpretation of parametric significance tests as approximations of randomiza-
tion tests. Specifically, randomization procedures have been directly related to
variance components in the fixed- and random-effects ANOVA. On the one hand,
randomization tests reflect the logic and purpose of significance testing in experi-
mental research in two major respects: (1) A statistical generalization to an under-
lying population is not intended; on the contrary, all conclusions are in principle
sample-related, and (2) the significance test protects against falsely accepting a
causal hypothesis and therefore supports the internal validity of a study. On the
other hand, ANOVA procedures are widely used as a flexible mathematical tool for the
analysis of experimental data, allowing for the treatment of different sources of
variance as either fixed or random. The underlying statistical assumptions concerning
the idea of statistical generalization to a population, however, should not be
confused with the scientific aim of experimentation in psychology.

To state it somewhat differently, [...] and randomization tests
reflect two different facets of randomness. The first kind of randomness is realized
by the randomization procedure in randomization tests. The randomization test
protects either against errors of randomization caused by the random assignment
itself, or against chance fluctuation not accounted for by the randomized sequence
of treatment levels in within-subjects designs.
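To make the first kind of randomness concrete, the following is a minimal sketch of an exact randomization test for a between-subjects design with two groups. The scores are hypothetical, the function name `randomization_test` is my own, and the absolute difference in means is only one possible choice of test statistic (cf. Edgington, 1995, for the general procedure).

```python
from itertools import combinations
from statistics import mean


def randomization_test(group_a, group_b):
    """Exact two-sided randomization test for a difference in group means.

    Enumerates every way of splitting the pooled scores into groups of the
    observed sizes and returns the proportion of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for indices in combinations(range(len(pooled)), n_a):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding in the comparison.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical scores of four treated and four control subjects.
p_value = randomization_test([12, 14, 15, 17], [8, 9, 10, 11])
print(p_value)  # 2 of the 70 possible partitions are this extreme: 2/70
```

The reference distribution is built exclusively from the scores in the sample and the possible outcomes of the random assignment. No appeal to an underlying population is needed at any step.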

The second kind of randomness is expressed in the concept of random sampling
from a population. In the present experimental context, this random sampling
is given eo ipso with respect to some underlying hypothetical population; that is,
it need not be assumed that the sample is representative of some
underlying actual population. Based on these assumptions, what are the more gen-
eral conclusions that can be drawn from the present analysis? The first major
conclusion concerns the status of inferential statistics in quasi-experiments or cor-
relational analyses. It has been shown with respect to stimuli that the appeal to an
underlying population (probably supported by the statistical assumptions of the
ANOVA procedures) has the aim to protect the intended [...]. This
logic requires, however, that the underlying (hypothetical) stimulus populations
do not differ with respect to the distributions of PCVs, that is, the validity of
the ceteris-paribus condition has to be postulated on the level of (hypothetical)

populations. It has been argued that, apart from the fact that PCVs (and their
distributions in hypothetical populations) are in principle better controllable with
respect to stimuli than to subjects, this reasoning also holds for all other types
of studies in which a causal inference is sought but randomization is not possible.
Curiously then, in quasi-experiments and correlational analyses, the reference to an
underlying population becomes necessary not because an inference to these popu-
lations is sought but because it is necessary for securing internal
validity in the sample. As already indicated, the assumption that treatment and
PCVs are stochastically independent in the (hypothetical) populations is crucial
in this context. To explicate further, if PCVs and the hypothetical causes are not
stochastically independent in the population (i.e., PCVs are actually confounding
variables, ACVs), the actual probability of falsely rejecting the null hypothesis on
the level of the scientific hypothesis might be higher than the nominal α-level (if
the confounding variables work in the direction of the causal hypothesis) or lower
(if the confounding variables work against the causal hypothesis).
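The consequence of violated stochastic independence can be illustrated by a small simulation. The sketch below rests entirely on assumptions of my own: a single normally distributed PCV that adds to the outcome, two groups of 20 subjects, a treatment without any effect, and a one-sided pooled-variance t test of the directional causal hypothesis. The constant 1.686 is the approximate one-sided 5% critical value of t with 38 degrees of freedom. All names are hypothetical.

```python
import random
from statistics import mean, variance

T_CRIT_ONE_SIDED_DF38 = 1.686  # approximate 95th percentile of t with 38 df


def rejects_null(treated, control):
    """One-sided pooled-variance t test of 'treated mean > control mean'."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    t = (mean(treated) - mean(control)) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return t > T_CRIT_ONE_SIDED_DF38


def false_rejection_rate(pcv_shift, n=20, reps=4000, seed=0):
    """Proportion of rejections when the treatment itself has no effect.

    pcv_shift is the mean difference in the PCV between the treated and the
    control population; 0 corresponds to stochastic independence.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Outcome = PCV + noise; the treatment contributes nothing.
        treated = [rng.gauss(pcv_shift, 1) + rng.gauss(0, 1) for _ in range(n)]
        control = [rng.gauss(0, 1) + rng.gauss(0, 1) for _ in range(n)]
        rejections += rejects_null(treated, control)
    return rejections / reps


independent = false_rejection_rate(pcv_shift=0.0)   # PCV unrelated to treatment
toward = false_rejection_rate(pcv_shift=0.5)        # PCV works in the hypothesized direction
against = false_rejection_rate(pcv_shift=-0.5)      # PCV works against the hypothesis
print(independent, toward, against)
```

Under these assumptions the rejection rate should lie close to the nominal level only in the independent case. It should clearly exceed that level when the PCV works in the direction of the causal hypothesis and fall below it when the PCV works against it.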

Unfortunately, the assumption of stochastic independence is very difficult to
establish empirically since, to begin with, most likely not all PCVs are known. This
would require the unreachable ideal of a "complete psychological theory". What
one can do is, first, to argue in favor of the validity of this assumption on the
basis of substantive theoretical reasoning and, second, to try to control as many PCVs
as possible. If one could in fact match subjects with respect to all PCVs, the
validity of a causal inference would be guaranteed. Incidentally, this idea of a "per-
fect matching" quite exactly mirrors similar conceptions in philosophical theories
of probabilistic and causal explanation, like the concept of "objectively
homogenous reference classes" in Salmon's (1984) theory of causal explanation. Al-
though one can never reach this "perfect matching", one definitely cannot leave
this problem to the [...].

Alternatively, in regression or path analysis the idea of a "perfect matching" is
reflected by the methodological requirement to include all causally relevant variables
(either alone or in combination) in a path analysis in order to be able to interpret
paths in a causal fashion (the so-called "closedness" condition). This inclusion
of other causally relevant variables, which change or mediate the critical causal
relation, is also to be found in philosophical conceptions of probabilistic causation,
where it is usually referred to as the "screening off" of other conditions (e.g.,
Davis, 1988). Note that conclusions based on sequential testing of variables in this
respect (i.e., whether they mediate or change the influence of the hypothetical cause
on the dependent variable) are in principle dubious, since the mediating effects could
take place in any possible combination of other variables. As a result, any evaluation
of a variable in terms of a causal relation is essentially preliminary if not all
relevant variables are included in the analysis at the same time.

The second major conclusion concerns the concept of external validity and its
relation to both kinds of randomness. It has been shown that it is important to
distinguish conceptually between variability that originates either from the random-
ization procedure (in the case of between-subjects designs) or from chance fluctuations
across points of measurement (in the case of within-subjects designs) and manifest
variability caused by subject/stimulus-treatment interactions. For stimuli, all such
variability was argued to be manifest, since there is no reasonable notion of
chance fluctuation with respect to stimuli. With respect to subjects, both sources
of variance are statistically indistinguishable but should nevertheless be separated
conceptually. Whereas variability originating from the process of randomization or
chance fluctuations is a potential threat to the internal validity of a study, vari-
ability caused by subject/stimulus-treatment interaction is not. As a consequence,
considering the variability caused by stimulus-treatment interactions does not raise
the internal validity of a study. Rather, differential effectiveness of the treatment

is an aspect of "the concept formerly known as external validity". This aspect has
been named aggregation validity (AV) to set it apart from population or stimulus
validity (cf. Hager & Westermann, 1983). AV does not require the notion of an
underlying population and is therefore exclusively sample-related, as is internal
validity. Of course, the evaluation of AV might require [...] of an experiment
with different, possibly selected, samples of subjects. However, one should distin-
guish between the empirical evaluation of AV and the concept of AV, the latter
clearly needing no appeal to a population. The term AV has been chosen to empha-
size that its focus lies in the evaluation of the validity of aggregation across stimuli
and subjects.

For methods of isolating subjects for whom the treatment has differential ef-
fectiveness (thus violating the hypothesized causal law) and who therefore cause
disordinal subject-treatment interactions, see Iseler (1996b). From the present per-
spective, the major aim is to separate variability originating from the randomization
process itself from variability that is caused by manifest subject-treatment interac-
tion. Inasmuch as there is manifest differential effectiveness of the treatment (i.e.,
homogenous subgroups of subjects with different treatment effects can be identified),
aggregation across these subjects is not a valid procedure and therefore threatens
the aggregation validity of an experiment.
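A deliberately simple numerical illustration of this threat, with hypothetical per-subject treatment effects chosen so that two homogenous subgroups show effects of opposite sign (a disordinal subject-treatment interaction), might look as follows. The subject labels and the helper `aggregate_effect` are assumptions made for the example.

```python
from statistics import mean

# Hypothetical per-subject treatment effects (treatment score minus control score).
effects = {
    "s1": 4.0, "s2": 5.0, "s3": 3.0,      # subgroup in which the treatment raises scores
    "s4": -4.0, "s5": -5.0, "s6": -3.0,   # subgroup in which the treatment lowers scores
}


def aggregate_effect(effects_by_subject, subjects=None):
    """Mean treatment effect across the given subjects (default: all subjects)."""
    chosen = list(subjects) if subjects is not None else list(effects_by_subject)
    return mean(effects_by_subject[s] for s in chosen)


overall = aggregate_effect(effects)
raised = aggregate_effect(effects, ["s1", "s2", "s3"])
lowered = aggregate_effect(effects, ["s4", "s5", "s6"])
print(overall, raised, lowered)  # 0.0 4.0 -4.0
```

The aggregate effect of zero describes none of the subjects in this sample. Note that the example involves neither a population nor a sampling assumption, in line with the claim that aggregation validity is exclusively sample-related.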

References

[1] Bonge, D. R., Schuldt, W. J., & Harper, Y. Y. (1992). The experimenter-as-fixed-
effects fallacy. The Journal of Psychology, 126, 477-486.

[2] Bredenkamp, J. (1980). Theorie und Planung psychologischer Experimente [Theory
and design of psychological experiments]. Darmstadt: Steinkopff.

[3] Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language
statistics in psychological research. Journal of Verbal Learning and Verbal Behavior,
12, 335-359.

[4] Cohen, J. (1976). Random means random. Journal of Verbal Learning and Verbal
Behavior, 15, 261-262.

[5] Coleman, E. B. (1964). Generalizing to a language population. Psychological Reports,
14, 219-226.

[6] Coleman, E. B. (1979). Generalization effects vs. random effects: Is $\sigma^2_{TL}$
a source of Type 1 or Type 2 error? Journal of Verbal Learning and Verbal Behavior,
18, 243-256.

[7] Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis
issues for field settings. Chicago: Rand McNally.

[8] Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The depend-
ability of behavioral measurements: Theory of generalizability of scores and profiles.
New York: Wiley.

[9] Davis, W. A. (1988). Probabilistic theories of causation. In J. H. Fetzer (Ed.),
Probability and causality. Essays in honor of Wesley C. Salmon (pp. 133-160).
Dordrecht: Reidel.

[10] Edgington, E. S. (1995). Randomization tests (3rd ed.). New York: Dekker.

[11] Erdfelder, E., & Bredenkamp, J. (1994). Hypothesenprüfung [Hypothesis testing]. In
W. H. Tack & T. Herrmann (Eds.), Enzyklopädie der Psychologie. Themenbereich B:
Methodologie und Methoden. Serie I: Forschungsmethoden der Psychologie. Band 1:
Methodologische Grundlagen der Psychologie (pp. 604-648). Göttingen: Hogrefe.

[12] Fontenelle, G. A., Phillips, A. P., & Lane, D. M. (1985). Generalizing across stim-
uli as well as subjects: A neglected aspect of external validity. Journal of Applied
Psychology, 70, 101-107.

[13] Gadenne, V. (1984). Theorie und Erfahrung in der psychologischen Forschung [The-
ory and empirical knowledge in psychological research]. Tübingen: Mohr.

[14] Hager, W., & Westermann, R. (1983). Planung und Auswertung von Experimenten
[Design and evaluation of experiments]. In J. Bredenkamp & H. Feger (Eds.), Hy-
pothesenprüfung. Enzyklopädie der Psychologie, Serie Forschungsmethoden, Bd. 5
(pp. 24-238). Göttingen: Hogrefe.

[15] Hopkins, K. D. (1984). Generalizability theory and experimental design: Incongruity
between analysis and inference. American Educational Research Journal, 21, 703-712.

[16] Iseler, A. (1996a). A paradoxical property of aggregate hypotheses referring to the
order of [...]. MPR-Online, 1(4). URL: http://www.hsp.de/MPR/.

[17] Iseler, A. (1996b). Populationsverteilungen von Merkmalen und Geltungsbereiche in-
dividuenbezogener Aussagen als Gegenstand der Inferenzstatistik in psychologischen
Untersuchungen [Population distributions of features and the [...] of validity of
single-subject related statements as a topic of inferential statistics in psychological
research]. Paper presented at the 40. Kongreß der Deutschen Gesellschaft für Psy-
chologie, München. URL: http://userpage.fu-berlin.de/iseler/.

[18] Kenny, D. A., & Smith, E. R. (1980). A note on the analysis of designs in which
subjects receive each stimulus only once. Journal of Experimental Social Psychology,
16, 497-507.

[19] Keppel, G. (1976). Words as random variables. Journal of Verbal Learning and Verbal
Behavior, 15, 262-265.

[20] Maxwell, S. E., & Bray, J. H. (1986). Robustness of the Quasi-F statistic to violations
of sphericity. Psychological Bulletin, 99, 416-421.

[21] Richter, M. L., & Seay, M. B. (1987). ANOVA designs with subjects and stimuli as
random effects: Applications to prototype effects on recognition memory. Journal of
Personality and Social Psychology, 53, 470-480.

[22] Salmon, W. C. (1984). Scientific explanation and the causal structure of the world.
Princeton, NJ: Princeton University Press.

[23] Santa, J. L., Miller, J. J., & Shaw, M. L. (1979). Using Quasi-F to prevent alpha
inflation due to stimulus variation. Psychological Bulletin, 86, 37-46.

[24] Scheffé, H. (1963). The analysis of variance. New York: Wiley.

[25] Scheirer, C. J., & Geller, S. E. (1979). The analysis of random effects in modeling
studies. Child Development, 50, 752-757.

[26] Siemer, M. (1993). Aspekte einer vollständigeren Analyse der Unvollständigkeit psy-
chologischer Hypothesen [Aspects of a more complete analysis of the incompleteness
of psychological hypotheses]. In L. Montada (Ed.), Bericht über den 38. Kongreß der
Deutschen Gesellschaft für Psychologie, Bd. 2 (pp. 728-734). Göttingen: Hogrefe.

[27] Steyer, R. (1992). Theorie kausaler Regressionsmodelle [The theory of causal
regression models]. Stuttgart: Fischer.

[28] Steyer, R., Gabler, S., & Rucai, A. A. (1995). Individual causal effects, average causal
effects, and unconfoundedness in regression models. In F. Faulbaum & W. Bandilla
(Eds.), SoftStat '95. Advances in statistical software, 5 (pp. 203-210). Stuttgart: Lu-
cius & Lucius.

[29] Wickens, T. D., & Keppel, G. (1983). On the choice of design and test statistic in
the analysis of experiments with sampled materials. Journal of Verbal Learning and
Verbal Behavior, 22, 296-309.
