University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln

USGS Northern Prairie Wildlife Research Center US Geological Survey

2002

The Importance of Replication in Wildlife Research

Douglas H. Johnson USGS Northern Prairie Wildlife Research Center, [email protected]

Follow this and additional works at: https://digitalcommons.unl.edu/usgsnpwrc

Part of the Other International and Area Studies Commons

Johnson, Douglas H., "The Importance of Replication in Wildlife Research" (2002). USGS Northern Prairie Wildlife Research Center. 228. https://digitalcommons.unl.edu/usgsnpwrc/228

This Article is brought to you for free and open access by the US Geological Survey at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in USGS Northern Prairie Wildlife Research Center by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln.

Invited Paper: THE IMPORTANCE OF REPLICATION IN WILDLIFE RESEARCH

DOUGLAS H. JOHNSON,1 U.S. Geological Survey, Northern Prairie Wildlife Research Center, Jamestown, ND 58401, USA

Abstract: Wildlife ecology and management studies have been widely criticized for deficiencies in design or analysis. Manipulative experiments-with controls, randomization, and replication in space and time-provide powerful ways of learning about natural systems and establishing causal relationships, but such studies are rare in our field. Observational studies and surveys are more common; they also require appropriate design and analysis. More important than the design and analysis of individual studies is metareplication: replication of entire studies. Similar conclusions obtained from studies of the same phenomenon conducted under widely differing conditions will give us greater confidence in the generality of those findings than would any single study, however well designed and executed.

JOURNAL OF WILDLIFE MANAGEMENT 66(4):919-932

Key words: control, , metareplication, , pseudoreplication, randomization, replication, sample survey, .

Wildlife researchers seem to be doing everything wrong. Few of our studies employ the hypothetico-deductive approach (Romesburg 1981) or gain the benefits from strong inference (Platt 1964). We continually conduct descriptive studies, rather than the more effective manipulative studies. We rarely select study areas at random, and even less often do the animals we study constitute a random sample. We continue to commit pseudoreplication errors (Hurlbert 1984, Heffner et al. 1996). We confuse correlation with causation (Eberhardt 1970). Frequently we measure the wrong variables, such as indices to things we really care about (Anderson 2001). And we may measure them in the wrong places (convenience sampling; Anderson 2001). We often apply meaningless multivariate methods to the results of our studies (Rexstad et al. 1988). We test null hypotheses that not only are silly but are known to be false (Cherry 1998, Johnson 1999, Anderson et al. 2000). We rely on nonparametric methods that are neither necessary nor appropriate (Johnson 1995).

Such problems permeate our field. In my early years as a hypercritical graduate student, I read many articles in The Journal of Wildlife Management and related journals. In virtually every article, I found problems-often serious ones-in the methods used to analyze data. That experience was repeated later in a class in evolutionary ecology. During that class, we critically reviewed many key papers in evolutionary ecology. Some students were assigned to attack, others to defend those articles. We identified substantial problems in the design, analysis, or interpretation in nearly all of those influential and highly regarded studies.

Despite all our transgressions, we must be doing something right. We have brought some species back from the brink of extinction. The bald eagle (Haliaeetus leucocephalus), whooping crane (Grus americana), Aleutian Canada goose (Branta canadensis leucopareia), and gray wolf (Canis lupus) were extremely rare over much or all of their ranges only a few years ago; now they are much more common. Many of us had given up on the black-footed ferret (Mustela nigripes) and California condor (Gymnogyps californianus), species that, while still at risk, appear to be recovering. And we can manage for abundance if we want to, such as we have done for white-tailed deer (Odocoileus virginianus) and mallards (Anas platyrhynchos). Recently, Jack Ward Thomas spoke of the "tremendous record of success" in our field (Thomas 2000:1).

Why this apparent inconsistency between our error-prone methods and the successes of our profession? I hope to address that question here by discussing what truly is important in scientific research. I first discuss causation, then manipulative experimentation as a powerful way of learning about causal mechanisms. The 3 cornerstones of experimentation are control, randomization, and replication. These features also are integral to observational studies and sample surveys, which are more common in our field. For those types of studies especially, I argue that the most important feature is replication.

1 E-mail: [email protected]

Further, I expand this concept to the level of metareplication-replication of entire studies-and suggest that this is the most reliable method of learning about the world. It is a natural way of human thinking and is consistent with a Bayesian approach to statistics. Metareplication allows us to exploit the values of small studies, each of which individually may be unable to reach definitive conclusions. Metareplication provides us greater confidence that certain relationships are general and not specific to the circumstances that prevailed during a single study.

CAUTION ABOUT CAUSATION

The "management" in "wildlife management" implies causality. We believe we can perform some management action that will produce a predictable response by wildlife. Even if the causes cannot be manipulated, it is useful to know the mechanisms that determine certain outcomes, such as that spring migration of birds is a response to increasing day length, or that drought reduces the number of wetland basins that contain water.

The concept of causation is most readily adopted in the physical sciences, where models of the behavior of atoms, planets, and other inanimate objects are applicable over a wide range of conditions (Barnard 1982) and the controlling factors are few (e.g., pressure and temperature are sufficient to determine the volume of a gas). In the physical sciences, causality implies lawlike necessity. In many fields, however, notions of causality reduce to those of probability, which suggests exceptions and lack of regularity. Here, causation means that an action "tends to make the consequence more likely, not absolutely certain" (Pearl 2000:1). This is so in wildlife ecology because of the multitude of factors that influence a system. For example, liberalizing hunting regulations for a species tends to increase harvest by hunters. In any specific instance, liberalization may not result in an increased harvest because of other influences such as population size of the species, weather conditions during the hunting season, and the cost of gasoline as it affects hunter activity.

Suppose you want to determine the effect on squirrel abundance of some treatment (= putative cause), for example, selective logging in a woodlot by removing all trees greater than 45 cm diameter at breast height (dbh). The treatment effect on some woodlot can be defined as

T = Yt(u) - Yc(u),    (1)

where Yt(u) is the number of squirrels in woodlot u after the treatment, and Yc(u) is the number of squirrels in that woodlot if the treatment had not been applied (I follow Rubin [1974] and Holland [1986] here). If the woodlot is logged, then you can observe Yt(u) but not Yc(u). If the treatment is not applied, then you can observe Yc(u) but not Yt(u). Thus arises the fundamental problem of causal inference: you cannot observe the values of Yt(u) and Yc(u) on the same unit. That is, any particular woodlot is either logged or not.

Holland (1986) described 2 solutions to this problem. With the first, one has 2 units (u1 and u2, here woodlots) and assumes they are identical. Then the treatment effect T is estimated to be

T = Yt(u1) - Yc(u2),    (2)

where u1 is treated and u2 is not. This approach is based on the very strong assumption that the 2 woodlots, if not logged, would have the same number of squirrels, that is, Yc(u2) = Yc(u1). That assumption is not testable, of course, because 1 woodlot had been logged. It can be made more plausible by matching the 2 units as closely as possible or by believing that the units are identical. That latter belief comes more easily to physicists thinking about molecules than to ecologists thinking about woodlots, however.

Holland (1986) termed the other solution statistical. One gets an expected, or average, causal effect T over the units in some population:

T = E(Yt - Yc),    (3)

where, unlike with the other solution, different units can be observed. The statistical solution replaces the causal effect of the treatment on a specific unit, which is impossible to observe, by the average causal effect in the population, which is possible to estimate.
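The statistical solution can be illustrated with a few lines of simulation. The sketch below is not from the paper; the squirrel counts and the assumed logging effect are hypothetical. Both potential outcomes are generated for each simulated woodlot, logging is assigned at random so that only one outcome per woodlot is "observed," and the treated-minus-control difference is compared with the true average causal effect E(Yt - Yc).

# Illustrative sketch of Holland's "statistical solution"; all numbers are made up.
import random
import statistics

random.seed(1)

n = 200  # woodlots in the population
woodlots = []
for _ in range(n):
    y_control = random.gauss(30, 8)                    # squirrels if never logged
    y_treated = y_control - 6 + random.gauss(0, 3)     # assume logging removes ~6 squirrels
    woodlots.append((y_treated, y_control))

true_T = statistics.mean(yt - yc for yt, yc in woodlots)  # E(Yt - Yc)

# Randomly assign half the woodlots to logging; observe only one outcome each.
random.shuffle(woodlots)
treated = [yt for yt, _ in woodlots[: n // 2]]
controls = [yc for _, yc in woodlots[n // 2 :]]
estimated_T = statistics.mean(treated) - statistics.mean(controls)

print(f"true average causal effect   : {true_T:.2f}")
print(f"estimate from randomized cmp : {estimated_T:.2f}")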
This discussion reflects the need for a control, something to compare with the treated unit, which is required for either approach. To follow the statistical approach, we often invoke randomization. If, for example, we are to compare squirrel numbers on a treated woodlot and an untreated one, we might get led astray if the woodlots were of very different size, or if one contained more mast trees, or if one was rife with predators of squirrels and the other was not. One way-but not the only way-to protect against this possibly misleading outcome is to determine at random which woodlot receives the treatment and which does not. This can be done if the researcher has tight control over the experiment; it is impossible in many "natural experiments" and observational studies.

But even if you select at random a woodlot for treatment and another as a control, you still may end up by chance comparing a woodlot that has numerous mast trees and few predators with a woodlot with opposite characteristics. This leads to the third important criterion for determining causation: replication. Repeating the randomization process and treatments on several woodlots reduces the chance that woodlots in any group consistently are more favorable to squirrels. In summary, then, assessing the effect of some treatment with a manipulative experiment requires a control, randomization, and replication (Fisher 1926).

One might attempt to determine the effect of selective logging on squirrels by comparing woodlots that have trees greater than 45 cm dbh with woodlots that lack such large trees. But such a comparison is not as definitive as a manipulative experiment. The 2 types of woodlots might differ in numerous ways, other than the presence or absence of large trees, that influence squirrel abundance. If variables that are known or suspected to be influential are measured, careful statistical analysis may account for their effects, but large samples may be necessary, and it is possible that an important variable went unmeasured.

An ideal design might involve a number of woodlots on which squirrel density is measured both before and after the treatment is applied. Then, instead of comparing the density of squirrels on treated versus untreated woodlots, one could compare the change in density (before and after treatment) between the 2 groups. Crossover designs also provide a powerful way to reduce the influence of inherent differences among experimental units. Under a crossover design, for a certain time period, some units receive treatments and other units serve as controls. Then the roles of the units switch: control units receive treatments and formerly treated units are left alone to serve as controls. An obvious concern with crossover designs is that treatment effects may persist. One remedy is to have a time period between the 2 phases of the study sufficient to allow treatment effects to dissipate. A crossover design would not be appropriate for the squirrel-woodlot example because the effect of logging would persist for decades, if not longer. Crossover designs were used by Balser et al. (1968) and Tapper et al. (1996) to estimate the effects of predator reduction on prey species. In these studies, predators were removed from 1 study area for 3 years, while another area served as a control; after 3 years, the treatments were switched.
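A minimal sketch of the before-and-after design described above, again with invented numbers rather than data from any study: each simulated woodlot keeps its own inherent quality, so comparing the change in density between logged and unlogged groups removes those fixed differences.

# Hypothetical sketch of comparing the change in density between groups.
import random
import statistics

random.seed(2)

def woodlot():
    quality = random.gauss(0, 10)                     # inherent differences among woodlots
    before = 30 + quality + random.gauss(0, 3)
    return quality, before

logged, unlogged = [], []
for i in range(20):
    quality, before = woodlot()
    if i % 2 == 0:                                    # half the woodlots are logged
        after = 30 + quality - 6 + random.gauss(0, 3) # assumed logging effect
        logged.append(after - before)
    else:
        after = 30 + quality + random.gauss(0, 3)
        unlogged.append(after - before)

effect = statistics.mean(logged) - statistics.mean(unlogged)
print(f"estimated logging effect on density change: {effect:.2f}")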
Correlation versus Causation: the Importance of Mechanisms

It is always useful to have an understanding of the mechanisms that influence phenomena of interest and to distinguish causation from correlation. We might be able to relate mallard production to precipitation (Boyd 1981), but more useful is the understanding that precipitation affects the condition of wetlands where mallards breed, which in turn influences breeding propensity, clutch size, and survival of young (Johnson et al. 1992). We can have greater confidence in our findings if they are consistent with mechanisms that are both reasonable and supported by other evidence. The presence of such mechanisms gives credibility that the correlational smoke may in fact represent causational fire (Holland 1986). Romesburg (1981) argued that causation may be invoked if the correlational evidence is accompanied by, for example, the elimination of other possible causes, demonstration that the correlation occurs under a wide variety of circumstances, and the existence of a plausible dependence between the putative cause and the outcome. In a similar vein, mechanistic models are more useful than descriptive models for understanding systems (Johnson 2001a, Nichols 2001).

MANIPULATE, IF YOU CAN

Manipulative experimentation is a very effective way to determine causal relationships. One poses questions to nature via experiments such as selective logging. By manipulating the system yourself, you reduce the chance that something other than your treatment causes the results that are observed. Further, as emphasized by Macnab (1983), little can be learned about the dynamics of systems at equilibrium. Manipulation is helpful to understand how the systems respond to changes. Experimentation also forms the basis of what has been termed strong inference (Platt 1964), in which alternative hypotheses are devised and crucial experiments are performed to exclude 1 or more of the hypotheses.

Wildlife ecologists sometimes face severe difficulties meeting the needs of control, randomization, and replication in manipulative experiments. Many systems are too large and complex for ecologists to manipulate (Macnab 1983). Often "treatments"-such as oil spills-are applied by others, and wildlife ecologists are called in to evaluate their effects. In such situations, randomization is impossible and replication undesirable. Methods for conducting environmental studies, other than experiments with replications, are available (Smith and Sugden 1988, Eberhardt and Thomas 1991); among these are experiments without replications, observational studies, and sample surveys. Replication is particularly difficult with experiments at the ecosystem level, which are more complex but also more meaningful than experiments at microcosm or mesocosm levels, where replication is more feasible (Carpenter 1990, 1996; Schindler 1998). Experiments lacking replications can be, and indeed often have been, analyzed by taking multiple samples of the system and treating them as independent replicates. This practice was criticized by Eberhardt (1976) and Hurlbert (1984), the latter naming it pseudoreplication. I address this topic more fully below.

Observational studies lack the critical element of control by the investigator, although they can be analyzed similarly to an experimental study (Cochran 1983). One is less certain that the presumed treatment actually caused the observed response, however. In lieu of controlled experimentation, one can (1) reduce the influence of extraneous effects by restricting the scope of inference to situations similar to the one under observation; (2) employ matching, by which treated units are compared with units that were not treated but in other regards are as similar as possible to the treated units; or (3) adjust for the effects of other variables during analysis, with methods such as analysis of covariance (Eberhardt and Thomas 1991).

Longitudinal observational studies, with measurements taken before and after some treatment, generally are more informative than cross-sectional observational studies, in which treated and untreated units are studied only after the treatment (Cox and Wermuth 1996). (Of course, measurements on experimental and control units before and after treatments are highly desirable in experimental studies, as well as observational studies.) Intervention analysis is a method used to assess the effect of some distinct treatment (intervention) that has been applied to a system. The intervention was not assigned by the investigator and cannot reasonably be replicated. One approach is to model the system as a time series and look for changes subsequent to the intervention. That approach was taken with air-quality data by Box and Tiao (1975), who sought to determine how ozone levels might have responded to events such as a change in the formulation of gasoline.

Sometimes it is known that a major treatment will be applied at some particular site, such as a dam to be constructed on a river. It may be feasible to study that river before as well as after the dam is constructed. That simple before-and-after comparison suffers from the weakness that any change that occurred coincidental with dam construction, such as a decrease in precipitation, would be confounded with changes resulting from the dam, unless the changes were specifically included in the model. To account for the effects of other variables, one can study similar rivers during the same before-and-after period. Ideally, these rivers would be similar to and close enough to the treated river so as to be equally influenced by other variables but not influenced by the treatment itself. This design has been called the BACI (before-after, control-impact) design (Stewart-Oaten et al. 1986, Stewart-Oaten and Bence 2001, Smith 2002) and is used for assessing the effects of impacts.
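The BACI contrast itself is simple arithmetic: the before-to-after change at the impact site minus the corresponding change at the control site. The sketch below uses made-up river counts purely to show that calculation; it is not an analysis from the paper.

# A minimal BACI (before-after, control-impact) contrast with hypothetical counts.
import statistics

impact_before  = [52, 48, 55, 50]   # surveys on the dammed river, before construction
impact_after   = [34, 31, 36, 33]   # after construction
control_before = [47, 51, 49, 46]   # a similar, untreated river
control_after  = [44, 46, 45, 47]

baci_effect = (statistics.mean(impact_after) - statistics.mean(impact_before)) \
            - (statistics.mean(control_after) - statistics.mean(control_before))
print(f"BACI estimate of the dam's effect: {baci_effect:.1f}")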
It is difficult for investigators to manipulate large and complex systems such as ecosystems. But wildlife managers, as well as those who manage ecosystems for other objectives such as timber production, do so frequently. This disparity between investigators and managers led Macnab (1983) to recommend that management activities be viewed as experiments that offer opportunities to learn about large systems. Actions taken for management benefits generally lack controls, randomization, and replication; such shortcomings can be remedied by incorporating these features into the experiment. Key assumptions should be identified and stated as hypotheses, rather than treated as facts. The results of management actions, even if they show no effect, should be measured and reported.

The adaptive resource management approach blends the idea of learning about a system with the management of the system (Walters 1986, Williams et al. 2002). The key notion, which moves the concept beyond a "try something and if it doesn't work try something else" attitude, is that knowledge about the system becomes one of the products of the system that is to be optimized.

Sample surveys differ from experiments in that one endeavors either to estimate some characteristic over some domain-such as the number of mallards in the major breeding range in North America-or to compare variables among groups-such as the age of hunters compared with nonhunters.

CONTROLS

The term control, confusingly, has at least 3 different meanings in experimental design. The first meaning, which is more general and not specifically addressed here, involves the investigator's role. In a controlled study, the treatment (cause) is assigned by the investigator; the study is an experiment. In an uncontrolled study, the treatment is determined to some extent by factors beyond the investigator's control; the study is observational (Holland 1986). The second meaning, design control, implies that, while some experimental units receive a treatment, others (the "controls") do not. The third meaning, statistical control, means that other variables that may influence the response are measured so that we may estimate their effects and attempt to eliminate them statistically.

The major benefit of design control is to provide a basis for comparison between treated and untreated units. It reduces the error; our measured response is likely to reflect only the treatment rather than a variety of other things. Statistical control usually is less effective in reducing error and is applied after treatments are applied. Sometimes strict design controls are not possible. Intervention analysis and BACI designs can demonstrate that some variables may have changed subsequent to the intervention or impact, but one will be less confident from that analysis that the intervention caused that change. Confidence will increase if potential confounding variables are measured and their effects are accounted for during analysis-that is, through statistical control.

Controls should be distinguished from reference units. The latter are units that represent some ideal that management actions are intended to approach. Reference sites are especially useful in restoration ecology, when evaluating the effectiveness of alternative management activities for restoring degraded areas to conditions embodied in the reference sites (Provencher et al. 2002).

RANDOMIZATION

Randomization can occur at 2 levels. In both experiments and sample surveys, randomization means that the objects to be studied are randomly selected from some population (called a target population) for which inference is desired. Accordingly, each member of that population has some chance of being included in the sample. Chances may be the same for all members, but that is not necessary. At a second level, in a manipulative experiment, randomization means that the treatment each unit receives is randomly determined. Randomization makes variation among sample units, due to variables that are not accounted for, act randomly, rather than in some consistent and potentially misleading manner. Randomization thereby reduces the chance of confounding with other variables. Instead of controlling for the effects of those unaccounted-for variables, randomization makes them tend to cancel one another out, at least in large samples. In addition, randomization reduces any intentional or unintentional bias of the investigator. It further provides an objective basis for a test of significance (Barnard 1982).

While randomizing the assignment of treatments to units is crucial in experimentation, I suggest that randomization in selecting the units in an experiment or sample survey is less important than control or replication. First, the intended benefits of randomization apply only conceptually. Randomly sampling from a population does not ensure that the resulting sample will represent that population, only that, if many such samples are taken, the average will be representative. But in reality only a single sample is taken, and that single sample may or may not be representative. Randomization does make variation act randomly, rather than systematically. However, this property is only conceptual, applying to the notion that samples were repeatedly taken randomly. The single sample that was taken may or may not have properties that appear systematic. Randomization ostensibly reduces hidden biases of, or "cheating" by, an investigator. But, if an investigator wishes to cheat, why not do so but say that randomization was employed (Harville 1975)?
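The long-run nature of those benefits is easy to see numerically. In the sketch below (hypothetical mast-tree counts, not data from the paper), repeated random assignments of 8 woodlots to treatment and control balance the confounder on average, yet a single randomization can still be badly imbalanced.

# Randomization balances a confounder only over many repetitions.
import random
import statistics

random.seed(3)
mast_trees = [random.randint(0, 40) for _ in range(8)]  # 8 woodlots, invented counts

imbalances = []
for _ in range(10_000):
    lots = mast_trees[:]
    random.shuffle(lots)
    treated, control = lots[:4], lots[4:]
    imbalances.append(statistics.mean(treated) - statistics.mean(control))

print(f"average imbalance over many randomizations: {statistics.mean(imbalances):.2f}")
print(f"worst single-randomization imbalance      : {max(map(abs, imbalances)):.2f}")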
What Does a Sample Really Represent?

Any sample, even a nonrandom one, can be considered a representative sample from some population, if not the target population. What is the population for which the sample is representative? Extrapolation beyond the area from which any sample was taken requires justification on nonstatistical bases. For example, studies of animal behavior (or physiology) based on only a few individuals may reasonably be generalized to entire species if the behavior patterns (or physiological processes) are relatively fixed (i.e., the units are homogeneous with respect to that feature). In contrast, traits that vary more widely, such as habitat use of a species or annual survival rates, cannot be generalized as well from a sample of comparable size. Consistency of a feature among the sampled and unsampled units is more critical than the randomness of a sample. Can one comfortably draw an inference to a population from a sample, even if that sample is nonrandom? In reality, most useful inferences require extrapolation beyond the sampled population. For example, if we want to predict the consequences of some action carried out in the future based on a study conducted in the past, we are extrapolating forward in time.

Is Randomization Always Good?

Suppose you want to assess the characteristics of vegetation in a 10-ha field. You decide to place 8 quadrats in the field and measure vegetation within each of those quadrats. Results from those 8 samples will be projected to the entire field. You can select the 8 points entirely at random. It is possible that all 8 quadrats will be within the same small area of the field, however, and be very different from most of the field. Choosing points at random ensures that, if you repeat the process many times, on average you will have a representative sample. But in actuality you have only 1 of the infinitely many possible samples; randomness tells you nothing about your particular sample. It might be perfectly representative of the entire field, or it might be very deviant. The chance that it is representative increases with sample size, so the risk of a random sample not being representative is especially troublesome in small samples.

There are methods for taking samples to increase the chance that they better represent the entire field. One method is to stratify, if there is prior knowledge of some variable likely to relate to the variable of interest. Another method is to take systematic rather than random samples. Hurlbert (1984) emphasized the importance of interspersion in experimental design, having units well distributed in space; this serves 1 goal of randomization, often more successfully. Such balanced designs diminish the errors of an experiment (Fisher 1971).
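A quick illustration of that point with an invented field rather than real vegetation data: when the field has a strong spatial gradient, both simple random and systematic placement of 8 quadrats are unbiased over repeated sampling, but individual random draws vary far more.

# Random versus systematic placement of 8 quadrats in a field with a gradient.
import random
import statistics

random.seed(4)

def vegetation(y):                 # biomass increases along the gradient (0..1)
    return 100 + 80 * y + random.gauss(0, 5)

random_means, systematic_means = [], []
for _ in range(2000):
    random_means.append(statistics.mean(vegetation(random.random()) for _ in range(8)))
    systematic_means.append(statistics.mean(vegetation((i + 0.5) / 8) for i in range(8)))

print(f"spread (SD) of random sample means    : {statistics.stdev(random_means):.2f}")
print(f"spread (SD) of systematic sample means: {statistics.stdev(systematic_means):.2f}")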
What Is Independence, and Is It Necessary?

Randomization provides a basis for probability distributions because the observations in a random sample generally are statistically independent. Independence is a mathematically wondrous property, since it facilitates the definition of distributional properties, such as the variance, test statistics, and P-values. But what is independent? Consider, as did Millspaugh et al. (1998), the assessment of habitat preference of animals that occur together. If the animals are inextricably tied together, such as a mother and her dependent offspring, then the locations of each certainly are not independent. If the animals occur together simply because they favor the same habitats, Millspaugh et al. (1998) argued that the individual animals are independently making habitat choices and thus should be treated as independent units. Then there are intermediate situations, such as a mother and her not-quite-dependent offspring. Ascertaining independence is not a simple matter; statistical independence can be evaluated only in reference to a specific data set and a specified model (Hurlbert 1997).

But what is the problem if data are not independent? Suppose you have 100 observations, but only 50 of them are independent, and for each of those there is another observation that is identical to it. So the apparent sample size is 100, but only 50 of those are independent. If you estimate the average of some characteristic of the individuals, the mean in fact will be a good estimator. But the standard error will be biased low. And a test statistic, say, for comparing the mean of that group with another, will be inflated and will tend to reject the null hypothesis too often (e.g., Erickson et al. 2001).

This is a fundamental problem for any test statistic from an individual study. There are ways to correct for the disparity between the number of observations and the number of independent observations. Dependencies among observations sometimes can be modeled explicitly, such as with generalized estimating equations (Liang and Zeger 1986). Dependencies, such as sampling from clusters of units, often result in overdispersion, in which the sample variance exceeds the theoretical value; in such cases certain adjustments to the theoretical variance can be made (McCullagh and Nelder 1989, Burnham and Anderson 2002). A similar issue arises with respect to temporally or spatially correlated observations. I argue later that problems caused by a lack of independence, while affecting inferences from individual studies, are less consequential than they appear.
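The 100-observations-but-only-50-independent example is easy to reproduce. The sketch below uses simulated values, not data from any study: duplicating 50 independent observations leaves the mean unchanged but makes the naive standard error, computed as if there were 100 independent values, too small, which is what inflates test statistics.

# Duplicated observations: the mean is fine, the standard error is biased low.
import math
import random
import statistics

random.seed(5)
independent = [random.gauss(10, 2) for _ in range(50)]
pseudo = independent * 2          # each value appears twice: n appears to be 100

se_correct = statistics.stdev(independent) / math.sqrt(len(independent))
se_naive   = statistics.stdev(pseudo) / math.sqrt(len(pseudo))

print(f"mean (identical either way): {statistics.mean(pseudo):.2f}")
print(f"correct SE from 50 values  : {se_correct:.3f}")
print(f"naive SE from 100 'values' : {se_naive:.3f}   # biased low; tests too liberal")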

Independence and the Scope of Inference

Suppose you are investigating area sensitivity in grassland birds. That is, you wonder whether certain species prefer larger patches of grassland to smaller patches. Area sensitivity might be manifested by reduced densities (not just total abundance) in smaller habitat patches or by the avoidance of habitat edges (Faaborg et al. 1993, Johnson and Winter 1999, Johnson 2001b). Avoidance of edge means that birds are restricted to the interior portions of a patch, which results in reduced densities for the patch as a whole. To determine whether certain species are area-sensitive, you might compare densities of the species in patches of similar habitat but different size. Alternatively, you might examine the locations of birds (let's say nests, but we could consider song perches, etc.) within a habitat patch and determine whether there is evidence that densities of nests are reduced near edges compared to interiors.

For comparing densities, the sample units are patches. Those are the units to which a "treatment" (patch size) pertains; all birds in that patch have the same patch size. For examining edge avoidance, in contrast, the sample units are nests because each has its own, possibly unique, value of the "treatment" (distance to edge). Logically, then, the latter approach would be more powerful because a single patch might produce dozens of sample units (nests), resulting in much larger sample sizes.

Assuming there is no free lunch, what is happening here? The disparity is the scope of inference. If we study densities in patches, the studied patches can be considered a sample from some target population of patches, and inferences should apply to that population. If we study nests within a patch and examine distances to an edge, the inference is only to that single patch. You might conclude that birds avoid locating their nests near a habitat edge, but that conclusion applies only locally.
Replication Is Necessary for Randomization to Be Useful

The properties of randomization in the selection of units to study are largely conceptual; that is, they pertain hypothetically to some long-term average. For example, randomization makes errors act randomly, rather than in a consistent direction. But in any single observation, or any single study, the error may well be consistent. It is only through replication that long-term properties hold.

REPLICATION

Replication requires that a sample consist of more than 1 member of a population, or that treatments be applied independently to more than 1 unit. Replication provides 2 benefits. First, it reduces error because an average of independent errors tends to be smaller than a single error. Replication serves to ensure against making a decision based on a single, possibly unusual, outcome of a treatment or of a unit. Second, because we have several estimates of the same effect, we can estimate the error, as the variation in those estimates reflects error. We then can determine whether the values of the treated units are unusually different from those of the untreated units. The validity of that estimate of error depends on the experimental units having been drawn randomly; thus, the validity is a joint property of randomization and replication.

Is Replication Always Necessary?

Imagine yourself cooking a stew. You want to see if it needs salt. You dip a teaspoon into the kettle and take a taste. If it's not salty enough, you add more salt. Notice that you did not take replicate samples. Only one. (Further, you probably didn't randomly select where in the kettle to sample; you most certainly took it from the surface and most likely near the center of the kettle.) Cooks have been using this sampling approach for probably centuries, without evident problem. Why? The single, nonrandomly selected sample generally suffices because the stew is fairly homogeneous with respect to salt. A teaspoon from 1 location will be about as salty as a teaspoon from another. This is because the stew has been stirred. Note that the same approach would not work for sampling meat, which is distributed less uniformly throughout a stew. Replication may not be necessary if all the members of the universe are identical, or nearly enough so.

OTHER LEVELS OF REPLICATION

I find it useful to think of replication occurring at 3 different levels (Table 1). The fundamental notion is of ordinary replication in an experiment: treatments are applied independently to several units. In our squirrel-woodlot example, we would want several woodlots to be logged and several to be left as controls. (Comparable considerations apply to observational studies or sample surveys.) As mentioned above, replication serves to ensure against making a decision based on a single, possibly unusual, outcome of the treatment. It also provides an estimate of the variation associated with the treatment. Other levels of replication are pseudoreplication and metareplication.

Table 1. The types of replication differ in what actions are repeated, what scope of inference is valid, and the role of P-values.

Term                   Repeated action   Scope of inference                                 P-value      Analysis
Pseudoreplication      Measurement       Object measured                                    Wrong        Pseudo-analysis
Ordinary replication   Treatment         Objects for which samples are representative       "OK"         Analysis
Metareplication        Study             Situations for which studies are representative    Irrelevant   Meta-analysis

Pseudoreplication

At a lower level than ordinary replication is what Hurlbert (1984) called pseudoreplication. Often couched in statistical terms (using the wrong error term in an analysis), typically it arises by repeating measurements on units and treating such measurements as if they represented independent observations. The treatments may have been assigned randomly and independently to the units, but repeated observations on the same unit are not independent. This was what Hurlbert (1984) called simple pseudoreplication and what Eberhardt (1976) had included in pseudodesign. Pseudoreplication was common when Hurlbert (1984) surveyed literature on manipulative ecological experiments, mostly published during 1974-1980, and estimated that about 27% of the experiments involved pseudoreplication. Heffner et al. (1996:2561) found that the frequency of pseudoreplication in more recent literature (1991-1992) had dropped but was still "disturbingly high." Stewart-Oaten (2002) provided some keys for recognizing pseudoreplication, which is not always straightforward.

Metareplication

At a higher level than ordinary replication is what I term metareplication. Metareplication involves the replication of studies, preferably in different years, at different sites, with different methodologies, and by different investigators. Conducting studies in different years and at different sites reduces the chance that some artifact associated with a particular time or place caused the observed results; it should be unlikely that an unusual set of circumstances would manifest itself several times or, especially, at several sites. Conducting studies with different methods similarly reassures us that the results were not simply due to the methods or equipment employed to get those results. And having more than 1 investigator perform studies of similar phenomena reduces the opportunity for the results to be due to some hidden bias or characteristic of that researcher.

Just as replication within individual studies reduces the influence of errors in observations by averaging the errors, metareplication reduces the influence of errors among studies themselves. Youden (1972) provided a classic example of the need for metareplication. He described the sequence of 15 studies conducted during 1895-1961 to estimate the average distance between Earth and the sun. Each scientist obtained an estimate, as well as a confidence interval for that estimate. Every estimate obtained was outside the confidence interval for the previous estimate! The confidence each investigator had in his estimate thus was severely overrated. The critical message from this saga is that we should have far less confidence in any individual study than we are led to believe from internal estimates of reliability. This also points out the need to conduct studies of any phenomenon in different circumstances, with different methods, and by different investigators. That is, to do metareplication.
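Youden's pattern can be mimicked with simulated numbers (all values below are invented): when each study reports only its internal standard error while unrecognized study-specific errors exist, successive estimates routinely fall outside one another's confidence intervals.

# Internal standard errors understate real between-study differences.
import random

random.seed(6)

true_value = 100.0
estimates = []
for _ in range(15):                        # 15 hypothetical studies
    bias = random.gauss(0, 3.0)            # unrecognized study-specific error
    est = true_value + bias + random.gauss(0, 0.5)
    se = 0.5                                # each study's *internal* SE only
    estimates.append((est, se))

misses = 0
for (prev, prev_se), (curr, _) in zip(estimates, estimates[1:]):
    if abs(curr - prev) > 1.96 * prev_se:   # outside the previous study's 95% CI
        misses += 1
print(f"{misses} of {len(estimates) - 1} estimates fell outside the previous CI")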
Allied to this reasoning is Levins' notion of truth lying at the "intersection of independent lies" (Levins 1966:423). He considered alternative models, each of which suffered from 1 or more simplifying assumptions (and all models involve some simplification of the system being modeled) that made each model unrealistic in some way or another. He suggested that if the models-despite their differing assumptions-lead to similar results, we have a robust finding that is relatively free of the details of each model. In the context of metareplication, although independent studies of some phenomenon each may suffer from various shortcomings, if they paint substantially similar pictures, we can have confidence in what we see.

The idea of robustness in statistics is analogous to robustness among studies. Robustness in the analysis of data from a single study means that the conclusions are not strongly dependent on the assumptions involved in the analysis (Mallows 1979). Similar inferences would be obtained from statistical methods that differ in their assumptions. For example, conclusions might not vary even if the data do not follow the assumed distribution, such as the Normal, or if outliers are present in the data. Analogously, robustness in metareplication means that similar interpretations about phenomena are reached from studies that differ in methods, investigators, locations, times, etc.
The notion that studies should be replicated certainly is not new. Replication, in the form of repetition of key experiments by others, has been conventional practice in science far longer than statistics itself has been (Carpenter 1990). Fisher (1971) observed that conclusions are always provisional, like progress reports, interpreting the evidence so far accrued. Tukey (1960) proposed that conclusions derive from the assessment of a series of individual results, rather than a particular result. Eberhardt and Thomas (1991:57) observed that "truly definitive single experiments are very rare in any field of endeavor, progress is actually made through sequences of investigations." Cox and Wermuth (1996:10) noted that, "Of course, deep understanding is unlikely to be achieved by a single study, no matter how carefully planned." Hurlbert and White (1993:149) suggested that, although serious statistical errors were rampant in at least 1 area of ecology, principal conclusions, "those concerning phenomena that have been studied by several investigators, have been unaffected." And Catchpole (1989:287) stated that, "Most hypotheses are tested, not in the splendid isolation of one finely controlled 'perfect' experiment, but in the wider context of a whole series of experiments and observations. Surely a much more valuable form of validity comes from the independent repetition of experiments by colleagues in different parts of the world." As summarized by Anderson et al. (2001:312), "In the long run, science is safeguarded by repeated studies to ascertain what is real and what is merely a spurious result from a single study."

MORE ON METAREPLICATION

What's P Got to Do With It?

P-values resulting from statistical tests of null hypotheses often are used to judge the significance of findings from a study. A small P-value suggests either that the null hypothesis is not true or that an unusual result has occurred. P-values often are misinterpreted as: (1) the probability that the results were due to chance, (2) an indication of the reliability of the result, or (3) the probability that the null hypothesis is true (Carver 1978, Johnson 1999). Small P-values are taken to represent strong evidence that the null hypothesis is false, but in reality the connection between P and Pr{H0 is true | data} is nebulous (Berger and Sellke 1987).

R. A. Fisher was an early advocate of P-values, but he actually recommended that they be used opposite to the way they are mostly used now. Fisher viewed a significant P-value as providing reason to continue studying the phenomenon (recalling that either the hypothesis was wrong or something unusual happened). In stark contrast, modern researchers often use nonsignificant P-values as reason to continue study; many investigators, when faced with nonsignificant results, argue that "a larger sample size [i.e., further research] is needed to reach significance."

The Importance of Consistent Methods in Replication

Scientists are encouraged to replicate studies using the same methods as were used in the original studies. This practice eliminates variation due to methodology and, if different results are obtained, suggests that the initial results may have been an accident (Table 2). That is, they did not bear up under metareplication. Obtaining the same results when using the same methods, however, allows for the possibility that the results were specific to the method, rather than a general truth. Replication with different methods is critical to determine whether results are robust with respect to methodology and not an artifact of the methods employed. When we get consistent results with different methods, we have greater confidence in those results; the results are robust with respect to method. Should we get different results when different methods are used, the original results may have been artifacts of the methods (Table 2).

Table 2. There are both advantages and disadvantages to replicating a study with the same or different methods as the original study.

Same methods, same results: results may have been specific to method, rather than a general truth.
Same methods, different results: original results may have been accidental, not bearing up under metareplication.
Different methods, same results: results are robust with respect to method.
Different methods, different results: results may have been an artifact of the method used.

What to Do With Surprises?

A cogent argument has been made that only well-thought-out hypotheses should be tested in a study. Doing so avoids "fishing expeditions" and the chance of claiming that accidental findings are real (Johnson 1981, Rexstad et al. 1988, Anderson et al. 2001, Burnham and Anderson 2002). I think that surprise findings in fact should be considered, but not as confirmed results from the study so much as prods for further investigations. They generate hypotheses to test. For example, suppose you conduct a regression analysis involving many explanatory variables. If you use a stepwise procedure to select variables, results from that analysis can give very misleading estimates of effect sizes, P-values, and the like (Pope and Webster 1972, Hurvich and Tsai 1990). Variables deemed to be important may or may not actually have major influence on the response variable, and conclusions to that effect should not be claimed. It is appropriate, on the other hand, to use the results in a further investigation, focusing on the explanatory variables that the analysis had suggested were influential. It is better to conduct a new study (i.e., to metareplicate), but at a minimum cross-validation will be useful. In that approach, a model is developed with part of the data set and evaluated on the remaining data. This is not to suggest that a priori hypotheses are not important, or that carefully designed studies to evaluate those hypotheses are not a highly appropriate way to conduct science. Only that a balance between exploratory and confirmatory research is needed. Studies should be designed to learn something, not merely to generate questions for further research. Apparent findings need to be rigorously confirmed. If scientists look only at variables known or suspected to be influential, however, how would we get new findings?
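A minimal sketch of the cross-validation idea mentioned above, using invented data and a deliberately simple straight-line model: the model is fit on part of the data and judged by its prediction error on the held-out remainder.

# Develop a model on part of the data, evaluate it on the rest.
import random
import statistics

random.seed(7)
data = []
for _ in range(60):
    x = random.uniform(0, 10)
    data.append((x, 2.0 * x + 5 + random.gauss(0, 3)))   # hypothetical response

random.shuffle(data)
train, test = data[:40], data[40:]

# least-squares fit of y = a + b*x on the training portion
xbar = statistics.mean(x for x, _ in train)
ybar = statistics.mean(y for _, y in train)
b = sum((x - xbar) * (y - ybar) for x, y in train) / sum((x - xbar) ** 2 for x, _ in train)
a = ybar - b * xbar

mse_test = statistics.mean((y - (a + b * x)) ** 2 for x, y in test)
print(f"fitted model: y = {a:.2f} + {b:.2f} x;  held-out MSE = {mse_test:.2f}")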
Meta-analysis

Meta-analysis essentially is an analysis of analyses (Hedges and Olkin 1985, Osenberg et al. 1999, Gurevitch and Hedges 2001). The units being analyzed are themselves analyses. Meta-analysis dates back to 1904, when Karl Pearson combined data from various military tests and concluded that vaccination against intestinal fever was ineffective (Mann 1994). Often studies of comparable effects are analyzed by vote counting: of the studies that looked for the effect, this many had statistically significant results and that many did not. One problem with the vote-counting approach is that, if the true effect is not strong and sample sizes are not large, most studies will not detect the effect. So a critical review of the studies would conclude that most studies found no effect, and the effect would be dismissed.

In contrast, meta-analysis examines the full range of estimated effects (not P-values), whether or not they were individually statistically significant. From the resulting pattern may emerge evidence of consistent effects, even if they are small. Mann (1994) cited several instances in which meta-analyses led to dramatically different conclusions than did expert reviews of studies that used vote-counting methods. Meta-analysis does have a serious danger, however, in publication bias (Berlin et al. 1989). A study that demonstrates an effect at a statistically significant level is more likely to be written for publication, favorably reviewed by referees and editors, and ultimately published than is a study without such significant effects (Sterling et al. 1995). So the published literature on an effect may give a very biased picture of what the research in toto demonstrated. (The medical community worries that ineffective and even harmful medical practices may be adopted if positive results are more likely to be published than negative results [Hoffert 1998]. Indeed, an on-line journal, the Journal of Negative Results in Biomedicine, is being launched to correct distortions caused by a general bias against null results [Anonymous 2002].) Even if results from unpublished studies could be accessed, much care would be needed to evaluate them. Bailar (1995) observed that quality meta-analysis requires expertise in the subject matter reviewed. A question always looms about unpublished studies (Hoffert 1998): Was the study not published because it generated no statistically significant results or because it was flawed in some way? Further, could it be that the study was not published because it was contrary to the prevailing thinking at the time? Yet, despite the concerns with meta-analysis, it does provide a vehicle for thoughtfully conducting a synthesis of the studies relevant to a particular question.
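The vote-counting problem can be shown with a short simulation (all numbers hypothetical): with a weak but real effect and small samples, most individual studies are nonsignificant, yet the collection of effect estimates still centers near the true value, which is the pattern a meta-analysis would reveal.

# Vote counting versus looking at the estimated effects themselves.
import math
import random
import statistics

random.seed(8)

true_effect, n_per_group, n_studies = 0.3, 15, 25
significant, effect_estimates = 0, []
for _ in range(n_studies):
    treated = [random.gauss(true_effect, 1) for _ in range(n_per_group)]
    control = [random.gauss(0, 1) for _ in range(n_per_group)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / n_per_group +
                   statistics.variance(control) / n_per_group)
    if abs(diff / se) > 2.0:          # roughly a 5% two-sided test
        significant += 1
    effect_estimates.append(diff)

print(f"vote count: {significant} of {n_studies} studies 'significant'")
print(f"mean of the estimated effects: {statistics.mean(effect_estimates):.2f} "
      f"(true effect {true_effect})")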

Weak Studies May Be OK, But ...

Statisticians, including myself (Johnson 1974), regularly advise against conducting studies that lack sufficient power. Observations in those studies are too few to yield a high probability of rejecting some null hypothesis, even if the hypothesis is false. While large samples are certainly preferable to small samples, I no longer believe that it is appropriate to condemn studies with small samples. Indeed, it may be preferable to have the results of numerous small but well-designed studies rather than results from a single "definitive" investigation. This is so because the single study, despite large samples, may have been compromised by an unusual happenstance or by the effect of a "lurking variable" (a third variable that induces a correlation between 2 other variables that are otherwise unrelated). Numerous small studies, due to the benefits of metareplication, are less at jeopardy of yielding misleading conclusions.

One danger of a small study is that the sampled units do not adequately represent the target population. Lack of representation also can plague larger studies, however. I suspect that the greatest danger of a small study is the tendency to accept the null hypothesis as truth, if it is not rejected. Concluding that a hypothesis is true simply because it was not rejected in a statistical test is folly. Nonetheless, it is done frequently; Johnson (1999, 2002) cited numerous instances in which authors of The Journal of Wildlife Management articles concluded that null hypotheses were true, even when samples were small and test statistics were nearly significant.

Metareplication protects against situations in which there is an effect, but it is small and therefore not statistically significant in individual studies, and thus is never claimed. Hence, small studies should not be discouraged, as long as the investigators acknowledge that they are not definitive. Studies should be designed to address the topic as effectively and efficiently as possible. If the scope has to be narrow and the scale has to be small, or if logistic constraints preclude large samples, results still may be worthwhile and should be published, with their limitations acknowledged. Without meta-analysis or a similar strategy, any values of small studies will not be realized.
Should Authors Avoid Management Recommendations?

This journal encourages authors to present management implications deriving from the studies they describe. That practice may not always be appropriate. Results from a single study, unless supported by evidence from other studies, may be misleading. The fact that a study is the only one dealing with a certain species in a particular state is no reason to base management recommendations solely on that single study. Recommendations should be based on a larger body of knowledge. Similarly, manuscripts should be considered for publication even if they are not "groundbreaking," but instead provide support for inferences originally obtained from previous studies.

What about "management studies"? These seem to be studies conducted by others than scientists or graduate students. They also are claimed to be in less need of quality (good design, adequate sample size, etc.) than are "research studies." I would argue that the reverse may in fact be true: Management studies should at least equal research investigations in quality. If an erroneous conclusion is reached in a research study, the only negative consequence is the publication of that error in a journal. And, hopefully, further investigation will demonstrate that the published conclusion was unwarranted. In contrast, an erroneous conclusion reached in a management study may well lead to some very inappropriate management action being taken, with negative consequences to wildlife and their habitats.

Metareplication and the Bayesian Approach

The Bayesian philosophy offers a more natural way to think about metareplication than does the traditional (frequentist) approach. In concept, a frequentist considers only the likelihood function, acting as if the only information about a variable under investigation derives from the study at hand. A Bayesian accounts for the context and history more explicitly by considering the likelihood in conjunction with the prior distribution. The prior incorporates what is known or believed about the variable before the study was conducted. I think people naturally tend to be Bayesians. They have what might be termed mental inertia: they tend to continue in their existing beliefs even in the face of evidence against those beliefs. Only with repeated doses of new evidence do they change their opinions. Sterne and Smith (2001) suggested that the public, by being cynical about the results of new medical studies, were exhibiting a subconscious Bayesianism.
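One way to picture that accumulation of evidence is sequential normal-normal updating, in which a prior belief is combined with each new study's estimate by precision weighting. The sketch below uses made-up estimates and standard errors, not results from any actual studies.

# Sequential Bayesian updating of a belief about an effect, study by study.
prior_mean, prior_sd = 0.0, 1.0                         # sceptical prior about an effect
studies = [(0.45, 0.30), (0.35, 0.25), (0.50, 0.30)]    # (estimate, SE) for each study

mean, var = prior_mean, prior_sd ** 2
for est, se in studies:
    w_prior, w_data = 1 / var, 1 / se ** 2              # precisions
    mean = (w_prior * mean + w_data * est) / (w_prior + w_data)
    var = 1 / (w_prior + w_data)
    print(f"after this study: belief = {mean:.2f} (sd {var ** 0.5:.2f})")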

CONCLUSIONS

Any imaginative wildlife biologist can easily list a dozen or more variables that could influence a response variable of interest, be it the number of squirrels in a woodlot, the nest success rate of bobolinks (Dolichonyx oryzivorus) in a field, or the survival rate of mallards in a particular year. An investigation of such a response variable will adequately determine the influences of only a few of the multitude of explanatory variables. The remainder will not be under the investigator's control and indeed may not even be known to the investigator, or may be known but not measured. The extent to which these other variables influence the response variable confounds the observed relationship between the response variable and the explanatory variables under study. In addition, those unknown influences may restrict the scope of inference for the relationships that are discovered.

Consider again our example of estimating the effect on squirrel density of selectively logging woodlots. Suppose that, in general, such logging does reduce squirrel density. In any particular situation, however, that result might not follow because of the effects of other (possibly unmeasured) variables. Predators of squirrels in a logged woodlot might have been reduced, offsetting any population decline associated with logging. Or an outbreak of disease in the squirrels might have reduced their numbers in the unlogged woodlot, erasing any difference between that woodlot and the one that was logged.

Design control (restricting the range in variation of potentially confounding variables) reduces the influence of such variables, but that practice is not always feasible. Randomization tends to make variables that are not studied act, well, randomly, rather than in some consistent direction. With replication, those variables then contribute to variance in the observed relationship, rather than a bias. Nonetheless, in any single study, those unobserved relationships may give us a misleading impression of the true relationship between the response variable and the explanatory variables under study.

Metareplication provides us greater confidence that certain relationships are general. Obtaining consistent inferences from studies conducted under a wide variety of conditions will assure us that the conclusions are not unique to the particular set of circumstances that prevailed during the study. Further, by metareplicating studies, we need not worry about P-values, issues of what constitute independent observations, and other concerns involving single studies. We can take a broader look, seeking consistency of effects among studies. Consistent results suggest generality of the relationship. Inconsistency will lead us either not to accept the results as truth or to determine conditions under which the results hold and those under which they do not. That approach will lead to understanding the mechanisms.

If, indeed, most individual wildlife studies are flawed to some degree, why have we any confidence whatsoever in the science? Perhaps the errors are inconsequential. Or, possibly we don't really believe in those single studies anyway, and don't take action until a clear pattern emerges from disparate studies of the phenomenon. Our innate Bayesianism may be weighting results from an individual study with our prior thinking, based on other things we know or believe.

To conclude, we certainly should use the best statistical methods appropriate for a given data set to maximize the value of those data. However, as Hurlbert (1994:495) wisely noted, "lack of understanding of basic principles and simple methods by practising ecologists is a serious problem, while under-use of advanced statistical methods is not." More important than the methods used to analyze data, we should collect the best data we can. We should use the principles of design-controls, randomization, and replication in manipulative experiments; matching and measuring appropriate covariates in observational studies. And, most critically, studies themselves need to be replicated to have confidence in the findings and their generality. Metareplication exploits the value of small studies, obviates concerns about P-values and similar issues, protects against claiming spurious effects to be real, and facilitates the detection of small effects that are likely to be missed in individual studies.

ACKNOWLEDGMENTS

I am grateful to the many statisticians and biologists who have taught me so much as teachers, colleagues, or students. B. R. Euliss helped prepare the manuscript. Special thanks for valuable comments on this manuscript to D. R. Anderson, L. A. Brennan, K. P. Burnham, L. L. Eberhardt, S. H. Hurlbert, W. P. Kuvlesky, Jr., J. D. Nichols, M. R. Riggs, G. A. Sargeant, A. Stewart-Oaten, and an anonymous referee, all of whom nonetheless deserve to be held blameless for remaining faults. Thanks also to L. A. Brennan for the invitation to write the paper.
Metareplication provides us greater confidence that certain relationships are general. Obtaining consistent inferences from studies conducted under a wide variety of conditions will assure us that the conclusions are not unique to the particular set of circumstances that prevailed during the study. Further, by metareplicating studies, we need not worry about P-values, issues of what constitute independent observations, and other concerns involving single studies. We can take a broader look, seeking consistency of effects among studies. Consistent results suggest generality of the relationship. Inconsistency will lead us either not to accept the results as truth or to determine conditions under which the results hold and those under which they do not. That approach will lead to understanding the mechanisms.

If, indeed, most individual wildlife studies are flawed to some degree, why have we any confidence whatsoever in the science? Perhaps the errors are inconsequential. Or, possibly we don't really believe in those single studies anyway, and don't take action until a clear pattern emerges from disparate studies of the phenomenon. Our innate Bayesianism may be weighting results from an individual study with our prior thinking, based on other things we know or believe.

To conclude, we certainly should use the best statistical methods appropriate for a given data set to maximize the value of those data. However, as Hurlbert (1994:495) wisely noted, "lack of understanding of basic principles and simple methods by practising ecologists is a serious problem, while under-use of advanced statistical methods is not." More important than the methods used to analyze data, we should collect the best data we can. We should use the principles of design (controls, randomization, and replication) in manipulative experiments; matching and measuring appropriate covariates in observational studies. And, most critically, studies themselves need to be replicated to have confidence in the findings and their generality. Metareplication exploits the value of small studies, obviates concerns about P-values and similar issues, protects against claiming spurious effects to be real, and facilitates the detection of small effects that are likely to be missed in individual studies.
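That last point can also be illustrated with a simulation sketch; the effect size, sample sizes, and number of studies are invented. Each small study, judged on its own by a conventional significance criterion, usually fails to detect a modest true effect, yet most of the studies estimate the effect in the same direction, and their average lies close to the truth.

```python
# Sketch: a modest effect that single small studies usually miss emerges
# clearly when several independent studies are viewed together.
# All numbers are hypothetical.
import random
import statistics

random.seed(2)
TRUE_EFFECT = 0.5     # modest treatment-control difference
N_PER_GROUP = 10      # each study is small
N_STUDIES = 12        # number of metareplicated studies

def one_study():
    treated = [random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_GROUP)]
    control = [random.gauss(0.0, 1) for _ in range(N_PER_GROUP)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / N_PER_GROUP
          + statistics.variance(control) / N_PER_GROUP) ** 0.5
    return diff, se

diffs, n_significant = [], 0
for _ in range(N_STUDIES):
    diff, se = one_study()
    diffs.append(diff)
    if abs(diff) > 1.96 * se:          # crude "significant at alpha = 0.05"
        n_significant += 1

n_positive = sum(d > 0 for d in diffs)
print(f"'significant' on their own: {n_significant} of {N_STUDIES} studies")
print(f"effect in the true direction: {n_positive} of {N_STUDIES} studies")
print(f"mean effect across studies: {statistics.mean(diffs):.2f} (truth = {TRUE_EFFECT})")
```

A formal meta-analysis would weight the study estimates by their precision (e.g., Hedges and Olkin 1985, Gurevitch and Hedges 2001); the point of the sketch is only that the pattern across studies is far more informative than any single study.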

ACKNOWLEDGMENTS

I am grateful to the many statisticians and biologists who have taught me so much as teachers, colleagues, or students. B. R. Euliss helped prepare the manuscript. Special thanks for valuable comments on this manuscript to D. R. Anderson, L. A. Brennan, K. P. Burnham, L. L. Eberhardt, S. H. Hurlbert, W. P. Kuvlesky, Jr., J. D. Nichols, M. R. Riggs, G. A. Sargeant, A. Stewart-Oaten, and an anonymous referee, all of whom nonetheless deserve to be held blameless for remaining faults. Thanks also to L. A. Brennan for the invitation to write the paper.

LITERATURE CITED

ANDERSON, D. R. 2001. The need to get the basics right in wildlife field studies. Wildlife Society Bulletin 29:1294-1297.
ANDERSON, D. R., K. P. BURNHAM, W. R. GOULD, AND S. CHERRY. 2001. Concerns about finding effects that are actually spurious. Wildlife Society Bulletin 29:311-316.
ANDERSON, D. R., K. P. BURNHAM, AND W. L. THOMPSON. 2000. Null hypothesis testing: problems, prevalence, and an alternative. Journal of Wildlife Management 64:912-923.
ANONYMOUS. 2002. Getting null results into print. Science 296:2137.
BAILAR, J. C., III. 1995. The practice of meta-analysis. Journal of Clinical Epidemiology 48:149-157.
BALSER, D. S., H. H. DILL, AND H. K. NELSON. 1968. Effect of predator reduction on waterfowl nesting success. Journal of Wildlife Management 32:669-682.
BARNARD, G. A. 1982. Causation. Pages 387-389 in S. Kotz and N. L. Johnson, editors. Encyclopedia of statistical sciences. Volume 1. Wiley, New York, USA.
BERGER, J. O., AND T. SELLKE. 1987. Testing a point null hypothesis: the irreconcilability of P values and evidence. Journal of the American Statistical Association 82:112-122.
BERLIN, J. A., C. B. BEGG, AND T. A. LOUIS. 1989. An assessment of publication bias using a sample of published clinical trials. Journal of the American Statistical Association 84:381-392.
BOX, G. E. P., AND G. C. TIAO. 1975. Intervention analysis with applications to economic and environmental problems. Journal of the American Statistical Association 70:70-79.
BOYD, H. 1981. Prairie dabbling ducks, 1941-1990. Canadian Wildlife Service Progress Notes 119, Ottawa, Ontario, Canada.
BURNHAM, K. P., AND D. R. ANDERSON. 2002. Model selection and multi-model inference: a practical information-theoretic approach. Second edition. Springer, New York, USA.
CARPENTER, S. R. 1990. Large-scale perturbations: opportunities for innovation. Ecology 71:2038-2043.
CARPENTER, S. R. 1996. Microcosm experiments have limited relevance for community and ecosystem ecology. Ecology 77:677-680.
CARVER, R. P. 1978. The case against statistical significance testing. Harvard Educational Review 48:378-399.
CATCHPOLE, C. K. 1989. Pseudoreplication and external validity: playback experiments in avian bioacoustics. Trends in Ecology & Evolution 4:286-287.
CHERRY, S. 1998. Statistical tests in publications of The Wildlife Society. Wildlife Society Bulletin 26:947-953.
COCHRAN, W. G. 1983. Planning and analysis of observational studies. Wiley, New York, USA.
COX, D. R., AND N. WERMUTH. 1996. Multivariate dependencies: models, analysis and interpretation. Chapman & Hall, London, United Kingdom.
EBERHARDT, L. L. 1970. Correlation, regression, and density dependence. Ecology 51:306-310.
EBERHARDT, L. L. 1976. Quantitative ecology and impact assessment. Journal of Environmental Management 4:27-70.
EBERHARDT, L. L., AND J. M. THOMAS. 1991. Designing environmental field studies. Ecological Monographs 61:53-73.
ERICKSON, W. P., T. L. MCDONALD, K. G. GEROW, S. HOWLIN, AND J. W. KERN. 2001. Statistical issues in resource selection studies with radio-marked animals. Pages 209-242 in J. J. Millspaugh and J. M. Marzluff, editors. Radio tracking and animal populations. Academic Press, San Diego, California, USA.
FAABORG, J., M. BRITTINGHAM, T. DONOVAN, AND J. BLAKE. 1993. Habitat fragmentation in the temperate zone: a perspective for managers. Pages 331-338 in D. M. Finch and P. W. Stangel, editors. Status and management of neotropical migratory birds. U.S. Forest Service General Technical Report RM-229.
FISHER, R. A. 1926. The arrangement of field experiments. Journal of the Ministry of Agriculture 33:503-513.
FISHER, R. A. 1971. The design of experiments. Hafner, New York, USA.
GUREVITCH, J. A., AND L. V. HEDGES. 2001. Meta-analysis: combining the results of independent experiments. Pages 347-369 in S. M. Scheiner and J. Gurevitch, editors. Design and analysis of ecological experiments. Second edition. Oxford University Press, Oxford, United Kingdom.
HARVILLE, D. A. 1975. Experimental randomization: who needs it? American Statistician 29:27-31.
HEDGES, L. V., AND I. OLKIN. 1985. Statistical methods for meta-analysis. Academic Press, San Diego, California, USA.
HEFFNER, R. A., M. J. BUTLER, IV, AND C. K. REILLY. 1996. Pseudoreplication revisited. Ecology 77:2558-2562.
HOFFERT, S. P. 1998. Efforts increase to boost validity of meta-analyses. Scientist 12:7-8.
HOLLAND, P. W. 1986. Statistics and causal inference. Journal of the American Statistical Association 81:945-960.
HURLBERT, S. H. 1984. Pseudoreplication and the design of ecological field experiments. Ecological Monographs 54:187-211.
HURLBERT, S. H. 1994. Old shibboleths and new syntheses. Trends in Ecology & Evolution 9:495-496.
HURLBERT, S. H. 1997. Experiments in ecology [book review]. Endeavour 21:172-173.
HURLBERT, S. H., AND M. D. WHITE. 1993. Experiments with freshwater invertebrate zooplanktivores: quality of statistical analyses. Bulletin of Marine Science 53:128-153.
HURVICH, C. M., AND C.-L. TSAI. 1990. The impact of model selection on inference in linear regression. American Statistician 44:214-217.
JOHNSON, D. H. 1974. Estimating survival rates from banding of adult and juvenile birds. Journal of Wildlife Management 38:290-297.
JOHNSON, D. H. 1981. The use and misuse of statistics in wildlife habitat studies. Pages 11-19 in D. E. Capen, editor. The use of multivariate statistics in studies of wildlife habitat. U.S. Forest Service General Technical Report RM-87.
JOHNSON, D. H. 1995. Statistical sirens: the allure of nonparametrics. Ecology 76:1998-2000.
JOHNSON, D. H. 1999. The insignificance of statistical significance testing. Journal of Wildlife Management 63:763-772.
JOHNSON, D. H. 2001a. Validating and evaluating models. Pages 105-119 in T. M. Shenk and A. B. Franklin, editors. Modeling in natural resource management: development, interpretation, and application. Island Press, Washington, D.C., USA.
JOHNSON, D. H. 2001b. Habitat fragmentation effects on birds in grasslands and wetlands: a critique of our knowledge. Great Plains Research 11:211-231.
JOHNSON, D. H. 2002. The role of hypothesis testing in wildlife science. Journal of Wildlife Management 66:272-276.
JOHNSON, D. H., J. D. NICHOLS, AND M. D. SCHWARTZ. 1992. Population dynamics of breeding waterfowl. Pages 446-485 in B. D. J. Batt, A. D. Afton, M. G. Anderson, C. D. Ankney, D. H. Johnson, J. A. Kadlec, and G. L. Krapu, editors. Ecology and management of breeding waterfowl. University of Minnesota Press, Minneapolis, USA.

JOHNSON, D. H., AND M. WINTER. 1999. Reserve design for grasslands: considerations for bird populations. Proceedings of the George Wright Society Biennial Conference 10:391-396.
LEVINS, R. 1966. The strategy of model building in population biology. American Scientist 54:421-431.
LIANG, K.-Y., AND S. L. ZEGER. 1986. Longitudinal data analysis using generalized linear models. Biometrika 73:13-22.
MACNAB, J. 1983. Wildlife management as scientific experimentation. Wildlife Society Bulletin 11:397-401.
MALLOWS, C. L. 1979. Robust methods: some examples of their use. American Statistician 33:179-184.
MANN, C. C. 1994. Can meta-analysis make policy? Science 266:960-962.
MCCULLAGH, P., AND J. A. NELDER. 1989. Generalized linear models. Second edition. Chapman & Hall, London, United Kingdom.
MILLSPAUGH, J. J., J. R. SKALSKI, B. J. KERNOHAN, K. J. RAEDEKE, G. C. BRUNDIGE, AND A. B. COOPER. 1998. Some comments on spatial independence in studies of resource selection. Wildlife Society Bulletin 26:232-236.
NICHOLS, J. D. 2001. Using models in the conduct of science and management of natural resources. Pages 11-34 in T. M. Shenk and A. B. Franklin, editors. Modeling in natural resource management: development, interpretation, and application. Island Press, Washington, D.C., USA.
OSENBERG, C. W., O. SARNELLE, AND D. E. GOLDBERG, editors. 1999. Meta-analysis in ecology: concepts, statistics, and applications. Ecology 80:1103-1167.
PEARL, J. 2000. Causality: models, reasoning, and inference. Cambridge University Press, Cambridge, United Kingdom.
PLATT, J. R. 1964. Strong inference. Science 146:347-353.
POPE, P. T., AND J. T. WEBSTER. 1972. The use of an F-statistic in stepwise regression procedures. Technometrics 14:327-340.
PROVENCHER, L., N. M. GOBRIS, L. A. BRENNAN, D. R. GORDON, AND J. L. HARDESTY. 2002. Breeding bird response to midstory hardwood reduction in Florida sandhill longleaf pine forests. Journal of Wildlife Management 66:641-661.
REXSTAD, E. A., D. D. MILLER, C. H. FLATHER, E. M. ANDERSON, J. W. HUPP, AND D. R. ANDERSON. 1988. Questionable multivariate statistical inference in wildlife habitat and community studies. Journal of Wildlife Management 52:794-798.
ROMESBURG, H. C. 1981. Wildlife science: gaining reliable knowledge. Journal of Wildlife Management 45:293-313.
RUBIN, D. B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66:688-701.
SCHINDLER, D. W. 1998. Replication versus realism: the need for ecosystem-scale experiments. Ecosystems 1:323-334.
SMITH, E. P. 2002. BACI design. Pages 141-148 in A. H. El-Shaarawi and W. W. Piegorsch, editors. Encyclopedia of environmetrics. Volume 1. Wiley, Chichester, United Kingdom.
SMITH, T. M. F., AND R. A. SUGDEN. 1988. Sampling and assignment mechanisms in experiments, surveys and observational studies. International Statistical Review 56:165-180.
STERLING, T. D., W. L. ROSENBAUM, AND J. J. WEINKAM. 1995. Publication decisions revisited: the effect of the outcome of statistical tests on the decision to publish and vice versa. American Statistician 49:108-112.
STERNE, J. A. C., AND G. D. SMITH. 2001. Sifting the evidence: what's wrong with significance tests? British Medical Journal 322:226-231.
STEWART-OATEN, A. 2002. Pseudo-replication. Pages 1642-1646 in A. H. El-Shaarawi and W. W. Piegorsch, editors. Encyclopedia of environmetrics. Volume 3. Wiley, Chichester, United Kingdom.
STEWART-OATEN, A., AND J. R. BENCE. 2001. Temporal and spatial variation in environmental impact assessment. Ecological Monographs 71:305-339.
STEWART-OATEN, A., W. W. MURDOCH, AND K. R. PARKER. 1986. Environmental impact assessment: "pseudoreplication" in time? Ecology 67:929-940.
TAPPER, S. C., G. R. POTTS, AND M. H. BROCKLESS. 1996. The effect of an experimental reduction in predation pressure on the breeding success and population density of grey partridges Perdix perdix. Journal of Applied Ecology 33:965-978.
THOMAS, J. W. 2000. From managing a deer herd to moving a mountain: one Pilgrim's progress. Journal of Wildlife Management 64:1-10.
TUKEY, J. W. 1960. Conclusions vs decisions. Technometrics 2:423-433.
WALTERS, C. 1986. Adaptive management of renewable resources. Macmillan, New York, USA.
WILLIAMS, B. K., J. D. NICHOLS, AND M. J. CONROY. 2002. Analysis and management of animal populations: modeling, estimation, and decision making. Academic Press, San Diego, California, USA.
YOUDEN, W. J. 1972. Enduring values. Technometrics 14:1-11.