
II. Principles of Experimental Design

A. Experimental Units

The experimental unit is that object to which a treatment or condition is independently applied.

EXAMPLE: Carcinogenic substances. 20 rats are randomly assigned to each of 4 doses of a potential carcinogen: none, low, medium, and high. The rats are kept in individual cages under the same environmental conditions in the same room. Each rat has its assigned dose stirred into its daily meal for 4 weeks. The number of tumors found in each rat is recorded at the end of the 4-week period. What is the experimental unit?


The rats are also called the measurement units or observational units because the outcome of interest is measured on each rat.

EXAMPLE: Carcinogenic substances. 20 rats are randomly assigned to 4 cages (5 rats in each). Each cage is then randomly assigned to one dose of a potential carcinogen: none, low, medium, and high. The rats are kept in their assigned cages under the same environmental conditions in the same room. Rats are not fed individually; food is placed in each cage in a common dish out of which all rats eat. Each cage has its assigned dose stirred into the food for 4 weeks. The number of tumors found in each rat is recorded at the end of the 4-week period. What is the experimental unit? What is the measurement unit?

B. Experimental Factors

If a study uses an experimental design, then there are one or more experimental factors: covariates whose levels or values are assigned by the investigator to each experimental unit. Observational studies, and some experimental studies, have classification factors: covariates whose levels or values are observed for each experimental unit by the investigator.

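Ex. Diabetes study

Treatment: experimental or classification factor?

Gender: experimental or classification factor?

Ex. Midwest Heart Study

Education: experimental or classification factor?

Race/ethnicity: experimental or classification factor?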

C. Experimental factors which correspond to treatments

A placebo is an inactive treatment. It is constructed to look, feel, or taste as much like the active treatment under study as possible.

Ex. Surgery for knee pain. Active treatment: incision in the knee followed by a cleaning of the knee joint using sterile saline solution. Placebo: incision in the knee followed by sound effects meant to mimic those made during a cleaning.

Ex. HIV/AIDS medications. Active treatment: 2 drugs in pill form, ZDV and ddI. Placebo: ZDV plus an inactive pill that looks and tastes like ddI.

- A placebo is used so that no study participant is likely to know which treatment they are receiving. If subjects are not told which treatment they are receiving, the study is called single-blinded. The subject is "blind" to treatment assignment.

- If the investigator who is measuring the outcome of interest is also "blind" to treatment assignment for every participant, the study is called double-blinded.

- If the person running interim analyses (analyses carried out to compare groups before the study is completed) is also "blind" to treatment assignment for every participant, the study is called triple-blinded.

- A placebo is a type of control treatment, a baseline or benchmark to which the active treatment will be compared. No treatment at all (null treatment or null control) is sometimes used in addition to or instead of a placebo.

Ex. Study of gum treatment. Treatment: oral rinse with an active drug. Placebo control: oral rinse with no active drug. Null control: no oral rinse.

D. Study types

A cross-sectional study collects data on all experimental units at only one point in time.

A longitudinal or follow-up study collects data for at least two points in time.

A prospective study determines the experimental units at the current time and then collects data on them forward in time.

A retrospective study determines the experimental units at the current time and then collects data on them from past records.

A clinical trial collects health outcomes (clinical data) on humans and follows them forward in time for changes in that health outcome or for occurrence of new health outcomes.

When a clinical trial recruits and follows participants at multiple locations, it is called a multi-center clinical trial.

See PubH 7420 Clinical Trials and PubH 7400 (section 003) for Translational and Clinical Research.

E. Types of outcomes

- Many studies measure multiple outcomes or responses. One of them is usually selected as the primary outcome, the outcome that is of most scientific interest, and information on this outcome may be used to calculate how many experimental units are needed to run the research study.

- Other outcomes, those needed to answer other questions of scientific interest, are sometimes called the secondary outcomes .

- Sometimes it is impractical to measure the outcome of primary interest (e.g., death) because the outcome is rare, or difficult or impossible to measure. Researchers may then use a surrogate outcome, a response that is supposed to be related to (predictive of) the outcome of interest.

Ex. HIV/AIDS studies often use:

F. Sources of Errors

When we talk about errors, we are referring to observed variability in the recorded value of an outcome across experimental units.

One source of error is assumed to always exist: experimental error . This describes the variability in the outcome among identically and independently treated experimental units.

Experimental error can arise from:

• natural or inherent differences between experimental units

• variability in the measurement process

• inability to exactly replicate the treatment or condition from one experimental unit to the next

• a treatment that acts differently across units

• extraneous factors that influence the measurements

How can we minimize these problems?

• Select experimental units which are as homogeneous as possible.

When this is difficult to do, or when doing so would constrain the generalizability of the research results, consider blocking designs. We will see these in Chapter 8.

• Take measurements in as uniform a manner as possible. Be aware of how the following can introduce variability that is irrelevant to the purpose of the study:

o person-to-person differences in how measurements are taken (people with more vs. less experience, people with better vs. poorer training in measuring, people with more vs. less patience, etc.)

o machine-to-machine differences in how measurements are recorded (calibration, age or condition of the machine, etc.)

o differences in equipment (pipettes, calipers, microtitre plates, rulers, etc.)

o differences in techniques (assays, dilutions, solutions, survey questions, etc.)

o within-person differences (how tired was the student collecting the measurements?)

o differences in supplies (salines, soils, chemicals, drugs, growth media, etc.)

• Construct the treatment(s) in as uniform a manner as possible.

• Control the environment to be as uniform as possible (temperature, humidity, light, accessibility of clinics, financial incentives, other incentives, etc.).

Other variability can be due to specific sources:

• Lab studies using mice should track which animals are related to each other, as should human studies (genetic variability).

• Crop studies should determine the physical attributes of the soil. Is one half of the field always wetter than the other half because the field has a downward slope? (physical variability)

• Studies that take time to complete (due to, e.g., lab or recruitment constraints) should keep track of the days/weeks/months on which measurements are collected (temporal, seasonal variability). (Those recruited in summer, for example, may somehow be different from those recruited in winter.)

• Studies involving repeated measurements on each experimental unit need to keep track of which measurements came from which unit (within-unit variability).

Such sources of variability can also be controlled, and then measured, using blocking designs (Chapter 8). Alternatively, statistical control of such factors may be achieved by adjusting for covariates (e.g., season) in the analysis. We will see this in Chapter 17.

G. Replication

Replication in an experiment means that we assign several experimental units to each treatment group. Why is replication important?

• It allows the researcher to demonstrate that the treatment shows similar results in each of several experimental units.

• It guards against an experiment failing merely because one or a few experimental units become unusable (e.g., mice died, plants died, people dropped out, etc.).

• It allows us to measure experimental error.

• It improves the precision with which we can measure the treatment effect.
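The last two bullets can be made concrete with a small sketch, assuming a completely randomized design in which every treatment group has at least 2 replicates. The pooled within-group variance is one standard estimate of experimental-error variance, and the standard error of a treatment mean, σ/√n, shows how precision improves as replicates are added. The function names and outcome values are hypothetical.

```python
import statistics

def pooled_error_variance(groups):
    """Pooled within-group variance: an estimate of experimental-error variance.

    Each group holds outcomes of replicated units that received the same
    treatment; every group needs at least 2 units for a variance to exist.
    """
    weighted = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    error_df = sum(len(g) - 1 for g in groups)
    return weighted / error_df

def standard_error_of_mean(sigma, n):
    """Standard error of a treatment mean computed from n independent replicates."""
    return sigma / n ** 0.5

# Hypothetical outcomes: 3 replicated units under each of 2 treatments
treatment_a = [12.1, 13.4, 11.8]
treatment_b = [15.0, 14.2, 16.1]
s2 = pooled_error_variance([treatment_a, treatment_b])
print(f"estimated experimental-error variance: {s2:.3f}")

# More replicates per treatment -> smaller standard error of each mean
for n in (2, 4, 8, 16):
    print(n, round(standard_error_of_mean(s2 ** 0.5, n), 3))
```

Without replication (one unit per treatment), `statistics.variance` has nothing to work with, which is the sense in which replication "allows us to measure" experimental error. Quadrupling the replicates halves the standard error.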

A critical question in experimental design is, how many replications should be used? We will come back to this question in several lectures during this course.
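As a preview only, one common textbook approximation for comparing two treatment means with a two-sided test at level α is n = 2σ²(z₁₋α/₂ + z_power)²/Δ² units per group, where σ is the experimental-error standard deviation (assumed known) and Δ is the smallest difference worth detecting. This is an assumption chosen for illustration, not necessarily the method used later in the course; a t-based calculation gives slightly larger answers.

```python
import math
from statistics import NormalDist

def replicates_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate units per group for comparing two treatment means.

    Normal approximation, two-sided test at level alpha, equal group sizes:
        n = 2 * sigma**2 * (z_{1-alpha/2} + z_{power})**2 / delta**2
    sigma: experimental-error standard deviation (assumed known)
    delta: smallest difference in means worth detecting
    """
    z = NormalDist().inv_cdf
    n = 2 * (sigma * (z(1 - alpha / 2) + z(power)) / delta) ** 2
    return math.ceil(n)  # round up: units come in whole numbers

print(replicates_per_group(sigma=10, delta=5))    # → 63
print(replicates_per_group(sigma=10, delta=10))   # → 16
```

Halving the difference to be detected roughly quadruples the required replication, which is why the choice of primary outcome and its variability matter so much at the design stage.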

Pseudo-replications are different. These are repeated measurements taken from the same experimental unit. This is sometimes called subsampling. Such measurements are not independent of each other, and statistical analysis of such data must account for the correlation. We'll see this in several chapters of the book.

Also consider PubH 7430 Methods for Correlated Data.
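In the second carcinogen example, the cage is the experimental unit and the rats within a cage are subsamples. One simple way to respect that, sketched here under the assumption of a balanced design with the same number of subsamples per unit, is to collapse the subsamples to one mean per unit before comparing treatments; mixed models are a more general alternative. The function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def unit_means(records):
    """Collapse subsamples to one value per experimental unit.

    records: iterable of (unit_id, value) pairs, possibly several per unit.
    Returns {unit_id: mean of that unit's values}.
    """
    by_unit = defaultdict(list)
    for unit, value in records:
        by_unit[unit].append(value)
    return {unit: mean(values) for unit, values in by_unit.items()}

# Hypothetical tumor counts: 3 rats (subsamples) measured in each of 2 cages (units)
records = [("cage1", 2), ("cage1", 4), ("cage1", 3),
           ("cage2", 5), ("cage2", 7), ("cage2", 6)]
print(unit_means(records))
```

Any later analysis then has 2 observations here, one per cage, rather than 6 correlated ones.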

H. Randomization: Why do we do it?

Randomization is a critical component in the design of experiments. It involves assigning subjects to treatment groups so that all possible assignments are, in theory, equally likely . Experimental units should have an equal chance of being assigned to any of the treatment groups.

Why do we randomize?

• To protect against confounding. Confounding occurs when the effect of one factor cannot be distinguished from the effect of another factor.

Obvious Ex. Assign all of the fat rats to diet A and all of the skinny rats to diet B. Measure weight gain in all rats one week later. If we see a difference between rats on diet A and rats on diet B, what do we conclude about the two diets?

Better Ex. We wish to compare surgery for coronary artery disease to a drug used to treat coronary artery disease. We know that such major heart surgery is complex and invasive. Some people die during surgery. We may assign the patients with less severe coronary artery disease (on purpose or not) to the surgery group.

If we see a difference in patient survival between the two groups, is it due to surgery vs. drugs, or to less severe disease vs. more severe disease? We can't tell. Such a study would be inconclusive, and a waste of money, time, and patients.

Another good Ex., from Professor Gary Oehlert in the School of Statistics:

A researcher wanted to find out if soil levels of cadmium were higher close to an incinerator compared to far away from the incinerator.

8 sites were selected and 10 soil samples collected at each site.

The 80 samples were sent to a lab for analysis. Since cadmium levels are difficult to detect, the analysis is long and expensive. On each of 8 days, the lab analyzed 10 samples.

Can you guess which two factors were confounded by the way the lab analysis was done?

Randomization is critical because there is no way for a researcher to be aware of all possible confounders; a random assignment tends to balance both known and unknown confounders across treatment groups. Observational studies have little to no formal control for any confounders. (Notice that the cadmium example was an observational study.)

Why else do we randomize?

• To form the basis for inference.

Treatment groups can be compared on the basis of randomization alone, without the assumptions seen in many statistical tests, such as normality and constant variance. A randomization test is a type of permutation test. It is based on writing down all possible ways in which the randomization could have occurred, and comparing those to the randomization that was actually done; see Chapter 1.8. This is rarely done in practice, because it is only easily implemented for very simple designs.
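The enumeration idea can be sketched for the simplest case, a completely randomized design with two groups, using the absolute difference in group means as the test statistic (a choice made here for illustration; other statistics work too). The p-value is the fraction of all possible assignments whose statistic is at least as extreme as the one observed. The function name and data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def randomization_test(group_a, group_b):
    """Two-sided randomization (permutation) test for a difference in means.

    Enumerates every way the pooled units could have been split into groups
    of the observed sizes and returns the fraction of splits whose absolute
    difference in means is at least the observed one.
    """
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # small tolerance so ties with the observed value count as "as extreme"
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

print(randomization_test([1, 2, 3], [4, 5, 6]))  # → 0.1 (2 of 20 splits)
```

The number of splits grows quickly: 6 units into groups of 3 gives 20 assignments, but 20 units into groups of 10 already gives 184,756, and more complex designs are harder still to enumerate.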

I. Randomization: How do we do it?

Randomization usually consists of either

• choosing random subsets of experimental units (e.g., to then assign one subset to each treatment group), or

• choosing a random order for the experimental units (e.g., to do analysis of the outcome in random order).
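Both operations map onto Python's standard `random` module, sketched here with hypothetical helper names: a random permutation sliced into pieces gives random subsets, and a random permutation of the whole list gives a random order. Seeding a `random.Random` instance is an assumption added so that the assignment can be documented and reproduced.

```python
import random

def random_subsets(units, group_sizes, rng=random):
    """Split units into randomly chosen, non-overlapping subsets of the given sizes."""
    units = list(units)
    if sum(group_sizes) != len(units):
        raise ValueError("group sizes must add up to the number of units")
    shuffled = rng.sample(units, k=len(units))  # a random permutation (a copy)
    groups, start = [], 0
    for size in group_sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups

def random_order(units, rng=random):
    """Return the units in a random order (e.g., an order of laboratory analysis)."""
    units = list(units)
    return rng.sample(units, k=len(units))

rng = random.Random(42)  # fixed seed so the assignment can be reproduced
units = [f"unit{i}" for i in range(1, 21)]  # hypothetical labels for 20 units
doses = ["none", "low", "medium", "high"]
for dose, group in zip(doses, random_subsets(units, [5, 5, 5, 5], rng)):
    print(dose, group)
print(random_order(units, rng)[:5])
```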

Physical randomization methods include, for example, drawing slips of paper from a box.

Ex. In the coronary artery disease study, 50 slips of white paper and 50 slips of blue paper are put in a box and mixed. As each patient is enrolled, a slip is drawn (without looking) from the box. Blue slips are for the drug group and white slips are for the surgery group.

Ex. In the cadmium example, the numbers 1 through 80 are written on 80 slips of paper. The slips are drawn one at a time, and one is assigned to each soil sample. Since the numbers will be randomly ordered, it doesn't matter in what order the samples are sitting on the laboratory shelf. Analyze the samples in order of their assigned numbers.

Numerical randomization procedures use randomly generated numbers from a reference book or computer program.

Ex. In the coronary artery disease study, obtain 100 random numbers. Without sorting them, the first 50 will correspond to the drug group and the second 50 will correspond to the surgery group.

Now sort the numbers, keeping track of which number was assigned to which treatment group. The first patient gets the first (smallest) number and its corresponding group assignment. The second patient gets the second smallest number and its corresponding group assignment.
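The numerical procedure above can be sketched with `random.Random.random()` standing in for the reference book of random numbers (an assumption; any source of uniform random numbers would do). The returned list is an enrollment schedule: its k-th entry is the group for the k-th enrolled patient. The function name is hypothetical.

```python
import random

def numeric_randomization(n_per_group, labels=("drug", "surgery"), rng=random):
    """Numerical randomization as described for the coronary artery disease study.

    Draw one random number per slot; the first n_per_group numbers go with the
    first label, the next n_per_group with the second, and so on. Sorting the
    numbers then gives the order in which assignments are handed out.
    """
    tagged = []
    for label in labels:
        for _ in range(n_per_group):
            tagged.append((rng.random(), label))
    tagged.sort(key=lambda pair: pair[0])  # smallest number -> first patient
    return [label for _, label in tagged]

schedule = numeric_randomization(50, rng=random.Random(2024))
print(schedule[:5])  # groups for the first 5 enrolled patients
print(schedule.count("drug"), schedule.count("surgery"))  # → 50 50
```

Like the 50 blue and 50 white slips, this fixes the group sizes in advance and randomizes only the order in which the assignments are handed out.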

Ex. In the cadmium study, 80 random numbers are generated one at a time and assigned to a soil sample. Again, since the numbers are randomly ordered, it doesn't matter in what order the samples are when they are given a number. Then analyze the samples in numerical order.
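A minimal sketch of the cadmium procedure, assuming the lab still works through 10 samples per day over 8 days as in the original description: each sample gets a random number, the samples are analyzed in numerical order, and that order is cut into daily batches. The sample labels and function name are hypothetical.

```python
import random

def analysis_schedule(sample_ids, per_day, rng=random):
    """Give each sample a random number, sort by it, and split the resulting
    analysis order into daily batches of size per_day."""
    keyed = [(rng.random(), sample) for sample in sample_ids]
    keyed.sort(key=lambda pair: pair[0])  # analyze in numerical order
    order = [sample for _, sample in keyed]
    return [order[i:i + per_day] for i in range(0, len(order), per_day)]

# Hypothetical labels: (site, sample number) for 8 sites x 10 samples each
samples = [(site, k) for site in range(1, 9) for k in range(1, 11)]
days = analysis_schedule(samples, per_day=10, rng=random.Random(7))
for day, batch in enumerate(days, start=1):
    print("day", day, "sites:", sorted({site for site, _ in batch}))
```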

Restricted randomization can be either physical or numeric but involves putting constraints on the randomization, such as in blocking designs. We will see this in Chapter 8.

Ex. In multi-center clinical trials, randomization of patients to treatments is done within each clinical center independently. (Why?)
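Randomizing within each center independently can be sketched as a separate shuffle per center. This assumes each center's patient list is known in advance and uses a near-equal allocation of treatments within each center; trials that enroll patients one at a time often use permuted blocks within each center instead, which this sketch does not implement. Center names, patient IDs, and the function name are hypothetical.

```python
import random

def randomize_within_centers(patients_by_center, treatments, rng=random):
    """Independently randomize patients to treatments inside each center.

    patients_by_center: dict mapping a center name to its list of patient IDs.
    Returns a dict mapping patient ID -> (center, treatment).
    """
    assignment = {}
    for center, patients in patients_by_center.items():
        # near-equal allocation of treatment labels for this center only
        labels = [treatments[i % len(treatments)] for i in range(len(patients))]
        rng.shuffle(labels)  # a separate random shuffle for each center
        for patient, label in zip(patients, labels):
            assignment[patient] = (center, label)
    return assignment

centers = {"center_A": ["A1", "A2", "A3", "A4"],
           "center_B": ["B1", "B2", "B3", "B4", "B5", "B6"]}
plan = randomize_within_centers(centers, ("drug", "surgery"), rng=random.Random(11))
for patient, (center, treatment) in sorted(plan.items()):
    print(center, patient, treatment)
```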

Should randomization always be done?

No . Sometimes, especially in human studies, randomization is unethical or impossible.

Ex.

• Effect of seat belt use on extent of injury in car crashes

• Effect of cigarette smoking on lung cancer (tobacco companies use this in their defense)

• Effect of age on time needed to learn a new skill

J. Principles of Analysis

Doing the right analysis is just as critical as planning the right design.

What do statisticians need to know before beginning an analysis?

• Design features:

o What is the scientific question of interest?

o How were measurements recorded?

o How were treatments assigned?

o What was the experimental unit? Was it also the measurement unit?

o How were experimental units selected for participation or use in the study?

o Was the design appropriate to answer the scientific question of interest?

o Was the measured outcome suitable to answer the scientific question of interest?

o Was the study carried out in the way it was originally designed?

• Data features:

o Are there errors in the data file, e.g., invalid values, transposed numbers, missing values, incorrect scale, etc.?

o What do the data look like, e.g., means, spread, trends over time, outlying values, symmetry or skew, unimodal or multimodal distribution, gaps or clusters, etc.?

o If repeated measures were taken, does the data file indicate which measures came from which experimental unit?

o Is there correlation among the predictors, which can lead to collinearity problems during the analysis?
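A first pass at the data-file questions can be sketched for a single numeric variable: flag missing entries and values outside a plausible range, then summarize the remaining values. The plausible range is an analyst's judgment supplied as an assumption, and this check will not catch errors that still fall inside the range (e.g., transposed digits such as 43 for 34). The function name and data are hypothetical.

```python
from statistics import mean, median, stdev

def screen(values, low, high):
    """Flag missing and out-of-range entries, then summarize the valid ones.

    values: outcome values in file order, with None marking a missing entry.
    low, high: the plausible range for this variable (chosen by the analyst).
    Needs at least 2 valid values so that stdev() is defined.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    invalid = [i for i, v in enumerate(values)
               if v is not None and not (low <= v <= high)]
    valid = [v for v in values if v is not None and low <= v <= high]
    summary = {
        "n": len(valid),
        "mean": mean(valid),
        "median": median(valid),
        "sd": stdev(valid),
        "min": min(valid),
        "max": max(valid),
    }
    return missing, invalid, summary

# Hypothetical ages (years): one missing entry and one likely typo (470)
ages = [34, 51, None, 470, 29, 62]
missing, invalid, summary = screen(ages, low=0, high=120)
print("missing rows:", missing)
print("out-of-range rows:", invalid)
print(summary)
```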

K. Generalizability

The conclusions we draw from an experiment are directly applicable to the experimental units used in that experiment. If the units were randomly selected from some population of units, then the conclusions can be applied to that population. If the conclusions are applied to any other group of potential experimental units, then we are extrapolating. This may or may not be valid.

Ideally,

1. The researcher identifies a population of interest, or target population.

2. A sample of experimental units is randomly drawn from that population.

3. The sample is described with a statistic, e.g., the estimated treatment effect, β̂.

4. Inference is made about β̂, e.g., that β̂ is an unbiased estimate of the true treatment effect, β.

5. Conclusions are drawn about the population of interest, e.g., treatment A produces a better outcome than placebo in this population.

What really happens almost always?


Ex. A team of researchers wants to know how many elderly people (age ≥ 65 years) in the US use ginseng on a regular basis. The team visits 25 assisted living facilities in the Portland, OR, metropolitan area. They take a random sample of residents aged 65 years or older from each facility and attempt to ask each one about their ginseng use. Target population = ? Sampled population = ? Sample = ?

L. Ethics

We've already mentioned some ethical issues involved in randomization. In addition, the US government heavily regulates research done on any human or animal subjects. There are stringent rules governing these ethical principles. See http://www.research.umn.edu and click on Policies, Regulations, and Compliance. These apply to both experimental and observational studies.

M. Summary

Experimental design involves choosing treatments, choosing experimental units, and assigning treatments to units.

A good design:

• should be unbiased and avoid systematic errors

• should be precise and minimize random errors

• should allow for estimation of errors from different sources

• should be generalizable to the target population.

A good analysis should:

• involve data cleaning

• involve data exploration

• use an appropriate statistical method

• verify that the method's assumptions are met

• describe the results in appropriate language.
