II. Principles of Experimental Design
A. Experimental Units
The experimental unit is that object to which a treatment or condition is independently applied.
EXAMPLE: Carcinogenic substances. 20 rats are randomly assigned to each of 4 doses of a potential carcinogen: none, low, medium, and high. The rats are kept in individual cages under the same environmental conditions in the same room. Each rat has its assigned dose stirred into its daily meal for 4 weeks. The number of tumors found in each rat is recorded at the end of the 4-week period. What is the experimental unit?
The rats are also called the measurement units or observational units because the outcome of interest is measured on each rat.
EXAMPLE: Carcinogenic substances. 20 rats are randomly assigned to 4 cages (5 rats in each). Each cage is then randomly assigned to one dose of a potential carcinogen: none, low, medium, and high. The rats are kept in their assigned cages under the same environmental conditions in the same room. Rats are not fed individually; food is placed in each cage in a common dish out of which all rats eat. Each cage has its assigned dose stirred into the food for 4 weeks. The number of tumors found in each rat is recorded at the end of the 4-week period. What is the experimental unit? What is the measurement unit?
B. Experimental Factors
If a study is an experimental design, then there are one or more experimental factors: covariates whose levels or values are assigned by the investigator to each experimental unit. Observational studies, and some experimental designs, have classification factors: covariates whose levels or values are observed for each experimental unit by the investigator.
Ex. Diabetes study
Treatment -
Gender -
Ex. Midwest Heart Study
Education -
Race/ethnicity -
C. Experimental factors which correspond to treatments
A placebo is an inactive treatment. It is constructed to look, feel, or taste as much like the active treatment under study as possible.
Ex. Surgery for knee pain
Active treatment: incision in the knee followed by a cleaning of the knee joint using sterile saline solution
Placebo: incision in the knee followed by sound effects meant to mimic those made during a cleaning
Ex. HIV/AIDS medications
Active treatment: 2 drugs in pill form: ZDV and ddI
Placebo: 2 drugs in pill form: ZDV and an inactive pill that looks and tastes like ddI
- A placebo is used so that a study participant is unlikely to know which treatment they are receiving. If no subject is informed of which treatment they are receiving, the study is called single-blinded. The subject is "blind" to treatment assignment.
- If the investigator who is measuring the outcome of interest is also "blind" to treatment assignment for every participant, the study is called double-blinded.
- If there is a statistician who is running interim analyses (analyses carried out to compare groups before the study is completed), and that person is also "blind" to treatment assignment for every participant, the study is called triple-blinded.
- A placebo is a type of control treatment, a baseline or benchmark to which the active treatment will be compared. No treatment at all (null treatment or null control) is sometimes used in addition to or instead of a placebo.
Ex. Study of gum disease treatment
Treatment: oral rinse with an active drug
Active control (placebo): oral rinse with no active drug
Null control: no oral rinse
D. Study types
A cross-sectional study collects data on all experimental units at only one point in time.
A longitudinal study or follow-up study collects data for at least two points in time.
A prospective study determines the experimental units at the current time and then collects data on them forward in time.
A retrospective study determines the experimental units at the current time and then collects data on them from past records.
A clinical trial collects health outcomes (clinical data) on humans and follows them forward in time for changes in that health outcome or for occurrence of new health outcomes.
When a clinical trial recruits and follows participants at multiple locations, it is called a multi-center clinical trial.
See PubH 7420 Clinical Trials
PubH 7400 (section 003) Statistics for Translational and Clinical Research
E. Types of outcomes
- Many experiments measure multiple outcomes or responses. One of them is usually selected as the primary outcome, the outcome that is of most scientific interest, and information on this outcome may be used to calculate how many experimental units are needed to run the research study.
- Other outcomes, those needed to answer other questions of scientific interest, are sometimes called the secondary outcomes.
- Sometimes it is impractical to measure the outcome of primary interest (e.g., death) because the outcome is rare, or difficult or impossible to measure. Researchers may then use a surrogate outcome, a response that is supposed to be related to (predictive of) the outcome of interest.
Ex. HIV/AIDS studies often use:
F. Sources of Errors
When statisticians talk about errors, we are referring to observed variability in the recorded value of an outcome across experimental units.
One source of error is assumed to always exist: experimental error. This describes the variability in the outcome among identically and independently treated experimental units.
Experimental error can arise from:
• natural or inherent differences between experimental units
• variability in the measurement process
• inability to exactly replicate the treatment or condition from one experimental unit to the next
• a treatment that acts differently across units
• extraneous factors that influence the measurements
How can we minimize these problems?
• Select experimental units which are as homogeneous as possible.
When this is difficult to do, or when doing so would constrain the generalizability of the research results, consider blocking designs. We will see these in Chapter 8.
• Take measurements in as uniform a manner as possible. Be aware
of how the following can introduce variability that is irrelevant to
the purpose of the study:
o person-to-person differences in how measurements are taken
(people with more vs. less experience, people with better vs.
poorer training in measuring, people with more vs. less
patience, etc.)
o machine-to-machine differences in how measurements are
recorded (calibration, age or condition of the machine, etc.)
o differences in equipment (pipettes, calipers, microtitre plates,
rulers, etc.)
o differences in techniques (assays, dilutions, solution, survey
questions, etc.)
o within person differences (how tired was the student collecting
the measurements?)
o differences in supplies (salines, soils, chemicals, drugs, growth
media, etc.)
• Construct the treatment(s) in as uniform a manner as possible.
• Control the environment to be as uniform as possible
(temperature, humidity, light, accessibility of clinics, financial
incentives, other incentives, etc.)
Other variability can be due to specific sources:
• Lab studies using mice should track which animals are related to
each other, as should human studies (genetic variability).
• Crop studies should determine the physical attributes of the soil.
Is one half of the field always wetter than the other half because
the field has a downward slope? (physical variability)
• Studies that take time to complete (due to e.g. lab or recruitment
constraints) should keep track of the days/weeks/months on which
measurements are collected. (temporal, seasonal variability)
(Those recruited in summer, for example, may somehow be
different from those recruited in winter.)
• Studies involving repeated measurements on each experimental
unit need to keep track of which measurements came from which
unit (within unit variability).
Such sources of variability can be controlled and then measured using blocking designs as well (Chapter 8). Alternatively, statistical control of such factors may be achieved by adjusting for covariates in the statistical model (e.g., season). We will see this in Chapter 17.
G. Replication
Use of replication in an experiment means that we assign several experimental units to each treatment group. Why is replication important?
• It allows the researcher to demonstrate reproducibility: the
treatment shows similar results in each of several experimental
units.
• It guards against an experiment failing merely because one or a
few experimental units become unusable (e.g., mice died, plants
died, people dropped out, etc.).
• It allows us to measure experimental error.
• It improves the precision with which we can measure the
treatment effect.
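The last two points can be illustrated with a minimal sketch. It assumes a common error variance across treatment groups, so that the pooled within-group variance serves as the estimate of experimental error; the tumor counts and the function names are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import variance

def pooled_variance(groups):
    """Pooled within-group variance, used here as an estimate of experimental error.

    Every group needs at least 2 units: with a single unit per treatment
    (no replication) the within-group variance cannot be computed.
    """
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    degrees_of_freedom = sum(len(g) - 1 for g in groups)
    return numerator / degrees_of_freedom

def standard_error_of_mean(error_variance, n_units):
    """Standard error of a treatment mean based on n_units replicates."""
    return sqrt(error_variance / n_units)

# Hypothetical tumor counts for two dose groups (illustrative values only).
tumor_counts = {
    "none": [0, 1, 0, 1],
    "high": [3, 5, 4, 4],
}

s2 = pooled_variance(list(tumor_counts.values()))
se_4 = standard_error_of_mean(s2, 4)    # 4 replicates per group
se_16 = standard_error_of_mean(s2, 16)  # 16 replicates per group
print(s2, se_4, se_16)
```

Under these assumptions, quadrupling the number of replicates halves the standard error of a treatment mean.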
A critical question in experimental design is, how many replications should be used? We will come back to this question in several lectures during this course.
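As a rough preview, one common textbook approximation for comparing the means of two groups is n = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2 units per group. The sketch below is not the method developed later in the course; it assumes a two-sided test, equal group sizes, a normally distributed outcome with known standard deviation sigma, and a smallest difference of interest delta. The function name and the numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def units_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate number of experimental units per group (normal approximation).

    sigma: assumed standard deviation of the primary outcome
    delta: smallest treatment difference we want to be able to detect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (sigma * (z_alpha + z_power) / delta) ** 2)

# Hypothetical inputs: detect a difference of half a standard deviation.
n = units_per_group(sigma=1.0, delta=0.5)
print(n)
```

Because delta enters the formula squared, halving the difference to be detected roughly quadruples the required number of units.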
Pseudo-replications are different. These are repeated measurements taken from the same experimental unit. This is sometimes called sub-sampling. Such measurements are not independent of each other, and statistical analysis of such data must account for the correlation. We'll see this in several chapters of the book.
Also consider PubH 7430 Methods for Correlated Data.
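One simple way to respect the experimental unit, though not the only one, is to reduce the sub-samples to a single summary per unit before comparing treatments. The sketch below assumes each record carries a unit identifier, a treatment label, and a measured value; the record layout and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def unit_means(records):
    """Collapse repeated measurements to one mean per experimental unit.

    records: iterable of (unit_id, treatment, value) tuples.
    Returns a dict mapping (unit_id, treatment) to the mean of that unit's values.
    """
    by_unit = defaultdict(list)
    for unit_id, treatment, value in records:
        by_unit[(unit_id, treatment)].append(value)
    return {key: mean(values) for key, values in by_unit.items()}

# Hypothetical data: 3 plants (the experimental units), several leaves measured per plant.
records = [
    ("plant1", "A", 10), ("plant1", "A", 12), ("plant1", "A", 11),
    ("plant2", "A", 14), ("plant2", "A", 16),
    ("plant3", "B", 20), ("plant3", "B", 22),
]

means = unit_means(records)
print(means)  # one value per plant, not one per leaf
```

The analysis then has 3 observations (plants), not 7 (leaves). Models for correlated data can use all 7 measurements while still accounting for which ones came from the same unit.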
H. Randomization: Why do we do it?
Randomization is a critical component in the design of experiments. It involves assigning subjects to treatment groups so that all possible assignments are, in theory, equally likely. Experimental units should have an equal chance of being assigned to any of the treatment groups.
Why do we randomize?
• To protect against confounding. Confounding occurs when the
effect of one factor cannot be distinguished from the effect of
another factor.
Obvious Ex. Assign all of the fat rats to diet A and all of the skinny rats to diet B. Measure weight gain in all rats one week later. If we see a difference between rats on diet A and rats on diet B, what do we conclude about the two diets?
Better Ex. We wish to compare surgery for coronary artery disease to a drug used to treat coronary artery disease. We know that such major heart surgery is complex and invasive. Some people die during surgery. We may assign the patients with less severe coronary artery disease (on purpose or not) to the surgery group.
If we see a difference in patient survival between the two groups, is it due to surgery vs. drugs, or to less severe disease vs. more severe disease? We can't tell. Such a study would be inconclusive, and a waste of money, time, and patients.
Another good Ex., from Professor Gary Oehlert in the School of
Statistics:
A researcher wanted to find out if soil levels of cadmium were higher close to an incinerator compared to far away from the incinerator.
8 sites were selected and 10 soil samples collected at each site.
The 80 samples were sent to a lab for analysis. Since cadmium levels are difficult to detect, the analysis is long and expensive. On each of 8 days, the lab analyzed 10 samples.
Can you guess which two factors were confounded by the way the lab analysis was done?
Randomization is critical because there is no way for a researcher to be aware of all possible confounders. Observational studies have little to no formal control for any confounders. (Notice that the cadmium example was an observational study.)
Why else do we randomize?
• To form the basis for inference.
Treatment groups can be compared on the basis of randomization alone, without the assumptions seen in many statistical tests, such as normality and constant variance. A randomization test is a type of permutation test. It is based on writing down all possible ways in which the randomization could have occurred, and comparing those to the randomization that was actually done; see Chapter 1.8. This is rarely done in practice, because it is only easily implemented for very simple designs.
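For a very simple design the enumeration is easy to sketch. The example below assumes two treatment groups, a completely randomized design, and the absolute difference in group means as the test statistic; the weight gains and the function name are hypothetical.

```python
from itertools import combinations
from statistics import mean

def randomization_test(group_a, group_b):
    """Two-sided randomization test for a difference in means.

    Enumerates every way the pooled units could have been split into groups
    of the same sizes, and returns the proportion of splits whose absolute
    mean difference is at least as large as the one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    observed = abs(mean(group_a) - mean(group_b))
    n_assignments = 0
    n_as_extreme = 0
    for a_idx in combinations(indices, len(group_a)):
        chosen = set(a_idx)
        a = [pooled[i] for i in a_idx]
        b = [pooled[i] for i in indices if i not in chosen]
        n_assignments += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            n_as_extreme += 1
    return n_as_extreme / n_assignments

# Hypothetical weight gains (grams) for 3 rats on each diet.
diet_a = [12, 14, 15]
diet_b = [7, 8, 9]
p_value = randomization_test(diet_a, diet_b)
print(p_value)
```

With 3 units per group there are only 20 possible assignments. The count grows very quickly with the number of units, which is one reason full enumeration is limited to small, simple designs.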
I. Randomization: How do we do it?
Randomization usually consists of either
• choosing random subsets of experimental units (e.g. to then
assign one subset to each treatment group), or
• choosing a random order for the experimental units (e.g. to do
analysis of the outcome in random order).
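Both operations can be sketched with Python's standard random module. The helper names are hypothetical, and the seed is fixed only so that the example is reproducible.

```python
import random

def random_subsets(units, n_groups, rng):
    """Split units into n_groups random subsets whose sizes differ by at most 1."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the groups in turn, like dealing cards.
    return [shuffled[i::n_groups] for i in range(n_groups)]

def random_order(units, rng):
    """Return the units in a random order."""
    units = list(units)
    return rng.sample(units, k=len(units))

rng = random.Random(2024)  # fixed seed: an assumption made for reproducibility only

# 20 rats split at random into 4 subsets, one subset per dose.
rats = [f"rat{i}" for i in range(1, 21)]
groups = random_subsets(rats, 4, rng)

# 80 soil samples analyzed in random order.
run_order = random_order(range(1, 81), rng)
print(groups[0], run_order[:5])
```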
Physical randomization methods include, for example, drawing slips of paper from a box.
Ex. In the coronary artery disease study, 50 slips of white paper and 50 slips of blue paper are put in a box and mixed. As each patient is enrolled, a slip is drawn (without looking) from the box. Blue slips are for the drug group and white slips are for the surgery group.
Ex. In the cadmium example, the numbers 1 through 80 are written on 80 slips of paper. The slips are drawn one at a time, and one is assigned to each soil sample. Since the numbers will be randomly ordered, it doesn't matter in what order the samples are sitting on the laboratory shelf. Analyze the samples in order of their assigned numbers.
Numerical randomization procedures use randomly generated numbers from a reference book or computer program.
Ex. In the coronary artery disease study, obtain 100 random numbers. Without sorting them, the first 50 will correspond to the drug group and the second 50 will correspond to the surgery group.
Now sort the numbers, keeping track of which number was assigned to which treatment group. The first patient gets the first (smallest) number and its corresponding group assignment. The second patient gets the second smallest number and its corresponding group assignment.
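The procedure in this example can be sketched as follows. The function name is hypothetical, and uniform random numbers from Python's random module stand in for numbers read from a reference book.

```python
import random

def sorted_number_assignment(n_drug, n_surgery, rng):
    """Return the treatment for the 1st, 2nd, 3rd, ... enrolled patient."""
    # Step 1: attach one random number to each group label; the first n_drug
    # numbers belong to the drug group, the remaining ones to the surgery group.
    labels = ["drug"] * n_drug + ["surgery"] * n_surgery
    numbered = [(rng.random(), label) for label in labels]
    # Step 2: sort by the random number, carrying the group label along.
    numbered.sort()
    # Step 3: the k-th enrolled patient receives the group attached to the
    # k-th smallest number.
    return [label for _, label in numbered]

rng = random.Random(7)  # fixed seed: an assumption made for reproducibility only
schedule = sorted_number_assignment(50, 50, rng)
print(schedule[:10])
```

Exactly 50 patients end up in each group, and sorting the random numbers places the group labels in random order.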
Ex. In the cadmium study, 80 random numbers are generated one at a time and assigned to a soil sample. Again, since the numbers are randomly ordered, it doesn't matter in what order the samples are when they are given a number. Then analyze the samples in numerical order.
Restricted randomization can be either physical or numeric but involves putting constraints on the randomization, such as in blocking designs. We will see this in Chapter 8.
Ex. In multi-center clinical trials, randomization of patients to treatments is done within each clinical center independently. (Why?)
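A minimal sketch of randomizing within each center is shown below. It assumes all patients at a center are known in advance and that the treatments should be as balanced as possible within each center; real trials usually enroll patients over time. The center names, patient IDs, and function name are hypothetical.

```python
import random

def randomize_within_centers(patients_by_center, treatments, rng):
    """Randomly assign treatments separately within each clinical center.

    Returns a dict mapping patient ID to treatment. Within a center, the
    group sizes differ by at most 1.
    """
    assignment = {}
    for center, patients in patients_by_center.items():
        shuffled = list(patients)
        rng.shuffle(shuffled)  # a separate random ordering for each center
        for i, patient in enumerate(shuffled):
            assignment[patient] = treatments[i % len(treatments)]
    return assignment

# Hypothetical centers and patient IDs.
patients_by_center = {
    "center_1": ["C1-01", "C1-02", "C1-03", "C1-04", "C1-05", "C1-06"],
    "center_2": ["C2-01", "C2-02", "C2-03", "C2-04"],
}
rng = random.Random(11)  # fixed seed: an assumption made for reproducibility only
assignment = randomize_within_centers(patients_by_center, ["drug", "surgery"], rng)
print(assignment)
```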
Should randomization always be done?
No. Sometimes, especially in human studies, randomization is unethical or impossible.
Ex.
• Effect of seat belt use on extent of injury in car crashes
• Effect of cigarette smoking on lung cancer (tobacco companies use
this in their defense)
• Effect of age on time needed to learn a new skill
J. Principles of Analysis
Doing the right analysis is just as critical as planning the right design.
What do statisticians need to know before beginning an analysis?
• Design features:
o What is the scientific question of interest?
o How were measurements recorded?
o How were treatments assigned?
o What was the experimental unit? Was it also the measurement unit?
o How were experimental units selected for participation or use in the study?
o Was the design appropriate to answer the scientific question of interest?
o Was the measured outcome suitable to answer the scientific question of interest?
o Was the study carried out in the way it was originally designed?
• Data features:
o Are there errors in the data file, e.g., invalid values, transposed numbers, missing values, incorrect scale, etc.?
o What do the data look like, e.g., means, variances, spread or range, trends over time, outlying values, symmetry or skew, unimodal or multi-modal distribution, gaps or clusters, etc.?
o If repeated measures were taken, does the data file indicate which measures came from which experimental unit?
o Is there correlation among the predictors, which can lead to collinearity problems during the analysis?
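A few of these data checks can be sketched with the Python standard library. The variable names, the plausible range for age, and the data values are all hypothetical; a real data file would need checks tailored to its own variables.

```python
from math import sqrt
from statistics import mean, stdev

def find_invalid(values, low, high):
    """Return (row index, value) pairs that are missing (None) or outside [low, high]."""
    return [(i, v) for i, v in enumerate(values)
            if v is None or not (low <= v <= high)]

def summarize(values):
    """Basic description of the non-missing values."""
    clean = [v for v in values if v is not None]
    return {"n": len(clean), "mean": mean(clean), "sd": stdev(clean),
            "min": min(clean), "max": max(clean)}

def pearson_r(x, y):
    """Pearson correlation between two predictors of equal length."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data file columns.
ages = [34, 51, None, 430, 47]   # one missing value, one likely typing error
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 60, 70, 80]

problems = find_invalid(ages, low=0, high=120)
summary = summarize([34, 51, 47])
r = pearson_r(heights_cm, weights_kg)
print(problems, summary, r)
```

A correlation near +1 or -1 between two predictors, as in this artificial example, is a warning sign for collinearity.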
K. Generalizability
The conclusions we draw from an experiment are directly applicable to the experimental units used in that experiment. If the units were randomly selected from some population of units, then the conclusions can be applied to that population. If the conclusions are applied to any other group of potential experimental units, then we are extrapolating. This may or may not be valid.
Ideally,
1. The researcher identifies a population of interest or target
population.
2. A sample of experimental units is randomly drawn from that
population.
3. The sample is described with a statistic, e.g. estimated
treatment effect, $\hat{\beta}$.
4. Statistical inference is made about $\hat{\beta}$, e.g. that $\hat{\beta}$ is an
unbiased estimate of the true treatment effect, $\beta$.
5. Conclusions are drawn about the population of interest, e.g.
treatment A produces a better outcome than placebo in this
population.
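The idea of an unbiased estimate in step 4 can be illustrated with a tiny enumeration. The sketch uses the sample mean as a stand-in for an estimated treatment effect and assumes simple random sampling without replacement from a small, fully known population; the population values are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of 5 units with known outcome values.
population = [2, 4, 6, 8, 10]

# Every possible random sample of size 2, each equally likely under
# simple random sampling without replacement.
sample_means = [mean(sample) for sample in combinations(population, 2)]

# Averaged over all possible samples, the estimate equals the population
# value: this is what "unbiased" means for the sample mean here.
average_of_estimates = mean(sample_means)
print(len(sample_means), average_of_estimates, mean(population))
```

Any single sample mean can miss the population mean, but under random sampling the misses balance out. That guarantee is lost when units are not drawn at random from the target population.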
What really happens almost always?
Ex. A team of researchers wants to know how many elderly (age ≥ 65 years) in the US use ginseng on a regular basis. The team visits 25 assisted living facilities in the Portland, OR, metropolitan area. They take a random sample of residents older than 65 years from each facility and attempt to ask each one about their ginseng use.
Target population = ?
Sampled population = ?
Sample = ?
L. Ethics
We've already mentioned some ethical issues involved in randomization. In addition, the US government heavily regulates research done on any human or animal subjects. There are stringent rules governing these ethical principles. See http://www.research.umn.edu and click on Policies, Regulations, and Compliance. These apply to both experimental and observational studies.
M. Summary
Experimental design involves choosing treatments, choosing experimental units, and assigning treatments to units.
A good design:
• should be unbiased and avoid systematic errors
• should be precise and minimize random errors
• should allow for estimation of errors from different sources
• should have broad validity, i.e. be generalizable to the target
population.
A good analysis should:
• involve data cleaning
• involve data exploration
• use an appropriate statistical method
• verify that the method's assumptions are met
• describe the results in appropriate language.