Introduction to Biostatistics for Clinical Researchers
University of Kansas Department of Biostatistics & University of Kansas Medical Center Department of Internal Medicine Materials
PowerPoint files can be downloaded from the Department of Biostatistics website at http://biostatistics.kumc.edu
A link to the recorded lectures will be posted in the same location Topics
Comparing Two (or more) Independent Proportions
Measures of Association for Dichotomous Data
Study Design Considerations Recall
Number of Parameter Null Hypothesis Test Required Assumptions to
Samples (HO) statistic use Test Statistic
1Mean μ = μO z Normality, known variance
1Mean μ = μO t Normality, unknown variance
1 2 Difference in means μ1 − μ2 = δ z Normality, known variances
1 2 Difference in means μ1 − μ2 = δ t Normality, unknown but equal variances
1 2 Difference in means μ1 − μ2 = δ t Normality, unknown variances
1 (Paired) Mean difference μD = 0 t Normality, unknown variance
K > 2 Equality of Means μ1 = μ2 = . . . = μk F Normality, equal variances
2 1 Proportion p = pO χ , z Sample size considerations
1 2 2 Difference in proportions p1 − p2 = δ χ , z, Sample size considerations Fisher’s Exact
2 K > 2 Equality of Proportions p1 = p2 = . . . = pk χ Sample size considerations
1: Usually δ = 0 Comparing Proportions Between Two Independent Populations In This Section
CIs for difference in proportions between two independent populations
Large sample methods for comparing proportions Normal approximation method Chi-square test
Fisher’s Exact Test
Measures of association Comparing Two Proportions
Pediatric AIDS Clinical Trial Group (ACTG) Protocol 076 Study Group1
Study Design “We conducted a randomized, double-blinded, placebo- controlled trial of the efficacy and safety of zidovudine (AZT) in reducing the risk of maternal-infant HIV transmission” 363 HIV infected pregnant women were randomized to AZT or placebo
1Conner, E., et al. (1994). Reduction of maternal-infant transmission of human immunodeficiency virus type 1 with zidovudine treatment, NEJM, 331:18 Comparing Two Proportions
Results Of the 180 women randomized to AZT, 13 gave birth to children who tested positive for HIV within 18 months of birth Of the 183 women randomized to placebo, 40 gave birth to children who tested positive for HIV within 18 months of birth Notes
Random assignment of treatment Helps insure two groups are comparable Patient and physician could not request a particular treatment
Double-blind Patient and physician did not know treatment assignment Observed HIV Transmission Rates
AZT
13 pˆ ==0.07() 7% AZT 180
PLA
40 pˆ ==0.22() 22% PLA 183 Notes
Is the difference significant or can it be explained by chance?
CI on the difference in proportions? P-value? Sampling Distribution of Difference in Sample Proportions Since we have large samples, we can be assured that the sampling distributions of the sample proportions in both groups are approximately normal (CLT)
It turns out the difference of quantities which are approximately normally distributed are also normally distributed Sampling Distribution of Difference in Sample Proportions The sampling distribution of the difference of two sample proportions (based on large samples) approximates a normal distribution This distribution is centered at the true, unknown population
difference p1−p2 AZT Group Placebo Group AZT−PLA 95% CIs for Difference in Proportions
General formula: Best estimate ± multiplier*SE(best estimate)
The best estimate of a population difference in sample proportions is: ppˆˆ12−
Here, pˆ 1may represent the sample proportion of infants HIV positive for 180 infants in the AZT group and pˆ 2 may represent the same for the 183 infants in the placebo group 95% CI: AZT Study
pˆˆ12−± p multiplierSE( pˆˆ12− p )
Statisticians have developed formulas for the standard error of the difference
These formulas depend on both sample sizes and sample proportions
pppp1122()11−−() SE() pˆˆ12−= p + nn12
∧ ppppˆˆˆˆ1122()11−−() SE() pˆˆ12−= p + nn12 HIV/AZT Study
Recall the data:
Group AZT PLA N 180 183 Sample 0.07 0.22 proportion
∧ 0.07( 0.93) 0.22( 0.78) SE() pˆˆ−= p + =0.36 12 180 183 95% CI: AZT Study
The 95% CI for the true difference in proportions between the AZT group and PLA groups is:
∧
−±0.15 1.96SE() pˆˆ12 − p −±0.15 1.96() 0.036 ()−0.22,0.08 Summary
Results The proportion of infants who tested positive for HIV within 18 months of birth was seven percent (95% CI 4 -12%) in the AZT group and twenty-two percent in the placebo group (95% CI 16 - 28%) The study results estimate the absolute decrease in the proportion of HIV positive infants born to HIV positive mothers associated with AZT to be as low as 8% and as high as 22% Two-sample z-test: Getting a p-value
Are the proportions of infants contracting HIV within 18 months of birth equivalent at the population level for those whose mothers are treated with AZT versus untreated?
HO: p1 = p2
HA: p1 ≠ p2
In other words, is the expected difference in proportions zero?
HO: p1 − p2 = 0
HA: p1 − p2 ≠ 0 Hypothesis Test to Compare Two Independent Proportions Recipe:
Assume HO is true
Measure the distance of our sample result from pO (here, it’s 0) Compare the distance (test statistic) to the appropriate distribution to get the p-value
observed difference− null difference z = SE ()observed difference ppˆˆ− = 12 ppppˆˆˆˆ()11−−() 1122+ nn12 HIV/AZT Study
Recall,
ppˆˆ12−=−0.15 ∧
SE() pˆˆ12−= p 0.036
So in this study,
−0.15 z ==−4.2 0.36
This result was 4.2 SE below the null mean of 0 P-values
Is a result 4.2 standard errors below 0 unusual? It depends on the distribution we’re dealing with
The p-value is the probability of getting a test statistic as or more extreme than what we observed (-4.2) by chance
The p-value comes from the sampling distribution of the difference in two sample proportions
What is the sampling distribution of the difference in sample proportions? If both groups are large, it is approximately normal It is centered at the true difference Under the null, this true difference is 0 HIV/AZT Study
To compute a p-value, we need to compute the probability of being 4.2 or more SEs away from 0 on a standard normal curve HIV/AZT Study
If we were to look this up in a normal table, we would find a very small p (p < 0.001)
This method is also essentially equivalent to the chi-square (χ2) method Displaying 2X2 Data
Data of this sort can be displayed using a contingency table The Chi-Square Test
Testing equality of two population proportions using data from two samples
HO: p1 = p2
HA: p1 ≠ p2
In other words, is the expected difference in proportions zero?
HO: p1 − p2 = 0
HA: p1 − p2 ≠ 0
In the context of a 2X2 table, this is testing whether there is a relationship between the row variable (HIV status) and the column variable (treatment type)--the test of independence Chi-Square Test
Pearson’s Chi-Square Test (χ2) can easily be done by hand
Works well for “big” sample sizes--it is an approximate method
Gives essentially the same p as the z test for comparing two proportions
Unlike the z-test, it can be extended to compare proportions between more than two independent groups in one test The Chi-Square Method
Looks at discrepancies between observed and expected cell counts (under the null hypothesis) in a 2X2 table O = observed E = expected = (row total*column total)/grand total
“Expected” refers to the values for the cell counts that would be expected if the null hypothesis is true (the expected cell proportions if the underlying population proportions are equal) The Chi-Square Method
Recipe . . . Start by assuming the null hypothesis is true Measure the distance of the sample result from the null value Compare the test statistic (distance) to the appropriate distribution to get the p-value
2 (OE− ) χ=2 ∑ E
The sampling distribution of this statistic when the null is true is a chi-square distribution with one degree of freedom Chi-Square (1) Contingency Table
The observed value for cell one is 13
We have to calculate the expected count:
RC 53( 180) E == =26.3 N 363 Expected Values
We could do the same for the other three cells:
Now we must compute the ‘distance’ (test statistic), χ2
2 (OE− ) χ=2 ∑ E Expected Values
2 (OE− ) χ=2 ∑ E 22 2 2 ()13−− 26.3() 40 26.7() 167 − 153.7() 143 − 156.3 =++ + 26.3 26.7 153.7 156.3 = 15.6 Sampling Distribution
P = 0.0001 Extending Chi-Square Test
The chi-square test can be extended to test for differences in proportions across more than two independent populations Analogous to ANOVA with binary outcomes Recall: HIV/AZT Example Recall: HIV/AZT Example
Testing equality of two population proportions using data from two samples
HO: p1 = p2
HA: p1 ≠ p2
In other words, is the expected difference in proportions zero?
HO: p1 − p2 = 0
HA: p1 − p2 ≠ 0
In the context of a 2X2 table, this is testing whether there is a relationship between the row variable (HIV status) and the column variable (treatment type)--the test of independence Statistical Methods
Pearson’s Chi-Square Test or the Normal Approximation (z) Method Both are based on the central limit theorem kicking in Both are approximate methods, but are excellent approximations if the sample sizes are large These do not perform as well in smaller samples Statistical Methods
Fisher’s Exact Test Calculations are difficult Always appropriate, regardless of sample size Usually involves computer effort Exact p-values are given (no approximations)
Rationale Suppose the null is true--that there is no association between AZT and transmission Imagine putting 53 red balls (the infected) and 310 green balls (the non-infected) in a jar Randomly select 180 of them (these are the AZT group, the rest are the placebo group) Fisher’s Exact Test
The one-sided p-value from this test is the probability that you get 13 (what we actually observed in the AZT group) or fewer red balls among the 180 For this example, a computer will compute the exact p--0.0001 Measures of Association for Binary Data Risk Difference
Risk difference is the difference in proportions, p1 − p2
Ex: the difference in risk of HIV for children born to HIV+ mothers taking AZT relative to those taking placebo is
ppˆˆ12− =−=−0.07 0.22 0.15
If AZT was given to 1000 infected pregnant women, it would reduce the number of transmissions by 150
95% CI: Study results suggest that the reduction in HIV positive births from 1,000 HIV positive pregnant women treated with AZT could range from 75 to 220 fewer than the number occurring if the 1,000 women were not treated Relative Risk
Relative Risk (risk ratio)--ratio of proportions p1/p2
Ex: The risk of HIV with AZT vs. PLA
pˆ 0.07 1 ==0.32 pˆ2 0.22 The risk of transmission with AZT is about 1/3 the risk with placebo
An HIV+ woman can reduce her personal risk of transmitting the virus by nearly 70% if she takes AZT
95% CI: Study results suggest that this reduction in risk could be as small as 40% and as large as 82% The Risk Difference vs. Relative Risk
The risk difference (attributable risk) provides a measure of the public health impact of an exposure (assuming causality)
The relative risk provides a measure of the magnitude of the disease-exposure association for an individual
Each provides a different piece of information about the “story”
AZT example—in this study 22% of the untreated mothers gave birth to children with HIV Relative risk: .32 Risk difference: -15% The Risk Difference vs. Relative Risk
Suppose that only 2% of the children born to untreated HIV positive women became HIV positive
Suppose the percentage in AZT treated women is .6% Relative risk: .32 Risk difference: -1.4 % The Risk Difference vs. Relative Risk
Suppose that 90% of the children born to untreated HIV positive women became HIV positive
Suppose the percentage in AZT treated women is 75% Relative risk: .83 Risk difference: -15 % What Is an Odds?
Like the relative risk, the odds ratio provides a measure of association in a ratio (as opposed to difference)
The odds ratio is a function of risk (prevalence)
Odds is the ratio of the risk of having an outcome to the risk of not having an outcome If p represents the risk of an outcome, then the odds are given by:
p odds = 1− p Example
In the AZT example, the estimated risk of giving birth to an HIV infected child among mothers treated with AZT was 0.07
The corresponding odds estimate is
∧ pˆ 0.07 odds ===1 0.08 10.93− pˆ1 Example
In the AZT example, the estimated risk of giving birth to an HIV infected child among mothers not treated was 0.22
The corresponding odds estimate is
∧ pˆ 0.22 odds ===2 0.28 10.78− pˆ2 AZT/Mother-Infant Transmission Example
The estimated odds ratio of an HIV birth with AZT relative to placebo
pˆ1 ∧ 1− pˆ 0.08 OR ===1 0.28 pˆ2 0.28
1− pˆ2
The odds of HIV transmission with AZT is .28 (about 1/3) the odds of transmission with placebo Odds Ratio
Interpretation AZT is associated with an estimated 72% (estimated OR = .28) reduction in odds of giving birth to an HIV infected child among HIV infected pregnant women Study results suggest that this reduction in odds could be as small as 46% and as large as 86% (95% CI on odds ratio, .14 – .54) Odds Ratio
What about a p-value?
What value of odds ratio indicates no difference in risk?
If p1 = p2 then
p1 1− p OR = 1 = 1 p2
1− p2
Thus, we need to test
HO: OR = 1 versus HA: OR ≠ 1 We can use the same test! (Chi-square) Hypothesis of Equal Proportions
Equivalent hypotheses:
p1 1− p1 HpOO::0:112=−== p Hpp 12 H O p2 1− p2
Always will estimate the same direction of association
∧∧ OR< 11⇔< RR ∧∧ OR>⇔11 RR > ∧∧ OR= 11⇔= RR How does OR compare to RR?
If CI for OR does not include 1, CI for RR will not include 1
If CI for OR includes 1, CI for RR will include 1
The lower the risk in both groups being compared, the more similar the two measures will be in magnitude The Odds Ratio versus Relative Risk
AZT example--in this study 7% of AZT treated mothers and 22% of the untreated mothers gave birth to children with HIV RR = 0.32 OR = 0.28
Suppose only 2% of the children born to untreated HIV positive women became HIV positive and that the percentage in AZT treated women is 0.6% RR = 0.32 OR = 0.30
Suppose 90% of the children born to untreated women became HIV positive and that the percentage in AZT treated women is 0.6% RR = 0.83 OR = 0.33 Why Bother with OR?
It is less intuitively interpretable than relative risk
However, with certain types of non-randomized study designs we can not get a valid estimate of RR but can still get a valid estimate of OR Survival Analysis Survival Analysis
Survival Analysis is the class of statistical methods for studying the occurrence and timing of events
The event (to name a few) could be tumor-free time development of a disease response to treatment relapse local control death
Survival analysis methods are most often applied to the study of deaths Basic Concepts
Survival Time: the time from a well-defined point in time (time origin) to the occurrence of a given event
Survival Data: survival time response to a given treatment patient characteristics related to response, survival, and the development of a disease Basic Concepts – Basic Survival Data Structure Basic Concepts
Survival data has two features that are difficult to handle with “conventional” statistical methods: censoring time-dependent covariates
Many consider survival data analysis as the application of conventional statistical methods when the data is censored
Censoring comes in many forms and occurs for many different reasons
The most common and basic types of censored data are Types I-III right-censored data Basic Concepts
Type I Censoring (also called singly-censored): Subjects are observed for a fixed period of time (under control of the investigator); at the end of the fixed period all remaining subjects are sacrificed/censored [e.g., at the end of the 3-year experiment, all subjects who have not died are censored at time 3 years] If there are no accidental losses, all censored observations equal the length of the study period
A x B x C x x D
0 1 2 3 Basic Concepts
Type II Censoring (also called singly-censored): Subjects are observed until a fixed pre-specified number of events have occurred; after that the remaining subjects are sacrificed/censored [e.g., experiment with 100 lab rats will stop when 50 of them have died] If there are no accidental losses, the censored observations equal the largest uncensored observation Basic Concepts
Type III Censoring occurs when: subjects are terminated for reasons that are not under the control of the investigator (e.g., withdrawals, loss to follow-up) there is a single termination time, but entry times vary randomly across subjects (e.g., patients receive heart surgery at various time points, typically uncontrolled by the investigator, but the study has to be terminated on a single date; all patients still alive on that date will be censored, but their survival times from surgery will vary)
A x o B x C o D 0 5 10 Basic Concepts
In most clinical studies the length of study period is fixed and the patients enter the study at different times “Lost-to-follow-up” patients’ survival times are measured from the study entry (well-defined point in time) until last contact [these will be censored observations] Patients still alive at the termination date will have survival times equal to the time from the study entry until study termination [these will be censored observations] When there are no censored survival times, the set is said to be complete Functions of Survival Time
Let T = survival time
The distribution of T can be described by the following functions: Survival Function: the probability that an individual survives longer than t: S(t) = P(an individual survives longer than t) = P(T > t)
S(t) is a non-increasing (and usually decreasing) function of time t with the following properties: S(t) = 1 for t = 0 and S(t) = 0 for t = ∞
S(t) is also known as the cumulative survival rate Functions of Survival Time
If there are no censored observations, the survival function is estimated as the proportion of patients surviving longer than t:
# of patients surviving longer than t Stˆ()= total # of patients Functions of Survival Time
Probability Density Function: The survival time T has a probability density function defined as the limit of the probability that an individual fails in the short interval (t, t + Δt) per unit width Δt:
P (an individual dying in the interval (tt ,+ Δ t )) ft( )= lim Δ→t 0 Δt
The graph of f(t) is called the density curve Functions of Survival Time
f(t) has the following two properties:
f(t) is a non-negative function: f(t) ≥ 0 for all t ≥ 0 = 0 for all t < 0
The area under the curve equals one Functions of Survival Time
If there are no censored observations, the probability density function (p.d.f.), f(t), is estimated as the proportion of patients dying in an interval per unit width:
# of patients dying in the interval beginning at time t ftˆ()= ()total # of patients() interval width
The density function is also known as the unconditional failure rate Functions of Survival Time
Hazard Function: The hazard function h(t) of survival time T gives the conditional failure rate
It is defined as the probability of failure during a very small time interval, assuming the individual has survived to the beginning of the interval:
P {an individual of age tt fails in the time interval ( ,t+Δt )} ht( )= lim Δ→t 0 Δt Functions of Survival Time
The hazard function can also be expressed in terms of the cumulative distribution function F(t) and the p.d.f. f(t):
ft() ht()= 1()− Ft This is also known as the instantaneous failure rate, force of mortality, conditional mortality rate, or age-specific failure rate
The hazard at any t corresponds to the risk of event occurrence at time t [e.g., a patient’s hazard for contracting influenza is 0.015 [1.2] with time measured in months, i.e., this patient would expect to contract influenza 0.015 [1.2] times over the course of a month assuming the hazard stays constant] Functions of Survival Time
If there are no censored observations, the hazard function is estimated as the proportion of patients dying in an interval per unit time, given that they have survived to the beginning of the interval:
# of patients dying in the interval beginning at time t htˆ()= ()# of patients surviving at t ( interval width) # of patients dying per unit time in the interval = # of patients surviving at t Functions of Survival Time
Actuaries usually use the average hazard rate of the interval in which the number of patients dying per unit time in the interval is divided by the average number of survivors at the midpoint of the interval:
# of patients dying per unit time in the interval htˆ()= 1 ()# of patients surviving at t − 2 () # deaths in the interval Functions of Survival Time
The cumulative hazard function is defined as
t Ht()==− hxdx ( ) log St () ∫0 e Thus, at t = 0 S(t) = 1, H(t) = 0 at t = ∞ S(t) = 0, H(t) = ∞ Relationships among the Survival Functions
ft() ht()= St() d ft()=−=−⎡⎤ 1 St () S '() t dt ⎣⎦ St'( ) d ht()=− =− logSt () St() dt e t −=hxdx() log() St ∫0 e
Ht()=− loge St () t St()=− exp⎡⎤ Ht () =− exp⎡ hxdx ( ) ⎤ ⎣⎦ ⎣⎢ ∫0 ⎦⎥ ft()=− ht ()exp⎣⎦⎡⎤ Ht () Nonparametric Methods for Estimation of the Survival Functions Product-Limit Estimates (Kaplan-Meier) [most widely used in biological and medical applications]
Life Table Analysis (actuarial method) [for large number of observations or if there are many unique event times]
Relative, Five-Year, and Corrected Survival Rates Nonparametric Methods for Comparing Survival Distributions Did the treatment make the difference in the survival experience between “treatment A” and “treatment B” patients?
Gehan’s Generalized Wilcoxon Test [better for detecting short- term differences in survival] Cox-Mantel Test Log-rank Test [better for detecting long-term differences in survival] Peto and Peto’s Generalized Wilcoxon Test Cox’s F-Test Mantel and Haenszel Test [allows stratification based on prognostic factors] Cox’s Proportional Hazard Regression [adjustment for multiple prognostic factors] Survival Examples
Example 1: Pilot study was conducted at the VA hospital to compare Accelerated Fractionation Radiation Therapy versus Standard Fractionation Radiation Therapy for patients with advanced unresectable squamous cell carcinoma of the head and neck.
Is there any difference in survival between the patients treated with Accelerated FRT and the patients treated with Standard FRT? Example 1: Summary of Patient Characteristics
AFRT SFRT Gender Male 28 (97%) 16 (100%) Female 1 (3%) 0 Age Median 61 65 Range 30-71 43-78 Primary Site Larynx 3 (10%) 4 (25%) Oral Cavity 6 (21%) 1 (6%) Pharynx 20 (69%) 10 (63%) Salivary Glands 0 1 (6%) Stage III 4 (14%) 8 (50%) IV 25 (86%) 8 (50%) Tumor Stage T2 3 (10%) 2 (12%) T3 8 (28%) 7 (44%) T4 18 (62%) 7 (44%)
Example 1: KM Curves by Treatment
Overall Survival by Treatment
1.00
AFRT SFRT
0.75
censored 0.50 observation Survival Probability 0.25
0.00 0 1224364860728496108120 Survival Time (months) Median Survival Time: AFRT: 18.38 months (2 censored observations) SFRT: 13.19 months (5 censored observations) Example 1: KM Curves by Treatment
Overall Survival by Treatment
1.00
AFRT SFRT
0.75
0.50 Survival Probability 0.25
0.00 0 1224364860728496108120 Survival Time (months)
Log-Rank test p-value= 0.5421 Example 1: Further Analysis
Let us consider the analysis of the survival of these patients by Stage as well as by Treatment Example 1: KM Curves by Stage and Treatment
Overall Survival by Treatment and Stage
1.00
AFRT/Stage 3 AFRT/Stage 4 SFRT/Stage 3 0.75 SFRT/Stage 4
0.50 Survival Probability 0.25
0.00 0 1224364860728496108120 Median Survival Time: Survival Time (Months) AFRT Stage 3: 77.98 mo. AFRT Stage 4: 16.21 mo. SFRT Stage 3: 19.34 mo. SFRT Stage 4: 8.82 mo.
Log-Rank test p-value = 0.0792 Example 2: Survival of Leukemia Patients Example 2: SAS Output Study Design Clofibrate Trial
A study was performed to look at the effect of a drug, Clofibrate, on mortality rates for individuals with heart disease Individuals were followed for five years after administration of the drug Results: Clofibrate Group
Coronary heart disease Results from group randomized to take Clofibrate Results: Clofibrate Group
Coronary heart disease Results from group randomized to take Clofibrate
Question: Do these results strongly demonstrate the efficacy of Clofibrate in reducing the mortality in subjects with CHD? Results: Placebo Group
Coronary heart disease Results from group randomized to take placebo The Dangers of Self-Selection
Overall mortality in each of the groups in this randomized trial The Dangers of Self-Selection
Randomized trial No significant difference (p > 0.2) between the treatment and placebo groups
No difference between TX groups The compliers and non-compliers were similar with respect to other variables (age, etc.) There were no apparent harsh side effects of Clofibrate relative to placebo that may have resulted in differential compliance between the Clofibrate and placebo groups
Source: Coronary drug research group (1980). NEJM, 1038-1041 A Randomized Control Group
Important for accounting for many kinds of biases
Randomization, done correctly on a large number of subjects nearly ensures that the only systematic difference in the groups being compared is the exposure(s) of interest Note
A very famous randomized trial (once again) 200,745 vaccinated for polio ~ 400,000 school children randomized 201,229 given a placebo 1954 Salk Polio Vaccine Trial
At the end of the follow-up period there were 82 cases in the vaccine group and 162 in the placebo group
Subsequently analyses report slightly different numbers because some false positives were discovered in each of the two groups 1954 Salk Polio Vaccine Trial
Result Results in 2X2 Table Format
How to test for an association between vaccine polio?
HO: pvaccine = pplacebo
HA: pvaccine ≠ pplacebo
You can use either Fisher’s Exact test or a Chi-square test
With Fisher’s Exact, p < 0.001 Randomized Study Design
Prospective cohort studies Choose a fixed number with and without exposure Follow subjects for a set period of time and determine who has disease/outcome of interest
Measures of association Difference in proportions Relative risk (ratio of proportions) Odds ratio Benefits of Randomization
Randomization helps protect against self-selection biases Examples: Males are more likely to volunteer for placebo than females Smokers are less likely to be in the exposed group Healthier persons sign up for the intervention
The goal of randomization is to eliminate any systematic differences in characteristics of subjects in each of the exposure groups under study, save for the exposure itself Randomized Study Design
Many epidemiological studies are concerned with estimating an association between two dichotomous (binary) variables Example: exposure-disease association
Randomized study design with control group is a type of prospective cohort study Choose a fixed number with and without exposure Follow subjects for a set time period and determine who has the disease/outcome of interest
Measures of association Difference in proportions Relative risk (ratio of proportions) Odds ratio Methods of Randomization Patient Registration
For each patient considered suitable for inclusion in a clinical trial, the following formal sequence of events should take place. Patient Registration
Patient requires treatment Patient eligible for inclusion Clinician willing to accept randomization Patient consent is obtained Patient formally entered on trial Treatment assignment obtained from randomization list On-study forms are completed Treatment commences Patient Recruitment
The identification of appropriate patients is important to ensure that the sample is representative of the disease under investigation. Trial’s conclusions can be readily applied All relevant patients should be considered Clinician should avoid being unduly selective in choice of patients Checking Eligibility
Eligibility should be checked right away by going through the list of eligibility criteria in the protocol. If any ineligible patients accidentally make it through the screening process, they may be excluded from the analysis. Agreement to Randomize
Anything other than randomization is unacceptable if randomization was agreed upon. If a clinician is unwilling to randomize, he or she should not participate in the trial at all. Patient Consent
Written informed consent is an ethical and legal requirement that must be obtained before randomization. Formal Entry on Trial
Registering a patient for entry onto a trial before randomization ensures that all randomized patients are followed for details of treatment and evaluation. Helps to guard against investigators not giving the randomized treatment Value in keeping a separate log of eligible patients not entered into the trial, which provides insight into the representativeness of trial patients Random Treatment Assignment
A randomization list of consecutive random treatment assignment should have been prepared in advance. The clinician should not know the order of this list, nor should they be able to predict the next treatment. Prepared by an independent person, statistician, etc. Random Treatment Assignment
Methods for Revealing Treatment: Sealed envelopes opened after formal entry onto trial. List given directly to pharmacist for double-blind evaluations. For a multi-center trial with a central registration office, the assignment can be read off the randomization list to the investigator over the phone. For a trial in a single institution, it is best to have an independent person responsible for patient registration and randomization who is not concerned with treating any of the patients. Documentation
Confirmation of registration Name, date of birth, trial number, assigned treatment
On-study Form Previous therapy, personal details, clinical condition, tests Efficiency and Reliability
(1) through (7) should be be completed before treatment commences. The entire process of registration and randomization should be prompt and efficient so there is no delay in treatment. Avoid unnecessary drop-outs Preparing the Randomization List
Simple randomization Replacement randomization Random permuted blocks Biased coin method Simple Randomization
Two treatments, A and B Table of random numbers Start with top row and work down Assign A for digits 0−4 Assign B for digits 5−9 AB AB BA AB AA BB AB BA BB BB BA BA AB BB BA Simple Randomization
Three treatments, A B C Assign A for 1−3 Assign B for 4−6 Assign C for 7−9 Ignore 0 - B AC CB AC BA BC AC BA BB CB CA C- BC CC CB Simple Randomization
Four treatments, A B C D Assign A for 1,2 Assign B for 3,4 Assign C for 5,6 Assign D for 7,8 Ignore 0,9 - C AD DB BD BA CD BD CA CC - C DA D- BD DD DB Simple Randomization
Each treatment is completely unpredictable. Probability theory guarantees that in the long run the numbers of patients on each treatment will not be radically different. It is often desirable to restrict randomization to ensure similar treatment numbers throughout a trial. Random Permuted Blocks
To ensure exactly equal treatment numbers at equally spaced points in the sequence of patient assignments. For T treatments, block into k patients and produce a different random ordering of k assignments to each treatment. Random Permuted Blocks
For two treatments A and B Blocks of two patients Assign AB for 0−4 Assign BA for 5−9 ABBA ABBA BAAB ABBA ABAB BABA ABBA BAAB …. Random Permuted Blocks
Three treatments, A B and C Blocks of three patients ABC for 1 ACB for 2 BAC for 3 BCA for 4 CAB for 5 CBA for 6 Ignore 7, 8, 9, and 0 - CAB ACB - - BCA BAC ABC CBA - BAC - CAB ABC CAB CBA . . . Random Permuted Blocks
Two treatments A and B Blocks of four patients AABB for 1 ABAB for 2 ABBA for 3 BBAA for 4 BABA for 5 BAAB for 6 Ignore 0 and 7−9 - BABA ABAB - - BBAA ABBA - BBAA AABB BAAB - ABBA - BABA …. Random Permuted Blocks
In general, a trial without stratification should have a reasonably large block size so as to reduce predictability, but if interim analysis is intended, not so large that serious mid-block inequality might occur. The smaller choice of block size the greater the risk of predictable randomization. For stratified assignment, small block sizes are recommended.
Random Permuted Blocks
A trial for 100 patients could have a block size of 20: Use 0−9 for A and 10−19 for B Block 1: BBBAAAABAABABBABBAAB Block 2: BBAABAABBBABAAABBABA For three treatments, blocks of size 15: Use 1−5 for A, 6−10 for B, 11−15 for C Block 1: CCABBCBAACABBAC Block 2: CCABBCACABABBCA Biased Coin Method
At each point in the trial one observes which treatment has the least patients so far: that treatment is then assigned with probability p > ½ to the next patient. If the two treatments have equal numbers then simple randomization is used for the next patient. Biased Coin Method
One has to choose an appropriate value for p using probability theory. It turns out that for p = 3/4, 2/3, 3/5, or 5/9, a difference in treatment numbers of 4, 6, 10, 16, or more has less than a 1 in 20 chance of occurring at any particular point in the trial. Biased Coin Method
p = 3/4 maintains very strict control on treatment numbers but is somewhat predictable once any inequality exists. p = 2/3 may be an appropriate choice for a relatively small trial. p = 3/5 is good for larger trials (100 patients) Biased Coin Method
Using p = 3/5 Assign the treatment with least patients for 0−5 Assign the treatment with most patients for 6−9 When treatment numbers are equal, revert to 0−4 for A and 5− 9 for B (equal randomization)
05278437416838515696 A B A AABBABBB BABAAB BBB Stratified Randomization
Stratified randomization assists us in ensuring that treatment groups are similar as regards to certain relevant patient characteristics. Example: breast cancer, pre- and post-menopausal The larger the trial, the less likely it is that there will be an imbalance. Stratified randomization is an ‘insurance policy.’ Reasons for Not Using Stratification
1. If the trial is very large (200+) and interim analysis of early results is either not feasible or of minor interest, stratification has little point. 2. If the organizational resources for supervising randomization are limited then the increased complexity of stratification may carry a certain risk of errors. Reasons for Not Using Stratification
3. If there is uncertainty about which patient characteristics might influence response, then one has inadequate knowledge on which to carry out stratification. Stratification
Decide which patient factors by which to stratify. Outcomes of previous patients in earlier trials Should be convinced of a factor’s potential for making an impact on the outcome before it is included in stratification If only vague knowledge exists, do not stratify Random Permuted Blocks within Strata
Example: breast cancer nodal status (number of positive axillary nodes, < 4, ≥ 4) age (< 50, ≥ 50) Strata
< 50 < 4 < 50 ≥ 4 ≥ 50 < 4 ≥ 50 ≥ 4 Random Permuted Blocks within Strata
A separate restricted randomization list is prepared for each of the strata using the previous methods. Generally, a small block size is used when stratifying by several factors. Randomization needs to be more tightly restricted to be effective Chances of any investigator predicting the last assignment in any block is considerably reduced Age: < 50 ≥ 50 < 50 ≥ 50 Nodes: < 4 < 4 ≥ 4 ≥ 4 B B A B A B A A B A B A A A B B
A A B A B A A B A B B B B B A A
A B B B B A A B B A B A A B A A
B A B A B B A B A B A A A A B B ...... Random Permuted Blocks within Strata
The more factors included, the greater the pre-trial documentation and amount of information required each time a patient is registered. Consider whether the extra burden is justified. Issues
For so many strata, will you achieve comparable treatment groups? Over-stratifying can be self-defeating since a large number of strata with incomplete randomized blocks can lead to substantial imbalance between treatment groups. The Minimization Method
Over-stratification: no one is especially interested in ambulatory patients aged less than 50 years with visceral metastases and disease-free interval for over two years since this is a very small subgroup of patients with advanced breast cancer. The Minimization Method
One is more interested in ensuring that the different treatment groups are similar as regards the percentage ambulatory, percentage under 50, etc. This method balances the marginal treatment totals for each level of each patient factor. The Minimization Method
Example: Advanced breast cancer trial, 80 patients already enrolled
Treatment Factor Level ABNext Patient PS Ambulatory 30 31 ← Nonambulatory 10 9 Age < 50 y 18 17 ← ≥ 50 y 22 23 DF Int < 2 y 31 32 ≥ 2 y 9 8 ← Dominant Visceral 19 21 ← metastatic Osseous 8 7 lesion Soft tissue 13 12 The Minimization Method
For each treatment, add up the number of patients in the corresponding rows of the table: A = 30 + 18 + 9 + 19 = 76 B = 31 + 17 + 8 + 21 = 77 Give the treatment with the smallest sum of marginal totals to the next patient (A) If equal, use simple randomization Keep a continual up-to-date record of patients Balancing for Institution
In multi-center trials, we must consider whether the institution entering a patient should also be a factor worth including in stratification. Different institutions can show very different response rates. Patient selection Experimental environment Balancing for Institution
If no other factors are considered, simply provide a separate randomization list for each institution. If the minimization method is used, simply include ‘institution’ as another patient factor. Stratified Randomization
Although stratification is theoretically efficient, the practical circumstances must dictate whether its use is desirable. Unequal Randomization
Equal-sized treatment groups provide the most efficient means of treatment comparison. However, interest is usually in gaining greater experience into a new treatment’s profile, or there exists a large amount of enthusiasm for the new therapy. Consider randomizing more often to the new therapy. Unequal Randomization
Consider: 5% level of significance (type I error) is used to determine required sample size for 95% power What happens to the power if we decide to put a greater proportion on the new treatment? Unequal Randomization Unequal Randomization
If the overall size of the trial is kept constant, the power decreases relatively slowly as one begins to move away from equal sized groups. The power decreases from 95% to 92.5% if we place 2/3 patients on new treatment. Power is down to 82% for 4/5 patients on new treatment. Unequal Randomization
Randomization in a 2:1 or 3:2 ratio is a realistic proposition but more extreme imbalance may be undesirable. Considerable loss of statistical power Unequal randomization is becoming more common. Response-adaptive clinical trials Adaptive Trial Designs
Unlike a fixed design, adaptive designs allow observations made during a trial to update certain aspects of the design, such as randomization scheme sample size stopping early dropping/adding arms These updates are not “ad hoc,” but are set in advance of the trial and are based on clinical, statistical, logistical, and ethical considerations. Adaptive Trial Designs
Adaptive clinical trials provide an ethical and efficient mechanism for testing multiple treatments that provide statistically sound results for future treatment. lessen exposure of current study participants to ineffective/unsafe treatments. The Bayesian approach provides a natural framework for the continual updating required by adaptive designs. Introduction to Confounding Confounding
Consider the results from the following (fictitious) study: Investigating the association between smoking and a certain disease in male and female adults 210 smokers and 240 non-smokers were recruited Confounding
Smoking is protecting against disease?
Most of the smokers are male and non-smokers are female: Confounding
Smoking is protecting against disease?
Most of the persons with disease are female: Confounding Confounding
The original outcome is disease
The original exposure of interest is smoking
In this example, gender is related to both the outcome and exposure This relationship is possibly impacting the overall relationship between disease and smoking
How can we look at the relationship between disease and smoking removing any possible ‘interference’ from gender? One approach—look at disease and smoking relationship separately within males and females Example
Is smoking related to disease in males? Example
Is smoking related to disease in females? Smoking, Disease, and Gender
Recap: The overall (crude/unadjusted) relationship (RR) between smoking and disease was nearly one (risk difference nearly 0)
The sex specific results showed similar positive associations between smoking and disease
Note: for the moment we aren’t considering statistical significance, just using estimates to illustrate the point Simpson’s Paradox
The nature of an association can change (and even reverse direction) or disappear when data from several groups are combined to form a single group
As association between an exposure X and a disease Y can be confounded by another lurking (hidden) variable Z Simpson’s Paradox
A confounder Z distorts the true relationship between X and Y
This can happen if Z is related to both X and Y Confounding What is the solution?
If you don’t know what the potential confounders are, there’s not much you can do after the study is over Randomization is your best protection—it eliminates the potential links between the exposure of interest and potential confounders
If you can’t randomize but know what the potential confounders are, there are statistical methods to help control or adjust for confounders—make sure you’re measuring the potential confounders as part of your study How to adjust for confounding?
Stratify Look at the tables separately Male/female, by clinic, etc. Take the weighted average of stratum specific estimates
For example, in the disease/smoking situation: To get a sex adjusted relative risk for the smoking disease relationship we could weight the sex-specific relative risks by numbers of males and females: How to adjust for confounding?
There are better ways than this to take such a weighted average (weighting by standard error, for instance) but this illustrates the concept
Confidence intervals can be computed for these adjusted measures of association
One way to assess whether sex is a confounder: compare crude RR to sex-adjusted RR—if it’s ‘different’ then sex is a confounder How to adjust for confounding?
Regression methods can be used More generalizable than weighted average approach, but the idea is similar Example: Arm Circumference and Height
An observational study to estimate the association between arm circumference and height in Nepali children 94 randomly selected subjects (ages 3 months to 6.5 years) had arm circumference, weight, and height measured This study is observational—it is not possible to randomize subjects to height groups Example: Arm Circumference and Height
The data: Arm circumference range: 11.6 – 16.5 cm Height range: 57 – 109 cm Weight range: 5 – 18 kg
To perform the analysis: Dichotomize height at median (87 cm) Dichotomize weight at median (11.4 kg) Example: Arm Circumference and Height Example: Arm Circumference and Height
Mean arm circumference by height group
Shorter subjects have arm circumference on average 0.7 cm lower than taller subjects (mean difference = -0.7 with 95% CI -1.1 to - 0.3 cm) Example: Arm Circumference and Height
However, it is very likely that arm circumference and height are both related to a child’s weight
Some of the relationships between arm circumference and height could be because of, or masked by, these ‘behind the scenes’ relationships to weight Example: Arm Circumference and Height
What about weight? Example: Arm Circumference and Height
Possible diagram of this scenario Example: Arm Circumference and Height
Recall the original finding: children below the median height had arm circumferences of 0.7 cm lower on average than children equal to or above the median height
To investigate whether this estimate is being fueled or lessened in part by weight differences in the height groups, stratify by weight group and estimate the arm circumference/height association in each weight group Example: Arm Circumference and Height
Mean arm circumference by height group Children below median weight
Shorter subjects below the median weight have average arm circumferences on average 0.02 cm larger than taller subjects below the median weight (95% CI: -0.64 [lower] to 0.68 [higher] cm) Example: Arm Circumference and Height
Mean arm circumference by height group Children equal to or above median weight
Shorter subjects at or above the median weight have average arm circumferences 0.06 cm larger than taller subjects at or above the median weight (95% CI: -0.9 [lower] to 1.0 [higher] cm) Example: Arm Circumference and Height
Recap: Ignoring weight, children below the median height had arm circumferences of 0.69 less (on average) than children at or above the median height and this difference was statistically significant When stratified by weight children below the median height had arm circumferences marginally larger (on average) than children with or above the median height in both weight groups, but these estimates were very close to 0 and not statistically significant Example: Arm Circumference and Height
It appears that the association between arm circumference and height ‘disappears’ after accounting for weight
Associations (shorter versus taller): Crude/unadjusted -0.7 cm (95% CI -1.1 to -0.3) Adjusted? One possibility: take the weighted average of weight specific AC/height associations, weighted by the inverse of SEs of weight specific associations: Example: Arm Circumference and Height
We can get 95% CIs for this adjusted Estimate: -0.3 to 0.38 cm Example: Arm Circumference and Height
One approach—take a weighted average of the average arm circumference differences between subjects below and above the median weight within weight groups (weighted by the size of each group)
However, this is a pain and if there are more potential confounders we could spend our life stratifying and computing such estimates
Better approach—multiple regression methods Example: Arm Circumference and Height
FYI A weighted overall average height adjusted difference in arm circumference between two weight groups is 0.98 cm with 95% CI 0.4 cm to 1.55 cm
When adjusted for weight, the arm circumference/height association disappears
When adjusted for height, the arm circumference/weight association is almost the same as the unadjusted arm circumference/weight association Example: Arm Circumference and Height
This is an interesting case, perhaps better illustrated by this picture: Example: Arm Circumference and Height
This is not always the case—many times when there is confounding between and outcome and two (or more) grouping variables, all of the adjusted outcome/group relationships will differ from the unadjusted associations Example: South African Study
A longitudinal study from South Africa: birth cohort, followed up five years after birth
Participation by medical aid status at birth, all baseline participants
95% CI: 0.53 to 0.92 Example: South African Study
A longitudinal study from South Africa: birth cohort, followed up five years after birth
Participation by medical aid status at birth, black participants
95% CI: 0.76 to 1.36 Example: South African Study
A longitudinal study from South Africa: birth cohort, followed up five years after birth
Participation by medical aid status at birth, white participants
95% CI: 0.25 to 4.5 Example: South African Study
Race: majority of sample is black (91%)
Race and follow-up participation: 26% of black subjects completed follow-up as compared to 9% of white subjects
Race and medical aid: 9% of black subjects had medical aid compared to 83% of white subjects Statistical Interaction/Effect Modification Effect Modification/Interaction
Let’s look at the results from a fictitious data set comparing two treatments for a fatal disease as to the impact of each on reducing deaths: there are 600 ‘younger’ patients and 600 ‘older’ patients in this random sample 1,200
Here is the data on all patients
Surgery and drug groups have identical proportions dying (50% in each group)! Effect Modification/Interaction
Here is the data on only the 600 younger patients Effect Modification/Interaction
Here is the data on only the 600 older patients Effect Modification/Interaction
A recap of the overall and age specific results Effect Modification/Interaction
Age is an effect modifier: Age modifies the association between death and treatment (statistical interaction between age and treatment)!
The association between death and treatment depends on age Surgery is better for younger patients, drug therapy is better for the older patients Effect Modification/Interaction
The association between death and one variable (treatment) depends on the level of another variable (age)
Here, it would not make sense to estimate one composite, overall measure of the association between death and treatment
Best way to look at this data is to just look at the two tables (young and old) separately and estimate two separate death/treatment associations Example: Tree Damage and Elevation
Data on elevation and percentage of dead or badly damaged trees, from 64 Appalachian sites (reported by Committee on Monitoring and Assessment of Trends in Acid Deposition, 1986)
Eight of the 64 sites are in Southern states
Elevation information--whether the site was above or below 1,100 meters
This is an observational study (why?) Example: Tree Damage and Elevation
Data for the first ten sites Example: Tree Damage and Elevation
Let’s try to assess the relationship between the percentage of damaged trees and elevation--here is a boxplot of the percentage of damaged trees by elevation Example: Tree Damage and Elevation
Mean percentage of damaged trees by elevation group Example: Tree Damage and Elevation
What about region though? Boxplot percentage of damaged trees by region Example: Tree Damage and Elevation
So sites in the South have less damage on average (i.e., not only is the percentage of damaged trees related to elevation, but it is also related to region)
If region is related to elevation, then it’s possible that part of the relationship we saw (or didn’t see) between damage and elevation is because of the region-damage-elevation relationship Example: Tree Damage and Elevation
Possible diagram of this scenario Example: Tree Damage and Elevation
Recall the original finding--sites with lower elevation had a marginally lower percentage of damaged trees on average: 0.2% less damaged than sites at higher elevation
To adjust for regional differences in the elevation groups, and the damage/region relationship, let’s stratify by region, and estimate the damage/elevation association in each region Example: Tree Damage and Elevation
Mean percentage of damage tree by elevation in southern sites
Southern sites at higher elevation have 14.5% less damaged trees on average than southern sites at lower elevation (95% CI: 48.0% lower - 19.0% higher) Example: Tree Damage and Elevation
Mean percentage of damage tree by elevation in northern sites
Northern sites at higher elevation have 23% less damaged trees on average than northern sites at lower elevation (95% CI: 2% lower - 44% higher) Example: Tree Damage and Elevation
A recap Ignoring region, sites with lower PCV and sites with lower elevation had marginally lower percentage of damaged trees on average--0.2% less damaged than sites at higher elevation
When stratified by region . . . Northern sites showed positive, statistically-significant association between damage and elevation Southern sites showed negative, non-statistically-significant association between damage and elevation Example: Tree Damage and Elevation
So, it appears as though the association between tree damage and elevation is different, both in magnitude and direction depending on region
We have a small dataset, so lack of statistical significance of negative damage/elevation associations in the south may be because of lower power Example: Tree Damage and Elevation
One approach--take a weighted average of the average damage differences between sites at low and high elevations within each region, weighted by number of observations in each region
However, does not necessarily make sense here--why combine estimates that differ in direction into one overall estimate?
Better approach--to report two mean differences in damage between low and high elevation sites (one estimate for northern sites, one estimate for southern sites)
Better approach--multiple regression methods References and Citations
Lectures modified from notes provided by John McGready and Johns Hopkins Bloomberg School of Public Health accessible from the World Wide Web: http://ocw.jhsph.edu/courses/introbiostats/schedule.cfm