Chi-Square Test

Home , Exact test, Stratified randomization

Introduction to Biostatistics for Clinical Researchers

University of Kansas Department of Biostatistics & University of Kansas Medical Center Department of Internal Medicine Materials

PowerPoint files can be downloaded from the Department of Biostatistics website at http://biostatistics.kumc.edu

A link to the recorded lectures will be posted in the same location Topics

Comparing Two (or more) Independent Proportions

Measures of Association for Dichotomous Data

Survival Analysis

Study Design Considerations Recall

Number of Parameter Null Hypothesis Test Required Assumptions to

Samples (HO) statistic use Test Statistic

1Mean μ = μO z Normality, known variance

1Mean μ = μO t Normality, unknown variance

1 2 Difference in means μ1 − μ2 = δ z Normality, known variances

1 2 Difference in means μ1 − μ2 = δ t Normality, unknown but equal variances

1 2 Difference in means μ1 − μ2 = δ t Normality, unknown variances

1 (Paired) Mean difference μD = 0 t Normality, unknown variance

K > 2 Equality of Means μ1 = μ2 = . . . = μk F Normality, equal variances

2 1 Proportion p = pO χ , z Sample size considerations

1 2 2 Difference in proportions p1 − p2 = δ χ , z, Sample size considerations Fisher’s Exact

2 K > 2 Equality of Proportions p1 = p2 = . . . = pk χ Sample size considerations

1: Usually δ = 0 Comparing Proportions Between Two Independent Populations In This Section

CIs for difference in proportions between two independent populations

Large sample methods for comparing proportions Normal approximation method Chi-square test

Fisher’s Exact Test

Measures of association Comparing Two Proportions

Pediatric AIDS Clinical Trial Group (ACTG) Protocol 076 Study Group1

Study Design “We conducted a randomized, double-blinded, placebo- controlled trial of the efficacy and safety of zidovudine (AZT) in reducing the risk of maternal-infant HIV transmission” 363 HIV infected pregnant women were randomized to AZT or placebo

1Conner, E., et al. (1994). Reduction of maternal-infant transmission of human immunodeficiency virus type 1 with zidovudine treatment, NEJM, 331:18 Comparing Two Proportions

Results Of the 180 women randomized to AZT, 13 gave birth to children who tested positive for HIV within 18 months of birth Of the 183 women randomized to placebo, 40 gave birth to children who tested positive for HIV within 18 months of birth Notes

Random assignment of treatment Helps insure two groups are comparable Patient and physician could not request a particular treatment

Double-blind Patient and physician did not know treatment assignment Observed HIV Transmission Rates

AZT

13 pˆ ==0.07() 7% AZT 180

PLA

40 pˆ ==0.22() 22% PLA 183 Notes

Is the difference significant or can it be explained by chance?

CI on the difference in proportions? P-value? Sampling Distribution of Difference in Sample Proportions Since we have large samples, we can be assured that the sampling distributions of the sample proportions in both groups are approximately normal (CLT)

It turns out the difference of quantities which are approximately normally distributed are also normally distributed Sampling Distribution of Difference in Sample Proportions The sampling distribution of the difference of two sample proportions (based on large samples) approximates a normal distribution This distribution is centered at the true, unknown population

difference p1−p2 AZT Group Placebo Group AZT−PLA 95% CIs for Difference in Proportions

General formula: Best estimate ± multiplier*SE(best estimate)

The best estimate of a population difference in sample proportions is: ppˆˆ12−

Here, pˆ 1may represent the sample proportion of infants HIV positive for 180 infants in the AZT group and pˆ 2 may represent the same for the 183 infants in the placebo group 95% CI: AZT Study

pˆˆ12−± p multiplierSE( pˆˆ12− p )

Statisticians have developed formulas for the standard error of the difference

These formulas depend on both sample sizes and sample proportions

pppp1122()11−−() SE() pˆˆ12−= p + nn12

∧ ppppˆˆˆˆ1122()11−−() SE() pˆˆ12−= p + nn12 HIV/AZT Study

Recall the data:

Group AZT PLA N 180 183 Sample 0.07 0.22 proportion

∧ 0.07( 0.93) 0.22( 0.78) SE() pˆˆ−= p + =0.36 12 180 183 95% CI: AZT Study

The 95% CI for the true difference in proportions between the AZT group and PLA groups is:

∧

−±0.15 1.96SE() pˆˆ12 − p −±0.15 1.96() 0.036 ()−0.22,0.08 Summary

Results The proportion of infants who tested positive for HIV within 18 months of birth was seven percent (95% CI 4 -12%) in the AZT group and twenty-two percent in the placebo group (95% CI 16 - 28%) The study results estimate the absolute decrease in the proportion of HIV positive infants born to HIV positive mothers associated with AZT to be as low as 8% and as high as 22% Two-sample z-test: Getting a p-value

Are the proportions of infants contracting HIV within 18 months of birth equivalent at the population level for those whose mothers are treated with AZT versus untreated?

HO: p1 = p2

HA: p1 ≠ p2

In other words, is the expected difference in proportions zero?

HO: p1 − p2 = 0

HA: p1 − p2 ≠ 0 Hypothesis Test to Compare Two Independent Proportions Recipe:

Assume HO is true

Measure the distance of our sample result from pO (here, it’s 0) Compare the distance (test statistic) to the appropriate distribution to get the p-value

observed difference− null difference z = SE ()observed difference ppˆˆ− = 12 ppppˆˆˆˆ()11−−() 1122+ nn12 HIV/AZT Study

Recall,

ppˆˆ12−=−0.15 ∧

SE() pˆˆ12−= p 0.036

So in this study,

−0.15 z ==−4.2 0.36

This result was 4.2 SE below the null mean of 0 P-values

Is a result 4.2 standard errors below 0 unusual? It depends on the distribution we’re dealing with

The p-value is the probability of getting a test statistic as or more extreme than what we observed (-4.2) by chance

The p-value comes from the sampling distribution of the difference in two sample proportions

What is the sampling distribution of the difference in sample proportions? If both groups are large, it is approximately normal It is centered at the true difference Under the null, this true difference is 0 HIV/AZT Study

To compute a p-value, we need to compute the probability of being 4.2 or more SEs away from 0 on a standard normal curve HIV/AZT Study

If we were to look this up in a normal table, we would find a very small p (p < 0.001)

This method is also essentially equivalent to the chi-square (χ2) method Displaying 2X2 Data

Data of this sort can be displayed using a contingency table The Chi-Square Test

Testing equality of two population proportions using data from two samples

HO: p1 = p2

HA: p1 ≠ p2

In other words, is the expected difference in proportions zero?

HO: p1 − p2 = 0

HA: p1 − p2 ≠ 0

In the context of a 2X2 table, this is testing whether there is a relationship between the row variable (HIV status) and the column variable (treatment type)--the test of independence Chi-Square Test

Pearson’s Chi-Square Test (χ2) can easily be done by hand

Works well for “big” sample sizes--it is an approximate method

Gives essentially the same p as the z test for comparing two proportions

Unlike the z-test, it can be extended to compare proportions between more than two independent groups in one test The Chi-Square Method

Looks at discrepancies between observed and expected cell counts (under the null hypothesis) in a 2X2 table O = observed E = expected = (row total*column total)/grand total

“Expected” refers to the values for the cell counts that would be expected if the null hypothesis is true (the expected cell proportions if the underlying population proportions are equal) The Chi-Square Method

Recipe . . . Start by assuming the null hypothesis is true Measure the distance of the sample result from the null value Compare the test statistic (distance) to the appropriate distribution to get the p-value

2 (OE− ) χ=2 ∑ E

The sampling distribution of this statistic when the null is true is a chi-square distribution with one degree of freedom Chi-Square (1) Contingency Table

The observed value for cell one is 13

We have to calculate the expected count:

RC 53( 180) E == =26.3 N 363 Expected Values

We could do the same for the other three cells:

Now we must compute the ‘distance’ (test statistic), χ2

2 (OE− ) χ=2 ∑ E Expected Values

2 (OE− ) χ=2 ∑ E 22 2 2 ()13−− 26.3() 40 26.7() 167 − 153.7() 143 − 156.3 =++ + 26.3 26.7 153.7 156.3 = 15.6 Sampling Distribution

P = 0.0001 Extending Chi-Square Test

The chi-square test can be extended to test for differences in proportions across more than two independent populations Analogous to ANOVA with binary outcomes Recall: HIV/AZT Example Recall: HIV/AZT Example

Testing equality of two population proportions using data from two samples

HO: p1 = p2

HA: p1 ≠ p2

In other words, is the expected difference in proportions zero?

HO: p1 − p2 = 0

HA: p1 − p2 ≠ 0

In the context of a 2X2 table, this is testing whether there is a relationship between the row variable (HIV status) and the column variable (treatment type)--the test of independence Statistical Methods

Pearson’s Chi-Square Test or the Normal Approximation (z) Method Both are based on the central limit theorem kicking in Both are approximate methods, but are excellent approximations if the sample sizes are large These do not perform as well in smaller samples Statistical Methods

Fisher’s Exact Test Calculations are difficult Always appropriate, regardless of sample size Usually involves computer effort Exact p-values are given (no approximations)

Rationale Suppose the null is true--that there is no association between AZT and transmission Imagine putting 53 red balls (the infected) and 310 green balls (the non-infected) in a jar Randomly select 180 of them (these are the AZT group, the rest are the placebo group) Fisher’s Exact Test

The one-sided p-value from this test is the probability that you get 13 (what we actually observed in the AZT group) or fewer red balls among the 180 For this example, a computer will compute the exact p--0.0001 Measures of Association for Binary Data Risk Difference

Risk difference is the difference in proportions, p1 − p2

Ex: the difference in risk of HIV for children born to HIV+ mothers taking AZT relative to those taking placebo is

ppˆˆ12− =−=−0.07 0.22 0.15

If AZT was given to 1000 infected pregnant women, it would reduce the number of transmissions by 150

95% CI: Study results suggest that the reduction in HIV positive births from 1,000 HIV positive pregnant women treated with AZT could range from 75 to 220 fewer than the number occurring if the 1,000 women were not treated Relative Risk

Relative Risk (risk ratio)--ratio of proportions p1/p2

Ex: The risk of HIV with AZT vs. PLA

pˆ 0.07 1 ==0.32 pˆ2 0.22 The risk of transmission with AZT is about 1/3 the risk with placebo

An HIV+ woman can reduce her personal risk of transmitting the virus by nearly 70% if she takes AZT

95% CI: Study results suggest that this reduction in risk could be as small as 40% and as large as 82% The Risk Difference vs. Relative Risk

The risk difference (attributable risk) provides a measure of the public health impact of an exposure (assuming causality)

The relative risk provides a measure of the magnitude of the disease-exposure association for an individual

Each provides a different piece of information about the “story”

AZT example—in this study 22% of the untreated mothers gave birth to children with HIV Relative risk: .32 Risk difference: -15% The Risk Difference vs. Relative Risk

Suppose that only 2% of the children born to untreated HIV positive women became HIV positive

Suppose the percentage in AZT treated women is .6% Relative risk: .32 Risk difference: -1.4 % The Risk Difference vs. Relative Risk

Suppose that 90% of the children born to untreated HIV positive women became HIV positive

Suppose the percentage in AZT treated women is 75% Relative risk: .83 Risk difference: -15 % What Is an Odds?

Like the relative risk, the odds ratio provides a measure of association in a ratio (as opposed to difference)

The odds ratio is a function of risk (prevalence)

Odds is the ratio of the risk of having an outcome to the risk of not having an outcome If p represents the risk of an outcome, then the odds are given by:

p odds = 1− p Example

In the AZT example, the estimated risk of giving birth to an HIV infected child among mothers treated with AZT was 0.07

The corresponding odds estimate is

∧ pˆ 0.07 odds ===1 0.08 10.93− pˆ1 Example

In the AZT example, the estimated risk of giving birth to an HIV infected child among mothers not treated was 0.22

The corresponding odds estimate is

∧ pˆ 0.22 odds ===2 0.28 10.78− pˆ2 AZT/Mother-Infant Transmission Example

The estimated odds ratio of an HIV birth with AZT relative to placebo

pˆ1 ∧ 1− pˆ 0.08 OR ===1 0.28 pˆ2 0.28

1− pˆ2

The odds of HIV transmission with AZT is .28 (about 1/3) the odds of transmission with placebo Odds Ratio

Interpretation AZT is associated with an estimated 72% (estimated OR = .28) reduction in odds of giving birth to an HIV infected child among HIV infected pregnant women Study results suggest that this reduction in odds could be as small as 46% and as large as 86% (95% CI on odds ratio, .14 – .54) Odds Ratio

What about a p-value?

What value of odds ratio indicates no difference in risk?

If p1 = p2 then

p1 1− p OR = 1 = 1 p2

1− p2

Thus, we need to test

HO: OR = 1 versus HA: OR ≠ 1 We can use the same test! (Chi-square) Hypothesis of Equal Proportions

Equivalent hypotheses:

p1 1− p1 HpOO::0:112=−== p Hpp 12 H O p2 1− p2

Always will estimate the same direction of association

∧∧ OR< 11⇔< RR ∧∧ OR>⇔11 RR > ∧∧ OR= 11⇔= RR How does OR compare to RR?

If CI for OR does not include 1, CI for RR will not include 1

If CI for OR includes 1, CI for RR will include 1

The lower the risk in both groups being compared, the more similar the two measures will be in magnitude The Odds Ratio versus Relative Risk

AZT example--in this study 7% of AZT treated mothers and 22% of the untreated mothers gave birth to children with HIV RR = 0.32 OR = 0.28

Suppose only 2% of the children born to untreated HIV positive women became HIV positive and that the percentage in AZT treated women is 0.6% RR = 0.32 OR = 0.30

Suppose 90% of the children born to untreated women became HIV positive and that the percentage in AZT treated women is 0.6% RR = 0.83 OR = 0.33 Why Bother with OR?

It is less intuitively interpretable than relative risk

However, with certain types of non-randomized study designs we can not get a valid estimate of RR but can still get a valid estimate of OR Survival Analysis Survival Analysis

Survival Analysis is the class of statistical methods for studying the occurrence and timing of events

The event (to name a few) could be tumor-free time development of a disease response to treatment relapse local control death

Survival analysis methods are most often applied to the study of deaths Basic Concepts

Survival Time: the time from a well-defined point in time (time origin) to the occurrence of a given event

Survival Data: survival time response to a given treatment patient characteristics related to response, survival, and the development of a disease Basic Concepts – Basic Survival Data Structure Basic Concepts

Survival data has two features that are difficult to handle with “conventional” statistical methods: censoring time-dependent covariates

Many consider survival data analysis as the application of conventional statistical methods when the data is censored

Censoring comes in many forms and occurs for many different reasons

The most common and basic types of censored data are Types I-III right-censored data Basic Concepts

Type I Censoring (also called singly-censored): Subjects are observed for a fixed period of time (under control of the investigator); at the end of the fixed period all remaining subjects are sacrificed/censored [e.g., at the end of the 3-year experiment, all subjects who have not died are censored at time 3 years] If there are no accidental losses, all censored observations equal the length of the study period

A x B x C x x D

0 1 2 3 Basic Concepts

Type II Censoring (also called singly-censored): Subjects are observed until a fixed pre-specified number of events have occurred; after that the remaining subjects are sacrificed/censored [e.g., experiment with 100 lab rats will stop when 50 of them have died] If there are no accidental losses, the censored observations equal the largest uncensored observation Basic Concepts

Type III Censoring occurs when: subjects are terminated for reasons that are not under the control of the investigator (e.g., withdrawals, loss to follow-up) there is a single termination time, but entry times vary randomly across subjects (e.g., patients receive heart surgery at various time points, typically uncontrolled by the investigator, but the study has to be terminated on a single date; all patients still alive on that date will be censored, but their survival times from surgery will vary)

A x o B x C o D 0 5 10 Basic Concepts

In most clinical studies the length of study period is fixed and the patients enter the study at different times “Lost-to-follow-up” patients’ survival times are measured from the study entry (well-defined point in time) until last contact [these will be censored observations] Patients still alive at the termination date will have survival times equal to the time from the study entry until study termination [these will be censored observations] When there are no censored survival times, the set is said to be complete Functions of Survival Time

Let T = survival time

The distribution of T can be described by the following functions: Survival Function: the probability that an individual survives longer than t: S(t) = P(an individual survives longer than t) = P(T > t)

S(t) is a non-increasing (and usually decreasing) function of time t with the following properties: S(t) = 1 for t = 0 and S(t) = 0 for t = ∞

S(t) is also known as the cumulative survival rate Functions of Survival Time

If there are no censored observations, the survival function is estimated as the proportion of patients surviving longer than t:

# of patients surviving longer than t Stˆ()= total # of patients Functions of Survival Time

Probability Density Function: The survival time T has a probability density function defined as the limit of the probability that an individual fails in the short interval (t, t + Δt) per unit width Δt:

P (an individual dying in the interval (tt ,+ Δ t )) ft( )= lim Δ→t 0 Δt

The graph of f(t) is called the density curve Functions of Survival Time

f(t) has the following two properties:

f(t) is a non-negative function: f(t) ≥ 0 for all t ≥ 0 = 0 for all t < 0

The area under the curve equals one Functions of Survival Time

If there are no censored observations, the probability density function (p.d.f.), f(t), is estimated as the proportion of patients dying in an interval per unit width:

# of patients dying in the interval beginning at time t ftˆ()= ()total # of patients() interval width

The density function is also known as the unconditional failure rate Functions of Survival Time

Hazard Function: The hazard function h(t) of survival time T gives the conditional failure rate

It is defined as the probability of failure during a very small time interval, assuming the individual has survived to the beginning of the interval:

P {an individual of age tt fails in the time interval ( ,t+Δt )} ht( )= lim Δ→t 0 Δt Functions of Survival Time

The hazard function can also be expressed in terms of the cumulative distribution function F(t) and the p.d.f. f(t):

ft() ht()= 1()− Ft This is also known as the instantaneous failure rate, force of mortality, conditional mortality rate, or age-specific failure rate

The hazard at any t corresponds to the risk of event occurrence at time t [e.g., a patient’s hazard for contracting influenza is 0.015 [1.2] with time measured in months, i.e., this patient would expect to contract influenza 0.015 [1.2] times over the course of a month assuming the hazard stays constant] Functions of Survival Time

If there are no censored observations, the hazard function is estimated as the proportion of patients dying in an interval per unit time, given that they have survived to the beginning of the interval:

# of patients dying in the interval beginning at time t htˆ()= ()# of patients surviving at t ( interval width) # of patients dying per unit time in the interval = # of patients surviving at t Functions of Survival Time

Actuaries usually use the average hazard rate of the interval in which the number of patients dying per unit time in the interval is divided by the average number of survivors at the midpoint of the interval:

# of patients dying per unit time in the interval htˆ()= 1 ()# of patients surviving at t − 2 () # deaths in the interval Functions of Survival Time

The cumulative hazard function is defined as

t Ht()==− hxdx ( ) log St () ∫0 e Thus, at t = 0 S(t) = 1, H(t) = 0 at t = ∞ S(t) = 0, H(t) = ∞ Relationships among the Survival Functions

ft() ht()= St() d ft()=−=−⎡⎤ 1 St () S '() t dt ⎣⎦ St'( ) d ht()=− =− logSt () St() dt e t −=hxdx() log() St ∫0 e

Ht()=− loge St () t St()=− exp⎡⎤ Ht () =− exp⎡ hxdx ( ) ⎤ ⎣⎦ ⎣⎢ ∫0 ⎦⎥ ft()=− ht ()exp⎣⎦⎡⎤ Ht () Nonparametric Methods for Estimation of the Survival Functions Product-Limit Estimates (Kaplan-Meier) [most widely used in biological and medical applications]

Life Table Analysis (actuarial method) [for large number of observations or if there are many unique event times]

Relative, Five-Year, and Corrected Survival Rates Nonparametric Methods for Comparing Survival Distributions Did the treatment make the difference in the survival experience between “treatment A” and “treatment B” patients?

Gehan’s Generalized Wilcoxon Test [better for detecting short- term differences in survival] Cox-Mantel Test Log-rank Test [better for detecting long-term differences in survival] Peto and Peto’s Generalized Wilcoxon Test Cox’s F-Test Mantel and Haenszel Test [allows stratification based on prognostic factors] Cox’s Proportional Hazard Regression [adjustment for multiple prognostic factors] Survival Examples

Example 1: Pilot study was conducted at the VA hospital to compare Accelerated Fractionation Radiation Therapy versus Standard Fractionation Radiation Therapy for patients with advanced unresectable squamous cell carcinoma of the head and neck.

Is there any difference in survival between the patients treated with Accelerated FRT and the patients treated with Standard FRT? Example 1: Summary of Patient Characteristics

AFRT SFRT Gender Male 28 (97%) 16 (100%) Female 1 (3%) 0 Age Median 61 65 Range 30-71 43-78 Primary Site Larynx 3 (10%) 4 (25%) Oral Cavity 6 (21%) 1 (6%) Pharynx 20 (69%) 10 (63%) Salivary Glands 0 1 (6%) Stage III 4 (14%) 8 (50%) IV 25 (86%) 8 (50%) Tumor Stage T2 3 (10%) 2 (12%) T3 8 (28%) 7 (44%) T4 18 (62%) 7 (44%)

Example 1: KM Curves by Treatment

Overall Survival by Treatment

1.00

AFRT SFRT

0.75

censored 0.50 observation Survival Probability 0.25

0.00 0 1224364860728496108120 Survival Time (months) Median Survival Time: AFRT: 18.38 months (2 censored observations) SFRT: 13.19 months (5 censored observations) Example 1: KM Curves by Treatment

Overall Survival by Treatment

1.00

AFRT SFRT

0.75

0.50 Survival Probability 0.25

0.00 0 1224364860728496108120 Survival Time (months)

Log-Rank test p-value= 0.5421 Example 1: Further Analysis

Let us consider the analysis of the survival of these patients by Stage as well as by Treatment Example 1: KM Curves by Stage and Treatment

Overall Survival by Treatment and Stage

1.00

AFRT/Stage 3 AFRT/Stage 4 SFRT/Stage 3 0.75 SFRT/Stage 4

0.50 Survival Probability 0.25

0.00 0 1224364860728496108120 Median Survival Time: Survival Time (Months) AFRT Stage 3: 77.98 mo. AFRT Stage 4: 16.21 mo. SFRT Stage 3: 19.34 mo. SFRT Stage 4: 8.82 mo.

Log-Rank test p-value = 0.0792 Example 2: Survival of Leukemia Patients Example 2: SAS Output Study Design Clofibrate Trial

A study was performed to look at the effect of a drug, Clofibrate, on mortality rates for individuals with heart disease Individuals were followed for five years after administration of the drug Results: Clofibrate Group

Coronary heart disease Results from group randomized to take Clofibrate Results: Clofibrate Group

Coronary heart disease Results from group randomized to take Clofibrate

Question: Do these results strongly demonstrate the efficacy of Clofibrate in reducing the mortality in subjects with CHD? Results: Placebo Group

Coronary heart disease Results from group randomized to take placebo The Dangers of Self-Selection

Overall mortality in each of the groups in this randomized trial The Dangers of Self-Selection

Randomized trial No significant difference (p > 0.2) between the treatment and placebo groups

No difference between TX groups The compliers and non-compliers were similar with respect to other variables (age, etc.) There were no apparent harsh side effects of Clofibrate relative to placebo that may have resulted in differential compliance between the Clofibrate and placebo groups

Source: Coronary drug research group (1980). NEJM, 1038-1041 A Randomized Control Group

Important for accounting for many kinds of biases

Randomization, done correctly on a large number of subjects nearly ensures that the only systematic difference in the groups being compared is the exposure(s) of interest Note

A very famous randomized trial (once again) 200,745 vaccinated for polio ~ 400,000 school children randomized 201,229 given a placebo 1954 Salk Polio Vaccine Trial

At the end of the follow-up period there were 82 cases in the vaccine group and 162 in the placebo group

Subsequently analyses report slightly different numbers because some false positives were discovered in each of the two groups 1954 Salk Polio Vaccine Trial

Result Results in 2X2 Table Format

How to test for an association between vaccine polio?

HO: pvaccine = pplacebo

HA: pvaccine ≠ pplacebo

You can use either Fisher’s Exact test or a Chi-square test

With Fisher’s Exact, p < 0.001 Randomized Study Design

Prospective cohort studies Choose a fixed number with and without exposure Follow subjects for a set period of time and determine who has disease/outcome of interest

Measures of association Difference in proportions Relative risk (ratio of proportions) Odds ratio Benefits of Randomization

Randomization helps protect against self-selection biases Examples: Males are more likely to volunteer for placebo than females Smokers are less likely to be in the exposed group Healthier persons sign up for the intervention

The goal of randomization is to eliminate any systematic differences in characteristics of subjects in each of the exposure groups under study, save for the exposure itself Randomized Study Design

Many epidemiological studies are concerned with estimating an association between two dichotomous (binary) variables Example: exposure-disease association

Randomized study design with control group is a type of prospective cohort study Choose a fixed number with and without exposure Follow subjects for a set time period and determine who has the disease/outcome of interest

Measures of association Difference in proportions Relative risk (ratio of proportions) Odds ratio Methods of Randomization Patient Registration

For each patient considered suitable for inclusion in a clinical trial, the following formal sequence of events should take place. Patient Registration

Patient requires treatment Patient eligible for inclusion Clinician willing to accept randomization Patient consent is obtained Patient formally entered on trial Treatment assignment obtained from randomization list On-study forms are completed Treatment commences Patient Recruitment

The identification of appropriate patients is important to ensure that the sample is representative of the disease under investigation. Trial’s conclusions can be readily applied All relevant patients should be considered Clinician should avoid being unduly selective in choice of patients Checking Eligibility

Eligibility should be checked right away by going through the list of eligibility criteria in the protocol. If any ineligible patients accidentally make it through the screening process, they may be excluded from the analysis. Agreement to Randomize

Anything other than randomization is unacceptable if randomization was agreed upon. If a clinician is unwilling to randomize, he or she should not participate in the trial at all. Patient Consent

Written informed consent is an ethical and legal requirement that must be obtained before randomization. Formal Entry on Trial

Registering a patient for entry onto a trial before randomization ensures that all randomized patients are followed for details of treatment and evaluation. Helps to guard against investigators not giving the randomized treatment Value in keeping a separate log of eligible patients not entered into the trial, which provides insight into the representativeness of trial patients Random Treatment Assignment

A randomization list of consecutive random treatment assignment should have been prepared in advance. The clinician should not know the order of this list, nor should they be able to predict the next treatment. Prepared by an independent person, statistician, etc. Random Treatment Assignment

Methods for Revealing Treatment: Sealed envelopes opened after formal entry onto trial. List given directly to pharmacist for double-blind evaluations. For a multi-center trial with a central registration office, the assignment can be read off the randomization list to the investigator over the phone. For a trial in a single institution, it is best to have an independent person responsible for patient registration and randomization who is not concerned with treating any of the patients. Documentation

Confirmation of registration Name, date of birth, trial number, assigned treatment

On-study Form Previous therapy, personal details, clinical condition, tests Efficiency and Reliability

(1) through (7) should be be completed before treatment commences. The entire process of registration and randomization should be prompt and efficient so there is no delay in treatment. Avoid unnecessary drop-outs Preparing the Randomization List

Simple randomization Replacement randomization Random permuted blocks Biased coin method Simple Randomization

Two treatments, A and B Table of random numbers Start with top row and work down Assign A for digits 0−4 Assign B for digits 5−9 AB AB BA AB AA BB AB BA BB BB BA BA AB BB BA Simple Randomization

Three treatments, A B C Assign A for 1−3 Assign B for 4−6 Assign C for 7−9 Ignore 0 - B AC CB AC BA BC AC BA BB CB CA C- BC CC CB Simple Randomization

Four treatments, A B C D Assign A for 1,2 Assign B for 3,4 Assign C for 5,6 Assign D for 7,8 Ignore 0,9 - C AD DB BD BA CD BD CA CC - C DA D- BD DD DB Simple Randomization

Each treatment is completely unpredictable. Probability theory guarantees that in the long run the numbers of patients on each treatment will not be radically different. It is often desirable to restrict randomization to ensure similar treatment numbers throughout a trial. Random Permuted Blocks

To ensure exactly equal treatment numbers at equally spaced points in the sequence of patient assignments. For T treatments, block into k patients and produce a different random ordering of k assignments to each treatment. Random Permuted Blocks

For two treatments A and B Blocks of two patients Assign AB for 0−4 Assign BA for 5−9 ABBA ABBA BAAB ABBA ABAB BABA ABBA BAAB …. Random Permuted Blocks

Three treatments, A B and C Blocks of three patients ABC for 1 ACB for 2 BAC for 3 BCA for 4 CAB for 5 CBA for 6 Ignore 7, 8, 9, and 0 - CAB ACB - - BCA BAC ABC CBA - BAC - CAB ABC CAB CBA . . . Random Permuted Blocks

Two treatments A and B Blocks of four patients AABB for 1 ABAB for 2 ABBA for 3 BBAA for 4 BABA for 5 BAAB for 6 Ignore 0 and 7−9 - BABA ABAB - - BBAA ABBA - BBAA AABB BAAB - ABBA - BABA …. Random Permuted Blocks

In general, a trial without stratification should have a reasonably large block size so as to reduce predictability, but if interim analysis is intended, not so large that serious mid-block inequality might occur. The smaller choice of block size the greater the risk of predictable randomization. For stratified assignment, small block sizes are recommended.

Random Permuted Blocks

A trial for 100 patients could have a block size of 20: Use 0−9 for A and 10−19 for B Block 1: BBBAAAABAABABBABBAAB Block 2: BBAABAABBBABAAABBABA For three treatments, blocks of size 15: Use 1−5 for A, 6−10 for B, 11−15 for C Block 1: CCABBCBAACABBAC Block 2: CCABBCACABABBCA Biased Coin Method

At each point in the trial one observes which treatment has the least patients so far: that treatment is then assigned with probability p > ½ to the next patient. If the two treatments have equal numbers then simple randomization is used for the next patient. Biased Coin Method

One has to choose an appropriate value for p using probability theory. It turns out that for p = 3/4, 2/3, 3/5, or 5/9, a difference in treatment numbers of 4, 6, 10, 16, or more has less than a 1 in 20 chance of occurring at any particular point in the trial. Biased Coin Method

p = 3/4 maintains very strict control on treatment numbers but is somewhat predictable once any inequality exists. p = 2/3 may be an appropriate choice for a relatively small trial. p = 3/5 is good for larger trials (100 patients) Biased Coin Method

Using p = 3/5 Assign the treatment with least patients for 0−5 Assign the treatment with most patients for 6−9 When treatment numbers are equal, revert to 0−4 for A and 5− 9 for B (equal randomization)

05278437416838515696 A B A AABBABBB BABAAB BBB Stratified Randomization

Stratified randomization assists us in ensuring that treatment groups are similar as regards to certain relevant patient characteristics. Example: breast cancer, pre- and post-menopausal The larger the trial, the less likely it is that there will be an imbalance. Stratified randomization is an ‘insurance policy.’ Reasons for Not Using Stratification

1. If the trial is very large (200+) and interim analysis of early results is either not feasible or of minor interest, stratification has little point. 2. If the organizational resources for supervising randomization are limited then the increased complexity of stratification may carry a certain risk of errors. Reasons for Not Using Stratification

3. If there is uncertainty about which patient characteristics might influence response, then one has inadequate knowledge on which to carry out stratification. Stratification

Decide which patient factors by which to stratify. Outcomes of previous patients in earlier trials Should be convinced of a factor’s potential for making an impact on the outcome before it is included in stratification If only vague knowledge exists, do not stratify Random Permuted Blocks within Strata

Example: breast cancer nodal status (number of positive axillary nodes, < 4, ≥ 4) age (< 50, ≥ 50) Strata

< 50 < 4 < 50 ≥ 4 ≥ 50 < 4 ≥ 50 ≥ 4 Random Permuted Blocks within Strata

A separate restricted randomization list is prepared for each of the strata using the previous methods. Generally, a small block size is used when stratifying by several factors. Randomization needs to be more tightly restricted to be effective Chances of any investigator predicting the last assignment in any block is considerably reduced Age: < 50 ≥ 50 < 50 ≥ 50 Nodes: < 4 < 4 ≥ 4 ≥ 4 B B A B A B A A B A B A A A B B

A A B A B A A B A B B B B B A A

A B B B B A A B B A B A A B A A

B A B A B B A B A B A A A A B B ...... Random Permuted Blocks within Strata

The more factors included, the greater the pre-trial documentation and amount of information required each time a patient is registered. Consider whether the extra burden is justified. Issues

For so many strata, will you achieve comparable treatment groups? Over-stratifying can be self-defeating since a large number of strata with incomplete randomized blocks can lead to substantial imbalance between treatment groups. The Minimization Method

Over-stratification: no one is especially interested in ambulatory patients aged less than 50 years with visceral metastases and disease-free interval for over two years since this is a very small subgroup of patients with advanced breast cancer. The Minimization Method

One is more interested in ensuring that the different treatment groups are similar as regards the percentage ambulatory, percentage under 50, etc. This method balances the marginal treatment totals for each level of each patient factor. The Minimization Method

Example: Advanced breast cancer trial, 80 patients already enrolled

Treatment Factor Level ABNext Patient PS Ambulatory 30 31 ← Nonambulatory 10 9 Age < 50 y 18 17 ← ≥ 50 y 22 23 DF Int < 2 y 31 32 ≥ 2 y 9 8 ← Dominant Visceral 19 21 ← metastatic Osseous 8 7 lesion Soft tissue 13 12 The Minimization Method

For each treatment, add up the number of patients in the corresponding rows of the table: A = 30 + 18 + 9 + 19 = 76 B = 31 + 17 + 8 + 21 = 77 Give the treatment with the smallest sum of marginal totals to the next patient (A) If equal, use simple randomization Keep a continual up-to-date record of patients Balancing for Institution

In multi-center trials, we must consider whether the institution entering a patient should also be a factor worth including in stratification. Different institutions can show very different response rates. Patient selection Experimental environment Balancing for Institution

If no other factors are considered, simply provide a separate randomization list for each institution. If the minimization method is used, simply include ‘institution’ as another patient factor. Stratified Randomization

Although stratification is theoretically efficient, the practical circumstances must dictate whether its use is desirable. Unequal Randomization

Equal-sized treatment groups provide the most efficient means of treatment comparison. However, interest is usually in gaining greater experience into a new treatment’s profile, or there exists a large amount of enthusiasm for the new therapy. Consider randomizing more often to the new therapy. Unequal Randomization

Consider: 5% level of significance (type I error) is used to determine required sample size for 95% power What happens to the power if we decide to put a greater proportion on the new treatment? Unequal Randomization Unequal Randomization

If the overall size of the trial is kept constant, the power decreases relatively slowly as one begins to move away from equal sized groups. The power decreases from 95% to 92.5% if we place 2/3 patients on new treatment. Power is down to 82% for 4/5 patients on new treatment. Unequal Randomization

Randomization in a 2:1 or 3:2 ratio is a realistic proposition but more extreme imbalance may be undesirable. Considerable loss of statistical power Unequal randomization is becoming more common. Response-adaptive clinical trials Adaptive Trial Designs

Unlike a fixed design, adaptive designs allow observations made during a trial to update certain aspects of the design, such as randomization scheme sample size stopping early dropping/adding arms These updates are not “ad hoc,” but are set in advance of the trial and are based on clinical, statistical, logistical, and ethical considerations. Adaptive Trial Designs

Adaptive clinical trials provide an ethical and efficient mechanism for testing multiple treatments that provide statistically sound results for future treatment. lessen exposure of current study participants to ineffective/unsafe treatments. The Bayesian approach provides a natural framework for the continual updating required by adaptive designs. Introduction to Confounding Confounding

Consider the results from the following (fictitious) study: Investigating the association between smoking and a certain disease in male and female adults 210 smokers and 240 non-smokers were recruited Confounding

Smoking is protecting against disease?

Most of the smokers are male and non-smokers are female: Confounding

Smoking is protecting against disease?

Most of the persons with disease are female: Confounding Confounding

The original outcome is disease

The original exposure of interest is smoking

In this example, gender is related to both the outcome and exposure This relationship is possibly impacting the overall relationship between disease and smoking

How can we look at the relationship between disease and smoking removing any possible ‘interference’ from gender? One approach—look at disease and smoking relationship separately within males and females Example

Is smoking related to disease in males? Example

Is smoking related to disease in females? Smoking, Disease, and Gender

Recap: The overall (crude/unadjusted) relationship (RR) between smoking and disease was nearly one (risk difference nearly 0)

The sex specific results showed similar positive associations between smoking and disease

Note: for the moment we aren’t considering statistical significance, just using estimates to illustrate the point Simpson’s Paradox

The nature of an association can change (and even reverse direction) or disappear when data from several groups are combined to form a single group

As association between an exposure X and a disease Y can be confounded by another lurking (hidden) variable Z Simpson’s Paradox

A confounder Z distorts the true relationship between X and Y

This can happen if Z is related to both X and Y Confounding What is the solution?

If you don’t know what the potential confounders are, there’s not much you can do after the study is over Randomization is your best protection—it eliminates the potential links between the exposure of interest and potential confounders

If you can’t randomize but know what the potential confounders are, there are statistical methods to help control or adjust for confounders—make sure you’re measuring the potential confounders as part of your study How to adjust for confounding?

Stratify Look at the tables separately Male/female, by clinic, etc. Take the weighted average of stratum specific estimates

For example, in the disease/smoking situation: To get a sex adjusted relative risk for the smoking disease relationship we could weight the sex-specific relative risks by numbers of males and females: How to adjust for confounding?

There are better ways than this to take such a weighted average (weighting by standard error, for instance) but this illustrates the concept

Confidence intervals can be computed for these adjusted measures of association

One way to assess whether sex is a confounder: compare crude RR to sex-adjusted RR—if it’s ‘different’ then sex is a confounder How to adjust for confounding?

Regression methods can be used More generalizable than weighted average approach, but the idea is similar Example: Arm Circumference and Height

An observational study to estimate the association between arm circumference and height in Nepali children 94 randomly selected subjects (ages 3 months to 6.5 years) had arm circumference, weight, and height measured This study is observational—it is not possible to randomize subjects to height groups Example: Arm Circumference and Height

The data: Arm circumference range: 11.6 – 16.5 cm Height range: 57 – 109 cm Weight range: 5 – 18 kg

To perform the analysis: Dichotomize height at median (87 cm) Dichotomize weight at median (11.4 kg) Example: Arm Circumference and Height Example: Arm Circumference and Height

Mean arm circumference by height group

Shorter subjects have arm circumference on average 0.7 cm lower than taller subjects (mean difference = -0.7 with 95% CI -1.1 to - 0.3 cm) Example: Arm Circumference and Height

However, it is very likely that arm circumference and height are both related to a child’s weight

Some of the relationships between arm circumference and height could be because of, or masked by, these ‘behind the scenes’ relationships to weight Example: Arm Circumference and Height

What about weight? Example: Arm Circumference and Height

Possible diagram of this scenario Example: Arm Circumference and Height

Recall the original finding: children below the median height had arm circumferences of 0.7 cm lower on average than children equal to or above the median height

To investigate whether this estimate is being fueled or lessened in part by weight differences in the height groups, stratify by weight group and estimate the arm circumference/height association in each weight group Example: Arm Circumference and Height

Mean arm circumference by height group Children below median weight

Shorter subjects below the median weight have average arm circumferences on average 0.02 cm larger than taller subjects below the median weight (95% CI: -0.64 [lower] to 0.68 [higher] cm) Example: Arm Circumference and Height

Mean arm circumference by height group Children equal to or above median weight

Shorter subjects at or above the median weight have average arm circumferences 0.06 cm larger than taller subjects at or above the median weight (95% CI: -0.9 [lower] to 1.0 [higher] cm) Example: Arm Circumference and Height

Recap: Ignoring weight, children below the median height had arm circumferences of 0.69 less (on average) than children at or above the median height and this difference was statistically significant When stratified by weight children below the median height had arm circumferences marginally larger (on average) than children with or above the median height in both weight groups, but these estimates were very close to 0 and not statistically significant Example: Arm Circumference and Height

It appears that the association between arm circumference and height ‘disappears’ after accounting for weight

Associations (shorter versus taller): Crude/unadjusted -0.7 cm (95% CI -1.1 to -0.3) Adjusted? One possibility: take the weighted average of weight specific AC/height associations, weighted by the inverse of SEs of weight specific associations: Example: Arm Circumference and Height

We can get 95% CIs for this adjusted Estimate: -0.3 to 0.38 cm Example: Arm Circumference and Height

One approach—take a weighted average of the average arm circumference differences between subjects below and above the median weight within weight groups (weighted by the size of each group)

However, this is a pain and if there are more potential confounders we could spend our life stratifying and computing such estimates

Better approach—multiple regression methods Example: Arm Circumference and Height

FYI A weighted overall average height adjusted difference in arm circumference between two weight groups is 0.98 cm with 95% CI 0.4 cm to 1.55 cm

When adjusted for weight, the arm circumference/height association disappears

When adjusted for height, the arm circumference/weight association is almost the same as the unadjusted arm circumference/weight association Example: Arm Circumference and Height

This is an interesting case, perhaps better illustrated by this picture: Example: Arm Circumference and Height

This is not always the case—many times when there is confounding between and outcome and two (or more) grouping variables, all of the adjusted outcome/group relationships will differ from the unadjusted associations Example: South African Study

A longitudinal study from South Africa: birth cohort, followed up five years after birth

Participation by medical aid status at birth, all baseline participants

95% CI: 0.53 to 0.92 Example: South African Study

A longitudinal study from South Africa: birth cohort, followed up five years after birth

Participation by medical aid status at birth, black participants

95% CI: 0.76 to 1.36 Example: South African Study

A longitudinal study from South Africa: birth cohort, followed up five years after birth

Participation by medical aid status at birth, white participants

95% CI: 0.25 to 4.5 Example: South African Study

Race: majority of sample is black (91%)

Race and follow-up participation: 26% of black subjects completed follow-up as compared to 9% of white subjects

Race and medical aid: 9% of black subjects had medical aid compared to 83% of white subjects Statistical Interaction/Effect Modification Effect Modification/Interaction

Let’s look at the results from a fictitious data set comparing two treatments for a fatal disease as to the impact of each on reducing deaths: there are 600 ‘younger’ patients and 600 ‘older’ patients in this random sample 1,200

Here is the data on all patients

Surgery and drug groups have identical proportions dying (50% in each group)! Effect Modification/Interaction

Here is the data on only the 600 younger patients Effect Modification/Interaction

Here is the data on only the 600 older patients Effect Modification/Interaction

A recap of the overall and age specific results Effect Modification/Interaction

Age is an effect modifier: Age modifies the association between death and treatment (statistical interaction between age and treatment)!

The association between death and treatment depends on age Surgery is better for younger patients, drug therapy is better for the older patients Effect Modification/Interaction

The association between death and one variable (treatment) depends on the level of another variable (age)

Here, it would not make sense to estimate one composite, overall measure of the association between death and treatment

Best way to look at this data is to just look at the two tables (young and old) separately and estimate two separate death/treatment associations Example: Tree Damage and Elevation

Data on elevation and percentage of dead or badly damaged trees, from 64 Appalachian sites (reported by Committee on Monitoring and Assessment of Trends in Acid Deposition, 1986)

Eight of the 64 sites are in Southern states

Elevation information--whether the site was above or below 1,100 meters

This is an observational study (why?) Example: Tree Damage and Elevation

Data for the first ten sites Example: Tree Damage and Elevation

Let’s try to assess the relationship between the percentage of damaged trees and elevation--here is a boxplot of the percentage of damaged trees by elevation Example: Tree Damage and Elevation

Mean percentage of damaged trees by elevation group Example: Tree Damage and Elevation

What about region though? Boxplot percentage of damaged trees by region Example: Tree Damage and Elevation

So sites in the South have less damage on average (i.e., not only is the percentage of damaged trees related to elevation, but it is also related to region)

If region is related to elevation, then it’s possible that part of the relationship we saw (or didn’t see) between damage and elevation is because of the region-damage-elevation relationship Example: Tree Damage and Elevation

Possible diagram of this scenario Example: Tree Damage and Elevation

Recall the original finding--sites with lower elevation had a marginally lower percentage of damaged trees on average: 0.2% less damaged than sites at higher elevation

To adjust for regional differences in the elevation groups, and the damage/region relationship, let’s stratify by region, and estimate the damage/elevation association in each region Example: Tree Damage and Elevation

Mean percentage of damage tree by elevation in southern sites

Southern sites at higher elevation have 14.5% less damaged trees on average than southern sites at lower elevation (95% CI: 48.0% lower - 19.0% higher) Example: Tree Damage and Elevation

Mean percentage of damage tree by elevation in northern sites

Northern sites at higher elevation have 23% less damaged trees on average than northern sites at lower elevation (95% CI: 2% lower - 44% higher) Example: Tree Damage and Elevation

A recap Ignoring region, sites with lower PCV and sites with lower elevation had marginally lower percentage of damaged trees on average--0.2% less damaged than sites at higher elevation

When stratified by region . . . Northern sites showed positive, statistically-significant association between damage and elevation Southern sites showed negative, non-statistically-significant association between damage and elevation Example: Tree Damage and Elevation

So, it appears as though the association between tree damage and elevation is different, both in magnitude and direction depending on region

We have a small dataset, so lack of statistical significance of negative damage/elevation associations in the south may be because of lower power Example: Tree Damage and Elevation

One approach--take a weighted average of the average damage differences between sites at low and high elevations within each region, weighted by number of observations in each region

However, does not necessarily make sense here--why combine estimates that differ in direction into one overall estimate?

Better approach--to report two mean differences in damage between low and high elevation sites (one estimate for northern sites, one estimate for southern sites)

Better approach--multiple regression methods References and Citations

Lectures modified from notes provided by John McGready and Johns Hopkins Bloomberg School of Public Health accessible from the World Wide Web: http://ocw.jhsph.edu/courses/introbiostats/schedule.cfm