7/25/2015
49th Annual Meeting
OWNING CHANGE: Taking Charge of Your Profession
Applications of Biostatistics
Randy C. Hatton, BPharm, PharmD, FCCP, BCPS
Clinical Professor, University of Florida, College of Pharmacy

Disclosure
  I do not have a vested interest in or affiliation with any corporate organization offering financial support or grant monies for this continuing education activity, or any affiliation with an organization whose philosophy could potentially bias my presentation
Objectives
  Apply commonly used biostatistical concepts to review literature
  Summarize data using appropriate forms of biostatistical review
  Provide practical approaches for simplifying biostatistics in daily practice

Internal Validity
  Does the study measure what it is supposed to?
  Is the outcome due to a random error [chance]?
  Is the outcome due to a systematic error [other cause or bias]?
Issues with Internal Validity
  Random error
  Systematic error
    Patient
      Selection bias
      Attrition bias
      Confounding
    Measurement
      Choice and timing of measures
      Measurement bias
      Reliability and validity of measurement

Random Error
  All 100 pneumonia patients who received penicillin recovered, while 10 of the 100 pneumonia patients who received placebo died
  What is the chance that the difference in mortality did not occur by chance but rather due to the penicillin exposure?
Theory of Science
  Study the universe of patients
    Rarely done… US Census
  Study a sample of the population and estimate the chance that the conclusions are “wrong”

Control for Random Error
  [Figure: samples drawn from a population]

Statistical Inference
  The outcome happened by chance because the inference was made based on a sample
  If the universe of patients were tested, no random error could occur

Random Error & Sample Size
                    Penicillin    No Penicillin
  True mortality    0.1%          10%
  N=10              0/10          1/10
  N=100             0/100         10/100
  N=1000            1/1000        100/1000
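The sample-size table above can be made concrete with a small calculation. This is a minimal sketch, assuming each patient's outcome is independent with the stated true mortality (a simple binomial model, which is an assumption of this example, not something stated on the slide). It shows how often a sample of a given size would contain no deaths at all.

```python
# Sketch: chance that a sample shows zero deaths, assuming independent
# patients and a fixed true mortality (binomial model; illustrative only).

def prob_zero_events(true_rate: float, n: int) -> float:
    """Probability of observing 0 events among n patients."""
    return (1.0 - true_rate) ** n

for n in (10, 100, 1000):
    no_penicillin = prob_zero_events(0.10, n)   # true mortality 10%
    penicillin = prob_zero_events(0.001, n)     # true mortality 0.1%
    print(f"N={n}: P(no deaths | 10%) = {no_penicillin:.4f}, "
          f"P(no deaths | 0.1%) = {penicillin:.4f}")
```

With N=10, roughly a third of untreated samples would show no deaths by chance alone, so both groups could look identical. With N=100 that becomes very unlikely.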
Types of Variables
  Independent Variable (Exposure or Treatment)
    Control variable
    Predictor variable
    Experimental variable
    Grouping variable
    Manipulated variable
    Treatments
    Conditions
    Factors

Types of Variables
  Dependent Variable (Measurement or Outcome)
    Response variable
  Primary Outcome [Dependent] Variable
    Only one
  Secondary Outcomes
    Planned
  Subset Analyses
    Planned or unplanned
  Post-hoc analyses
    Unplanned
Levels of Measurement
  Nominal
    Data that can be classified into mutually exclusive groups without any order or ranking of severity
    Binary means two nominal outcomes
    Examples
      Demographics
        Gender
        Race
      Risk factors
      Outcomes or endpoints
        Lived OR died (binary)
        Response OR no response

Levels of Measurement
  Ordinal
    Data that can be ranked without a consistent level of magnitude between ranks
    Number assigned to ranking
    Examples
      Likert scale
      Pain scale
      Quality of life scores
Levels of Measurement
  Continuous
    Data that are measured on a continuum with a consistent level of magnitude between units
    Sometimes only measured in whole numbers
      Number of seizures, length-of-stay, heart rate
    Interval: arbitrary zero
      Degrees centigrade
    Ratio: absolute zero
      BP, body weight, half-life, drug concentration
Keegan S. Ann Pharmacother 2009;43:19-27.
Levels of Measurement
  If you don’t understand a measurement, you can’t possibly determine statistical or clinical significance
    APACHE II Score?
    SOFA Score?
    Child-Pugh Score?
    aPTT

Levels of Measurement
  Nominal < Ordinal < Continuous
  You can use a statistical test designed for a lower level of measurement of a dependent variable, but [in general] not the other way around
    Somewhat controversial
      Using parametric methods to analyze pain measurements, which are generally ordinal variables [nonparametric]
  Continuous variables are often surrogate outcomes
    Thus, are lower levels of measurement clinically
  The mathematical and clinical hierarchies of levels of measurement do not match
Valid Measurements
  Valid Outcomes
    Appropriate for the question being asked
    Measures what it is supposed to measure
      Compare the measurement to the reference [gold] standard
    MUST be reliable
  What do values or changes in values mean?
    How much change is clinically significant?

Types of Variables
  Extraneous Variables or “Confounders”
    Confounders are differences between the study and control groups due to chance or bias that could affect the dependent variables
    Controlling confounders by design or analysis is important
      Randomization
      Stratified random allocation
      Cross-over design
      Mantel-Haenszel Chi-square
Descriptive Statistics
  Measures of Central Tendencies
    Mean
      Average
      Other means
        Geometric mean
    Median
      Middle value
    Mode
      Most common value

Descriptive Statistics
  Applicability of Measures of Central Tendency
  Characteristic              Mean    Median    Mode
  Used for continuous data    Yes     Yes       Yes
  Used for ordinal data       No?     Yes       Yes
  Used for nominal data       No      No        Yes
  Affected by outliers        Yes     No        No
Gaddis & Gaddis. Ann Emerg Med 1990;19:309-15.
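The "Affected by outliers" row can be checked with a few lines of code. This is a minimal sketch using Python's standard `statistics` module. The length-of-stay values are hypothetical and chosen only so that one patient is an obvious outlier.

```python
import statistics

def central_tendency(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))

# Hypothetical length-of-stay data in days; the 30-day stay is the outlier.
with_outlier = [2, 3, 3, 4, 5, 6, 30]
without_outlier = [2, 3, 3, 4, 5, 6]

print("with outlier:   ", central_tendency(with_outlier))
print("without outlier:", central_tendency(without_outlier))
```

The single long stay roughly doubles the mean, while the median moves by half a day and the mode does not move at all.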
Descriptive Statistics
  Measures of Variability
    Range
    Standard deviation
    Standard Error of the Mean (SEM)
      Measure of precision for the mean estimate
      Not a measure of variability
    Interquartile range (IQR)
      25th-75th percentile
      Box Plots

Descriptive Statistics
  [Figure: box plot. Box = IQR (25th-75th percentile); whisker = 1.5 IQR; “outer” fence = 3 IQR]
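The measures listed above can be computed side by side. This is a minimal sketch with hypothetical heart-rate values. It assumes the default quartile method of `statistics.quantiles`; other software may use a different quartile rule and give slightly different IQR values.

```python
import math
import statistics

def dispersion(values):
    """Range, SD, SEM, IQR, and box-plot limits for a list of numbers."""
    sd = statistics.stdev(values)               # sample standard deviation
    sem = sd / math.sqrt(len(values))           # precision of the mean estimate
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return {
        "range": (min(values), max(values)),
        "sd": sd,
        "sem": sem,
        "iqr": iqr,
        "whisker_limits": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer_fences": (q1 - 3 * iqr, q3 + 3 * iqr),
    }

# Hypothetical heart rates (beats per minute).
heart_rates = [60, 62, 64, 66, 68, 70, 72, 74, 76]
summary = dispersion(heart_rates)
for name, value in summary.items():
    print(name, value)
```

The SEM shrinks as the sample grows even when the spread of the data does not, which is why it describes the precision of the mean rather than the variability of the sample.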
Box Plots vs Frequency Histogram
  [Figure: box plots compared with a frequency histogram]

Descriptive Statistics
  Applicability of Measures of Dispersion
  Characteristic                       Range    IQR    SD     SEM
  Useful for continuous data           Yes      Yes    Yes    Yes
  Useful for ordinal data              Yes      Yes    No?    No
  Describes sample variability         Yes      Yes    Yes    No
  Assists in statistical inferences    No       No     Yes    Yes
  Used to calculate CIs                No       No     No     Yes
Gaddis & Gaddis. Ann Emerg Med 1990;19:309-15.
How to Interpret CIs
  Confidence Intervals
    Used to describe a variable
    May be used instead of [or with] hypothesis testing
      Differences: zero in the CI = no statistically significant difference
      Ratios: one in the CI = no statistically significant difference
    Range of “reasonable values” for the parameter of interest
      90%, 95%, and 99% CIs
    Estimates the magnitude, direction, and certainty of a measurement
Weaver SJ. J Am Pharm Assoc 2004;44:694-9.
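The 90%, 95%, and 99% levels differ only in the multiplier applied to the standard error. This is a minimal sketch, assuming a normal approximation (small samples would normally use a t distribution instead). The mean and SEM below are hypothetical values, not data from the presentation.

```python
from statistics import NormalDist

def z_critical(level: float) -> float:
    """Two-sided critical value of the standard normal for a confidence level."""
    return NormalDist().inv_cdf(0.5 + level / 2)

def mean_ci(mean: float, sem: float, level: float = 0.95):
    """Normal-approximation confidence interval for a mean."""
    z = z_critical(level)
    return mean - z * sem, mean + z * sem

# Hypothetical sample: mean heart rate 68 bpm, SEM 1.8 bpm.
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(68.0, 1.8, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f}")
```

Asking for more confidence widens the interval: the same data give a narrower 90% CI and a wider 99% CI.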
Hypothesis Testing
  Null Hypothesis
    A = B
  Alternate Hypothesis
    A ≠ B
  One-tailed vs two-tailed
    Difference in either direction
    Easier to show superiority with a one-tailed test

The Null Hypothesis (H0)
                        The Truth
  Decision              H0 True         H0 False
  Reject (H0 false)     Type I error    Correct
  Retain (H0 true)      Correct         Type II error
The Null Hypothesis (H0)
                          The Truth
  Decision                Equal           Different
  Groups are Equal        Correct         Type II error
  Groups are Different    Type I error    Correct

Type I and II Errors
  There cannot be Type I and Type II errors at the same time, for the same dependent variable
  Type I error is only a possible problem when there is statistical significance
  Type II error is only a possible problem when there is no statistical significance
  Small studies that find a difference have a sufficient sample size
    These studies are NOT under-powered
Sample Size (n=)
  Alpha
  Beta
    Power = 1 - beta
  Effect size
    Clinically significant difference
    Variability [continuous variables]
  Additional adjustments
    Attrition rate
    Distribution

Sample Size (n=)
  Alpha, by convention, is usually 0.05
    But can be lower or higher
  Beta, by convention, is usually 0.2 or 0.1
    Power = 80% or 90%
  Clinically significant differences are set by convention and/or must be defensible
  The more variable a continuous outcome is, the larger the sample needed
  Attrition rates are based on other studies
Why Estimate Sample Size?
  A sample that is too small is under-powered
    A sample that is too small may suggest an important difference, but it would not be statistically significant
      We could miss a potentially valuable intervention
    A sample that is too small may show no difference, but we don’t know if there is really no difference or we just did not have the power to detect a difference
    Too small a sample wastes resources [time and money] and may put patients at unnecessary risk

Why Estimate Sample Size?
  A sample that is too large can result in over-powering
    Statistically significant differences that are clinically irrelevant
    Wastes resources [time and money] and puts patients at unnecessary risk
Why Estimate Sample Size?
  “…Based on a literature review, the event rate for the first primary outcome variable was predicted to be 20% ……For the experimental therapy to be considered clinically beneficial, it was judged necessary to lower the event rate to 5%.... A sample size calculation indicated that 174 patients were needed to demonstrate a difference in .. [this].. outcome … with a power of 80% and an overall alpha of 0.05….”
  Good experimental studies have an a priori sample size determination
  Observational studies often use the data available, and under- and over-powering may be issues
Trachtman et al. JAMA Vol.290 No.10, 2003

P-Values
  Calculated probability of making a Type I error
    Concluding there is a difference when there is not
  Alpha: the pre-set acceptable Type I error
    < 0.05 most common
    Higher alpha accepted when you do not want to miss an effect [exploratory research = 0.10]
    Lesser values required…
      Multiple testing
      Interim analyses
  Statistical differences do not guarantee clinical importance
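The inputs quoted above (event rates of 20% and 5%, power 80%, alpha 0.05) are enough to run a textbook sample-size calculation. This is a minimal sketch using a common normal-approximation formula for comparing two proportions. It is an assumption of this example, not necessarily the method the quoted trial used, and it ignores continuity correction and attrition.

```python
import math
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate patients per group to detect p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha 0.05
    z_beta = NormalDist().inv_cdf(power)            # e.g. 0.84 for power 80%
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

per_group = n_per_group(0.20, 0.05)
print("per group:", per_group, "total:", 2 * per_group)
```

Under these assumptions the estimate comes out somewhat below the 174 patients quoted. Adjustments such as a continuity correction or an allowance for attrition would raise it, and the article's exact method is not given here. Raising the power to 90% or shrinking the expected difference increases the required sample.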
Differences Between Groups
  Tx (20% mortality) vs Control (40% mortality)
  RR = 0.5
  N (per group)    95% CI          p =
  10               0.11 to 2.14    > 0.2
  100              0.32 to 0.79    < 0.05
  1000             0.43 to 0.58    < 0.001

Random Error?
  Comparison of aspirin (20%) and non-aspirin users (40%)
    10 patients in each group: 2 MIs versus 4
    100 patients in each group: 20 MIs versus 40
    1000 patients in each group: 200 MIs versus 400
  Random error becomes smaller …
    with a larger sample of patients
    with a larger baseline incidence
    with a larger difference between groups
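The narrowing confidence intervals in the table above can be reproduced approximately. This is a minimal sketch, assuming the usual log-transformed (Katz) interval for a relative risk. The slide does not say how its intervals were computed, so small rounding differences are possible.

```python
import math

def relative_risk_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Relative risk with an approximate 95% CI on the log scale."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    se_log_rr = math.sqrt(1 / events_tx - 1 / n_tx
                          + 1 / events_ctl - 1 / n_ctl)
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high

# 20% versus 40% event rates at three sample sizes per group.
for n in (10, 100, 1000):
    rr, low, high = relative_risk_ci(n // 5, n, 2 * n // 5, n)
    print(f"N={n}: RR={rr:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

The point estimate stays at 0.5 in every row. Only the width of the interval changes, and it excludes 1 once there are 100 patients per group.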
Interpretation of P-Values
  P=0.2
    Denotes the probability of a Type I error [conclude a difference when there is not]
    20% chance of making a Type I error
  A statistician would say that the p-value is the calculated probability that a test statistic would be as large [or as small] as observed if the null hypothesis were true

Interpretation of P-Values
  If the difference between the variables being compared is not clinically significant, then the p-value is irrelevant
    If the p-value is above 0.05?
    If less than 0.05, you have statistical significance that is clinically irrelevant
How to Interpret P-values?
  If the difference is clinically significant
    Examine the sample size and number of events
    If p is small (e.g., p=0.001), it is very unlikely that the difference occurred by chance
    If the p is large (e.g., p>0.5), it is likely that the differences occurred by chance
  Do not obsess on a p<0.05, which is arbitrary
  Remember, p-values do not address systematic error
    Studies are usually not refuted after publication because they had a large random error

Statistical Significance
  % Responding to Treatment
  New              Standard         P-Value    Statistical Significance
  480/800 = 60%    416/800 = 52%    0.001      Yes
  15/25 = 60%      13/25 = 52%      0.57       No
  15/25 = 60%      9/25 = 36%       0.09       No
  240/400 = 60%    144/400 = 36%    <0.0001    Yes
Braitman LE. Ann Intern Med 1991;114:515-7.
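The p-values in the table above depend on both the size of the difference and the number of patients. This is a minimal sketch, assuming a two-sided z-test for two proportions with a pooled standard error (equivalent to an uncorrected chi-square test). The cited article may have used a different test, so treat the agreement as approximate.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for the difference between two proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rows = [(480, 800, 416, 800), (15, 25, 13, 25),
        (15, 25, 9, 25), (240, 400, 144, 400)]
for x1, n1, x2, n2 in rows:
    p = two_proportion_p_value(x1, n1, x2, n2)
    print(f"{x1}/{n1} vs {x2}/{n2}: p = {p:.4f}")
```

The same 8-point difference is significant with 800 patients per group and nowhere near significant with 25, which is the point of comparing the first two rows.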
Estimation
  % Responding to Treatment         Difference in % Responding
  New              Standard         Point Estimate    95% CI
  480/800 = 60%    416/800 = 52%    8%                3% to 13%*
  15/25 = 60%      13/25 = 52%      8%                -19% to 35%
  15/25 = 60%      9/25 = 36%       24%               -3% to 51%
  240/400 = 60%    144/400 = 36%    24%               17% to 31%*
  Statistically significant = *
Braitman LE. Ann Intern Med 1991;114:515-7.

Statistical vs Clinical Significance
  [Figure: graph of 95% CIs (● = point estimate) plotted against zero difference (= NSD) and the smallest clinically important difference, assumed to be 15%]
Braitman LE. Ann Intern Med 1991;114:515-7.
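The intervals in the Estimation table can be recomputed from the counts. This is a minimal sketch, assuming a simple normal-approximation (Wald) interval for the difference between two proportions. The 15% threshold is the smallest clinically important difference assumed on the slide, and the function name is made up for this example.

```python
import math

def difference_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Difference in proportions with an approximate 95% CI."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se

SMALLEST_IMPORTANT_DIFFERENCE = 0.15   # assumed threshold from the slide

rows = [(480, 800, 416, 800), (15, 25, 13, 25),
        (15, 25, 9, 25), (240, 400, 144, 400)]
for x1, n1, x2, n2 in rows:
    diff, low, high = difference_ci(x1, n1, x2, n2)
    significant = low > 0 or high < 0              # zero outside the CI
    important = low >= SMALLEST_IMPORTANT_DIFFERENCE
    print(f"{diff:.0%} ({low:.0%} to {high:.0%}) "
          f"significant={significant} clearly_important={important}")
```

Only the last row has a whole interval above 15%. The first row is statistically significant, yet its entire interval sits below the clinically important difference.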
Relative vs Absolute Differences
  Risk in exposed = Absolute Risk
  Risk in unexposed = Absolute Risk
  Absolute Risk (exposed) ÷ Absolute Risk (unexposed) = Relative Risk (RR)
  With RR you lose the numerator and denominator

Relative vs Absolute Differences
  Risk of an adverse event
  Drug        No Drug     RR      Increased risk    Absolute increase
  500/1000    300/1000    1.67    67%               200 more per 1000
  100/1000    60/1000     1.67    67%               40 more per 1000
  5/1000      3/1000      1.67    67%               2 more per 1000
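The three rows of the adverse-event table can be recomputed directly from the counts. This is a minimal sketch; the function name and output layout are made up for illustration.

```python
def risk_summary(events_drug, n_drug, events_no_drug, n_no_drug):
    """Relative risk, relative increase, and absolute increase per 1000."""
    risk_exposed = events_drug / n_drug
    risk_unexposed = events_no_drug / n_no_drug
    rr = risk_exposed / risk_unexposed
    return {
        "rr": rr,
        "relative_increase": rr - 1,
        "extra_per_1000": (risk_exposed - risk_unexposed) * 1000,
    }

for drug, no_drug in ((500, 300), (100, 60), (5, 3)):
    s = risk_summary(drug, 1000, no_drug, 1000)
    print(f"{drug}/1000 vs {no_drug}/1000: RR={s['rr']:.2f}, "
          f"+{s['relative_increase']:.0%}, "
          f"{s['extra_per_1000']:.0f} more per 1000")
```

The relative risk is identical in all three rows, while the number of extra patients harmed differs a hundredfold.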
Exercise: GUSTO Trial
  Calculate the number-needed-to-treat to save 1 life
    Using alteplase (t-PA) instead of streptokinase
  Calculate the CI for this NNT
  What is the number-needed-to-harm (NNH) for t-PA instead of streptokinase for intracranial bleeds?
  NNT = 1 ÷ Absolute Risk Reduction (ARR)
    Number of patients tx with t-PA instead of streptokinase to achieve one positive outcome
    The smaller the number the better
  NNH = 1 ÷ Absolute Risk Increase [Attributable Risk]
    Number tx that results in one negative outcome
The GUSTO Investigators. NEJM 1993;329:673.

GUSTO
  What is an unsuccessful outcome? death
  How many got t-PA (alone)? 10,344
    How many died? 651
    Death as a % 6.3%
  How many got streptokinase? 20,173
    How many died? 1473
    Death as a % 7.3%
The GUSTO Investigators. NEJM 1993;329:673.
GUSTO
  Relative Risk              0.86
  Relative Risk Reduction    13.6%* (CI 5.9% to 21.3%)
  Absolute Risk Reduction    1% (CI 0.4% to 1.7%)†
  NNT                        100 (250 to 59)
  *Based on CI given in article, not calculated 14%
  †Estimated based on CI of RRR
The GUSTO Investigators. NEJM 1993;329:673.

GUSTO
  Hemorrhagic Stroke
    104/20,023 = 0.52% for streptokinase
    0.72% for t-PA
    Attributable risk = 0.72% - 0.52% = 0.2%
    NNH = 1 ÷ AR = 1 ÷ 0.002 = 500
The GUSTO Investigators. NEJM 1993;329:673.
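The NNT and NNH figures above follow from the rounded percentages on the slides. This is a minimal sketch that repeats the arithmetic. The function name is made up for illustration, and the inputs are the slide's rounded values rather than the raw trial counts.

```python
def number_needed(risk_a: float, risk_b: float) -> float:
    """1 ÷ absolute risk difference (NNT for benefit, NNH for harm)."""
    return 1 / abs(risk_a - risk_b)

# Mortality: 7.3% with streptokinase versus 6.3% with t-PA.
nnt = number_needed(0.073, 0.063)

# CI for the NNT: invert the CI limits of the absolute risk reduction.
nnt_ci = (1 / 0.004, 1 / 0.017)

# Hemorrhagic stroke: 0.72% with t-PA versus 0.52% with streptokinase.
nnh = number_needed(0.0072, 0.0052)

print(f"NNT = {nnt:.0f} ({nnt_ci[0]:.0f} to {nnt_ci[1]:.0f})")
print(f"NNH = {nnh:.0f}")
```

Because the NNT is a reciprocal, the lower limit of the ARR gives the larger NNT, which is why the interval reads 250 to 59.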
GUSTO
  Balancing Benefits and Risks
    For every 1000 patients treated with t-PA instead of streptokinase
      10 more patients will survive
      2 more patients will have a hemorrhagic stroke
The GUSTO Investigators. NEJM 1993;329:673.

Common Statistical Tests
  It is beyond the scope of a 1-hour statistics overview to review all of the most common statistical tests
  You can email me for a handout that will help you practice determining whether the appropriate tests are being used in the studies you read
  Using the wrong test used to be common… it is no longer common, but it still occurs occasionally
The Appropriate Test
  Study design
    Parallel [independent] or cross-over [related]
    Number of groups being compared
  Levels of measurement
  “Confounders”
  Assumptions of the test
    Use a test with different assumptions

Chi-square
  Determine the level of measurement for the dependent variable
    Nominal
  Determine the number of groups being compared
    Two or more independent variables
    Two or more outcomes
    Contingency table method
  Determine whether the groups are independent or related
    Independent = Chi-square; Related = McNemar’s
Chi-square
  Determine whether the data meet the assumptions
    Independent observations
    Expected cell frequencies (ECF) = RT * CT/GT
      No ECF < 1
      No more than 20% of cells have an ECF < 5
        1 cell < 5 in a 2x2 table
  If not met…
    Descriptive stats
    Collapse categories
    Fisher’s Exact Test

Chi-square
  Assumptions of the test
    For a 2x2 table, consider using the Yates’ correction
      Most common
      Considered conservative
        More difficult to find a statistically significant difference
        Type II error
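The expected-cell-frequency rule above (ECF = row total × column total ÷ grand total) is easy to automate. This is a minimal sketch; the function names are made up for illustration, and the counts in the usage lines are hypothetical.

```python
def expected_counts(table):
    """Expected cell frequencies: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def chi_square_assumptions_met(table) -> bool:
    """True if no ECF < 1 and no more than 20% of cells have an ECF < 5."""
    cells = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(1 for e in cells if e < 5) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20

# Hypothetical 2x2 tables: rows are groups, columns are outcome yes/no.
roomy_table = [[20, 30], [25, 25]]
sparse_table = [[1, 49], [9, 44]]
print(expected_counts(sparse_table))
print(chi_square_assumptions_met(roomy_table),
      chi_square_assumptions_met(sparse_table))
```

In a 2x2 table a single expected count below 5 already means 25% of the cells fall short, so the check fails and an alternative such as Fisher's exact test is worth considering.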
Chi-square
                               Bleeding    No Bleeding    Row Totals
  Individualized Enoxaparin    1 (4.9)     49 (45.1)      50
  Conventional Enoxaparin      9 (5.1)     44 (47.9)      53
  Column Totals                10          93             103
  (expected cell frequencies in parentheses)
  Chi-Square p = 0.03
  Fisher’s Exact p = 0.02
Barras MA. Clin Pharmacol Ther 2008;83:882-8.
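The chi-square result for the enoxaparin table can be approximated by hand. This is a minimal sketch of the 2x2 chi-square statistic with and without the Yates continuity correction, using the shortcut formula for a 2x2 table and one degree of freedom. Whether the slide's p-value was computed with the correction is an assumption of this example.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False):
    """Chi-square statistic and p-value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)       # continuity correction
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square, 1 df
    return stat, p_value

# Bleeding / no bleeding: individualized (1, 49) vs conventional (9, 44).
plain = chi_square_2x2(1, 49, 9, 44)
corrected = chi_square_2x2(1, 49, 9, 44, yates=True)
print(f"uncorrected: chi2={plain[0]:.2f}, p={plain[1]:.3f}")
print(f"Yates:       chi2={corrected[0]:.2f}, p={corrected[1]:.3f}")
```

Under these assumptions the corrected p-value rounds to the 0.03 shown on the slide, and it is larger than the uncorrected one, which illustrates why the correction is described as conservative.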
Chi-square
                   Relapsing Spasm    No Relapsing Spasm    Row Totals
  Vigabatrin       0 (2)              16 (14)               16
  ACTH             4 (2)              12 (14)               16
  Column Totals    4                  28                    32
  (expected cell frequencies in parentheses)
  Chi Square p = 0.03
  Fisher’s Exact p = 0.10
Cossette P. Neurology 1999;52:1691-4.
Cossette P. Neurology 2000;54:539. [Erratum]
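The Fisher's exact result for the vigabatrin table can be reproduced from the hypergeometric distribution. This is a minimal sketch of a two-sided test that sums the probabilities of all tables no more likely than the observed one. That is one common two-sided definition, and other software may define the two-sided p-value differently.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k: int) -> float:
        # Hypergeometric probability that the top-left cell equals k.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_observed = prob(a)
    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    total = sum(prob(k) for k in range(k_min, k_max + 1)
                if prob(k) <= p_observed * (1 + 1e-9))  # tolerance for ties
    return min(1.0, total)

# Relapsing spasm / no relapsing spasm: vigabatrin (0, 16) vs ACTH (4, 12).
p_vigabatrin = fisher_exact_two_sided(0, 16, 4, 12)
print(f"Fisher's exact p = {p_vigabatrin:.2f}")
```

With expected counts of 2 in half of the cells, the chi-square assumptions are not met, and the exact test does not support the significant result that chi-square suggests.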
Fisher’s Exact Test
  Nominal data
  Assumptions
    Independent data
    Two groups and two outcomes [only]
  More than 2x2?
    Split into several 2x2 tables
      Increases Type I error
    Freeman-Halton extension
  Useful when a nominal outcome is rare
    Or for small samples

Multivariate Statistics
  [Figure: Exposure (DRUG) leading to Outcome, with other explanatory variables that could affect outcome]
Multivariate Statistics
  Detailed discussion beyond the scope of this presentation
  Control for confounders
    Mantel-Haenszel Chi Square
    Multi-way (eg, 2-way) ANOVA
    Analysis of Covariance (ANCOVA)
    Multiple linear regression
    Multiple logistic regression
    Cox proportional hazard models

Logistic Regression
  Odds Ratio (OR): controlling for other variables in the model, the odds of having the outcome of interest
    Over-estimates the relative risk
      Unless the outcome is rare
  Crude OR: only look at exposure and outcome with no adjustment
  Adjusted OR: adjusts for other extraneous variables
    Confounders
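The statement that the odds ratio over-estimates the relative risk unless the outcome is rare can be checked with counts. This is a minimal sketch of crude (unadjusted) measures only, with no logistic regression involved. The counts reuse the common and rare adverse-event examples from the Relative vs Absolute Differences slide.

```python
def crude_rr_and_or(events_exp, n_exp, events_unexp, n_unexp):
    """Crude relative risk and crude odds ratio from a 2x2 table."""
    rr = (events_exp / n_exp) / (events_unexp / n_unexp)
    odds_exposed = events_exp / (n_exp - events_exp)
    odds_unexposed = events_unexp / (n_unexp - events_unexp)
    return rr, odds_exposed / odds_unexposed

common = crude_rr_and_or(500, 1000, 300, 1000)   # outcome is common
rare = crude_rr_and_or(5, 1000, 3, 1000)         # outcome is rare
print(f"common outcome: RR={common[0]:.2f}, OR={common[1]:.2f}")
print(f"rare outcome:   RR={rare[0]:.2f}, OR={rare[1]:.2f}")
```

When half of the exposed patients have the event, the OR is far above the RR. When the event affects a few per thousand, the two are nearly identical.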
Survival Analysis
  Time to event analysis
    Death
    Clinical progression
  Kaplan-Meier Plot
  Log-Rank Test
  Cox Proportional Hazard Model
    Hazard Ratio (HR)

Survival Analysis
  [Figure: survival curve. Y-axis: Cumulative Survival (0 to 1.0); x-axis: Time]
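The step-shaped curve in a Kaplan-Meier plot comes from a simple running product. This is a minimal sketch of the product-limit estimate. The follow-up times and censoring flags are hypothetical, and the sketch omits confidence bands, the log-rank test, and the Cox model.

```python
def kaplan_meier(times, events):
    """Return [(time, cumulative survival)] at each time with an event.

    times:  follow-up time for each patient
    events: 1 if the event (e.g., death) occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends at the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving          # events and censored patients both leave
    return curve

# Hypothetical follow-up in months for 10 patients.
times = [2, 3, 3, 5, 8, 8, 12, 15, 20, 20]
events = [1, 1, 0, 1, 1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: cumulative survival = {s:.3f}")
```

Censored patients do not lower the curve. They only shrink the number at risk, so later events produce larger drops.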
Kaplan-Meier Plot
  640/850 = 75.3%
  581/840 = 69.2%
  ARR = 6.1%
Bernard, et al. N Engl J Med 2001;344(10):699

Summary
  Understanding commonly used biostatistical concepts enables you to interpret the literature
  Descriptive statistics represent the sample, and hopefully, the population of interest
  Confidence intervals can be used to make inferences about comparative treatments
  “Incorrect” statistical tests in published papers are rare
  Interpreting the results of these tests is more important and identifies literature deficiencies
Questions? [email protected]