
7/25/2015

49th Annual Meeting
OWNING CHANGE: Taking Charge of Your Profession

Applications of Biostatistics

Randy C. Hatton, BPharm, PharmD, FCCP, BCPS
Clinical Professor
University of Florida, College of Pharmacy

Disclosure

- I do not have a vested interest in or affiliation with any corporate organization offering financial support or grant monies for this continuing education activity, or any affiliation with an organization whose philosophy could potentially bias my presentation

Objectives

- Apply commonly used biostatistical concepts to review literature
- Summarize data using appropriate forms of biostatistical review
- Provide practical approaches for simplifying biostatistics in daily practice

Internal Validity

- Does the study measure what it is supposed to?
- Is the outcome due to a random error [chance]?
- Is the outcome due to a systematic error [other cause or bias]?

Issues with Internal Validity

- Random error
- Systematic error
  - Patient
    - Selection bias
    - Attrition bias
    - Confounding
  - Measurement
    - Choice and timing of measures
    - Measurement bias
    - Reliability and validity of measurement

Random Error

- All 100 pneumonia patients who received penicillin recovered, while 10 of the 100 pneumonia patients who received placebo died
- What is the chance that the difference in mortality did not occur by chance but was instead due to the penicillin exposure?


Theory of Science

[Figure: a Sample drawn from the Population]

Control for Random Error

- Study the universe of patients
  - Rarely done…
  - US Census
- Study a sample of the population and estimate the chance that the conclusions are “wrong”

Statistical Inference

- The outcome happened by chance because the inference was made based on a sample
- If the universe of patients were tested no random error could occur

Random Error & Sample Size

                  Penicillin    No Penicillin
True mortality    0.1%          10%
N=10              0/10          1/10
N=100             0/100         10/100
N=1000            1/1000        100/1000

Types of Variables

- Independent Variable (Exposure or Treatment)
  - Control variable
  - Predictor variable
  - Experimental variable
  - Grouping variable
  - Manipulated variable
  - Treatments
  - Conditions
  - Factors

Types of Variables

- Dependent Variable (Measurement or Outcome)
  - Response variable
  - Primary Outcome [Dependent] Variable
    - Only one
  - Secondary Outcomes
    - Planned
  - Subset Analyses
    - Planned or unplanned
  - Post-hoc analyses
    - Unplanned


Levels of Measurement

- Nominal
  - Data that can be classified into mutually exclusive groups without any order or ranking of severity
  - Binary means two nominal outcomes
  - Examples
    - Demographics
      - Gender
      - Race
    - Risk factors
    - Outcomes or endpoints
      - Lived OR died (binary)
      - Response OR no response

Levels of Measurement

- Ordinal
  - Data that can be ranked without a consistent level of magnitude between ranks
  - Number assigned to ranking
  - Examples
    - Likert scale
    - Pain scale
    - Quality of life scores

Levels of Measurement

- Continuous
  - Data that are measured on a continuum with a consistent level of magnitude between units
  - Sometimes only measured in whole numbers
    - Number of seizures, length-of-stay, heart rate
  - Interval: arbitrary zero
    - Degrees centigrade
  - Ratio: absolute zero
    - BP, body weight, half-life, drug concentration

Keegan S. Ann Pharmacother 2009;43:19-27.

Levels of Measurement

- If you don’t understand a measurement, you can’t possibly determine statistical or clinical significance
  - APACHE II Score?
  - SOFA Score?
  - Child-Pugh Score?
  - aPTT
- Continuous variables are often surrogate outcomes
  - Thus, they are lower levels of measurement clinically
- The mathematical and clinical hierarchies of levels of measurement do not match

Levels of Measurement

- Nominal < Ordinal < Continuous
- You can use a statistical test designed for a lower level of measurement of a dependent variable, but [in general] not the other way around
  - Somewhat controversial
  - Example: using parametric methods to analyze pain measurements, which are generally ordinal variables [nonparametric]


Valid Measurements

- Valid Outcomes
  - Appropriate for the question being asked
  - Measures what it is supposed to measure
  - Compare the measurement to the reference [gold] standard
  - MUST be reliable
- What do values or changes in values mean?
  - How much change is clinically significant?

Types of Variables

- Extraneous Variables or “Confounders”
  - Confounders are differences between the study and control groups due to chance or bias that could affect the dependent variables
  - Controlling confounders by design or analysis is important
    - Randomization
    - Stratified random allocation
    - Cross-over design
    - Mantel-Haenszel Chi-square

Descriptive Statistics

Measures of Central Tendencies

- Mean
  - Average
  - Other means
    - Geometric mean
- Median
  - Middle value
- Mode
  - Most common value

Descriptive Statistics

Applicability of Measures of Central Tendency

Characteristic              Mean    Median    Mode
Used for continuous data    Yes     Yes       Yes
Used for ordinal data       No?     Yes       Yes
Used for nominal data       No      No        Yes
Affected by outliers        Yes     No        No

Gaddis & Gaddis. Ann Emerg Med 1990;19:309-15.
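A minimal sketch of the "affected by outliers" row in the table above, using Python's standard `statistics` module. The length-of-stay values are hypothetical and chosen only so that one value is an obvious outlier.

```python
from statistics import geometric_mean, mean, median, mode

# Hypothetical lengths of stay in days; the last value is an outlier.
length_of_stay = [2, 3, 3, 4, 5, 6, 40]

summary = {
    "mean": mean(length_of_stay),                      # pulled upward by the outlier
    "median": median(length_of_stay),                  # middle value, resistant to the outlier
    "mode": mode(length_of_stay),                      # most common value
    "geometric_mean": geometric_mean(length_of_stay),  # less sensitive than the mean
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With the outlier present, the mean sits well above the median, which is why the median is often preferred for skewed data.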

Descriptive Statistics

Measures of Variability

- Range
- Standard deviation
- Standard Error of the Mean (SEM)
  - Measure of precision for the mean estimate
  - Not a measure of variability
- Interquartile range (IQR)
  - 25th-75th percentile
- Box Plots

Descriptive Statistics

[Figure: box plot. Box = IQR (25th-75th percentile); whisker = 1.5 IQR; “outer” fence = 3 IQR]


Box Plots vs Frequency Histogram

[Figure: box plot compared with a frequency histogram]

Descriptive Statistics

Applicability of Measures of Dispersion

Characteristic                       Range    IQR    SD     SEM
Useful for continuous data           Yes      Yes    Yes    Yes
Useful for ordinal data              Yes      Yes    No?    No
Describes sample variability         Yes      Yes    Yes    No
Assists in statistical inferences    No       No     Yes    Yes
Used to calculate CIs                No       No     No     Yes

Gaddis & Gaddis. Ann Emerg Med 1990;19:309-15.
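The distinction between SD (sample variability) and SEM (precision of the mean, used to calculate CIs) can be sketched as follows. The heart-rate values are hypothetical, and the 95% CI uses the normal multiplier (about 1.96) as a simplifying assumption; for a sample this small a t-multiplier would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, quantiles, stdev

# Hypothetical heart rates (beats/min) from a sample of 9 patients.
heart_rates = [62, 68, 70, 72, 74, 76, 80, 84, 90]

n = len(heart_rates)
sample_mean = mean(heart_rates)
value_range = max(heart_rates) - min(heart_rates)

# IQR: distance between the 25th and 75th percentiles.
q1, _, q3 = quantiles(heart_rates, n=4)
iqr = q3 - q1

sd = stdev(heart_rates)   # describes variability of the sample
sem = sd / sqrt(n)        # describes precision of the mean, not variability

# 95% CI for the mean, built from the SEM (normal approximation assumed).
z = NormalDist().inv_cdf(0.975)
ci_low, ci_high = sample_mean - z * sem, sample_mean + z * sem

print(f"range={value_range}, IQR={iqr}, SD={sd:.1f}, SEM={sem:.1f}")
print(f"mean={sample_mean:.1f}, 95% CI {ci_low:.1f} to {ci_high:.1f}")
```

Because SEM is SD divided by the square root of n, it is always smaller than SD for n > 1, which is one reason reporting SEM in place of SD can understate variability.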

How to Interpret CIs

Confidence Intervals

- Used to describe a variable
- May be used instead of [or with] hypothesis testing
- Range of “reasonable values” for the parameter of interest
- Estimates the magnitude, direction, and certainty of a measurement

How to Interpret CIs

- Differences: zero in the CI means no statistically significant difference
- Ratios: one in the CI means no statistically significant difference
- 90%, 95%, and 99% CIs

Weaver SJ. J Am Pharm Assoc 2004;44:694-9.

Hypothesis Testing

- Null Hypothesis
  - A = B
- Alternate Hypothesis
  - A ≠ B
- One-tailed vs two-tailed
  - Two-tailed: difference in either direction
  - Easier to show superiority with a one-tailed test

The Null Hypothesis (H0)

                      The Truth
Decision              H0 True         H0 False
Reject (H0 false)     Type I error    Correct
Retain (H0 true)      Correct         Type II error


The Null Hypothesis (H0)

                        The Truth
Decision                Equal           Different
Groups are Equal        Correct         Type II error
Groups are Different    Type I error    Correct

Type I and II Errors

- There cannot be Type I and Type II errors at the same time, for the same dependent variable
- Type I error is only a possible problem when there is statistical significance
- Type II error is only a possible problem when there is no statistical significance
- Small studies that find a difference have a sufficient sample size
  - These studies are NOT under-powered

Sample Size (n=)

- Alpha
- Beta
  - Power = 1 - beta
- Effect size
  - Clinically significant difference
  - Variability [continuous variables]
- Additional adjustments
  - Attrition rate
  - Distribution

Sample Size (n=)

- Alpha, by convention, is usually 0.05
  - But can be lower or higher
- Beta, by convention, is usually 0.2 or 0.1
  - Power = 80% or 90%
- Clinically significant differences are set by convention and/or must be defensible
- The more variable a continuous variable is, the larger the sample needed
- Attrition rates are based on other studies
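The inputs listed above (alpha, power, effect size, attrition) can be combined in a short calculation. This is a sketch only: it assumes a two-sided comparison of two proportions using the standard normal-approximation formula without continuity correction, and the 20% vs 5% event rates and 10% attrition are illustrative inputs. Other formulas (for example, with a continuity correction) give somewhat larger numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for two proportions (two-sided, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


def inflate_for_attrition(n, attrition_rate):
    """Enroll enough patients that n remain after the expected dropout."""
    return ceil(n / (1 - attrition_rate))


n = n_per_group(0.20, 0.05)                  # illustrative event rates
n_enrolled = inflate_for_attrition(n, 0.10)  # illustrative 10% attrition
print(f"per group: {n}, after attrition adjustment: {n_enrolled}")
```

Raising the power to 90% or shrinking the difference to be detected both increase the required sample, which matches the bullets above.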

Why Estimate Sample Size?

- A sample that is too small is under-powered
  - A sample that is too small may suggest an important difference, but it would not be statistically significant
    - We could miss a potentially valuable intervention
  - A sample that is too small may show no difference, but we don’t know if there is really no difference or if the study just did not have the power to detect a difference
  - Too small a sample wastes resources [time and money] and may put patients at unnecessary risk

Why Estimate Sample Size?

- A sample that is too large can result in over-powering
  - Statistically significant differences that are clinically irrelevant
  - Wastes resources [time and money] and puts patients at unnecessary risk


Why Estimate Sample Size?

- …”Based on a literature review, the event rate for the first primary outcome variable was predicted to be 20% ……For the experimental therapy to be considered clinically beneficial, it was judged necessary to lower the event rate to 5%.... A sample size calculation indicated that 174 patients were needed to demonstrate a difference in .. [this].. outcome … with a power of 80% and an overall alpha of 0.05….”
- Good experimental studies have an a priori sample size determination
- Observational studies often use the data available, and under- and over-powering may be issues

Trachtman et al. JAMA Vol.290 No.10, 2003

P-Values

- Calculated probability of making a Type I error
  - Concluding there is a difference when there is not
- Alpha: the pre-set acceptable Type I error
  - < 0.05 most common
  - Higher alpha accepted when you do not want to miss an effect [exploratory research = 0.10]
  - Lesser values required…
    - Multiple testing
    - Interim analyses
- Statistical differences do not guarantee clinical importance

Differences Between Groups

Tx (20% mortality) vs Control (40% mortality)
RR = 0.5

N (per group)    95% CI          p =
10               0.11 – 2.14     > 0.2
100              0.32 – 0.79     < 0.05
1000             0.43 – 0.58     < 0.001

Random Error?

- Comparison of aspirin (20%) and non-aspirin users (40%)
  - 10 patients in each group: 2 MIs versus 4
  - 100 patients in each group: 20 MIs versus 40
  - 1000 patients in each group: 200 MIs versus 400

Random error becomes smaller …
  with a larger sample of patients
  with a larger baseline incidence
  with a larger difference between groups
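The narrowing confidence intervals in the Tx vs Control table can be reproduced approximately with the usual log-based interval for a relative risk. This is a sketch that assumes that interval method; the slide does not state how its intervals were calculated, and the function name is illustrative.

```python
from math import exp, log, sqrt


def relative_risk_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Relative risk with an approximate 95% CI computed on the log scale."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    se_log_rr = sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# 20% vs 40% event rates at three sample sizes per group.
for n in (10, 100, 1000):
    rr, lower, upper = relative_risk_ci(2 * n // 10, n, 4 * n // 10, n)
    print(f"N={n:4d}: RR={rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The point estimate stays at 0.5 throughout; only the width of the interval changes with the sample size, and only the two larger samples exclude 1.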

Interpretation of P-Values

- P=0.2
  - Denotes the probability of a Type I error [conclude a difference when there is not]
  - 20% chance of making a Type I error
- A statistician would say that the p-value is the calculated probability that a test statistic would be as large [or as small] as observed if the null hypothesis were true

Interpretation of P-Values

- If the difference between the variables being compared is not clinically significant, then the p-value is irrelevant
  - If the p-value is above 0.05?
  - If less than 0.05, you have statistical significance that is clinically irrelevant


How to Interpret P-values?

- If the difference is clinically significant
  - If p is small (e.g., p=0.001), it is very unlikely that the difference occurred by chance
  - If the p is large (e.g., p>0.5), it is likely that the differences occurred by chance
- Do not obsess on a p<0.05, which is arbitrary
- Remember, p-values do not address systematic error
  - Studies are usually not refuted after publication because they had a large random error

Statistical Significance

- Examine the sample size and number of events

% Responding to Treatment
New              Standard         P-Value    Statistical Significance
480/800 = 60%    416/800 = 52%    0.001      Yes
15/25 = 60%      13/25 = 52%      0.57       No
15/25 = 60%      9/25 = 36%       0.09       No
240/400 = 60%    144/400 = 36%    <0.0001    Yes

Braitman LE. Ann Intern Med 1991;114:515-7.
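The p-values in the Statistical Significance table can be approximated with a two-proportion z-test, which is equivalent to an uncorrected chi-square test on the 2x2 table. This is a sketch under that assumption; the source does not state which test produced the published values.

```python
from math import erfc, sqrt


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for a difference in proportions (pooled z-test)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    return erfc(abs(z) / sqrt(2))


rows = [(480, 800, 416, 800), (15, 25, 13, 25), (15, 25, 9, 25), (240, 400, 144, 400)]
for x1, n1, x2, n2 in rows:
    p = two_proportion_p_value(x1, n1, x2, n2)
    print(f"{x1}/{n1} vs {x2}/{n2}: p = {p:.4f}")
```

The same 8-point difference is significant with 800 patients per group and not with 25, which is the point of examining the sample size and number of events.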

Estimation

% Responding to Treatment         Difference in % Responding
New              Standard         Point Estimate    95% CI
480/800 = 60%    416/800 = 52%    8%                3% to 13%*
15/25 = 60%      13/25 = 52%      8%                -19% to 35%
15/25 = 60%      9/25 = 36%       24%               -3% to 51%
240/400 = 60%    144/400 = 36%    24%               17% to 31%*

* = Statistically significant

Braitman LE. Ann Intern Med 1991;114:515-7.

Statistical vs Clinical Significance

[Figure: graph of the 95% CIs (● = point estimate), with reference lines at zero difference (= NSD) and at the smallest clinically important difference, assumed to be 15%]

Braitman LE. Ann Intern Med 1991;114:515-7.
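The 95% CIs in the Estimation table can be reproduced with the simple (Wald) interval for a difference in proportions. This sketch assumes that method and takes the 15% threshold for clinical importance from the slide; the function and variable names are illustrative.

```python
from math import sqrt

SMALLEST_IMPORTANT_DIFFERENCE = 0.15  # assumed threshold, as on the slide


def difference_ci(x1, n1, x2, n2, z=1.96):
    """Difference in proportions with an approximate 95% CI (Wald interval)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


rows = [(480, 800, 416, 800), (15, 25, 13, 25), (15, 25, 9, 25), (240, 400, 144, 400)]
for x1, n1, x2, n2 in rows:
    diff, lower, upper = difference_ci(x1, n1, x2, n2)
    significant = lower > 0 or upper < 0                 # CI excludes zero
    important = lower >= SMALLEST_IMPORTANT_DIFFERENCE   # whole CI above threshold
    print(f"{diff:.0%} ({lower:.0%} to {upper:.0%}) "
          f"statistically significant={significant}, clearly important={important}")
```

Only the last row has an interval that both excludes zero and lies entirely above the assumed clinically important difference.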

Relative vs Absolute Differences

Risk in exposed = Absolute Risk
Risk in unexposed = Absolute Risk

Relative Risk (RR) = Absolute Risk (exposed) ÷ Absolute Risk (unexposed)

Relative vs Absolute Differences

- Risk of an adverse event

Drug        No Drug     RR      Increased risk    Absolute increase
500/1000    300/1000    1.67    67%               200 more per 1000
100/1000    60/1000     1.67    67%               40 more per 1000
5/1000      3/1000      1.67    67%               2 more per 1000

With RR you lose the numerator and denominator
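A short sketch of why the relative risk hides the baseline: the three rows of the table share one RR but have very different absolute increases. The function name is illustrative.

```python
def risk_summary(events_drug, n_drug, events_no_drug, n_no_drug):
    """Return (relative risk, extra events per 1000 patients exposed)."""
    risk_drug = events_drug / n_drug
    risk_no_drug = events_no_drug / n_no_drug
    relative_risk = risk_drug / risk_no_drug
    extra_per_1000 = (risk_drug - risk_no_drug) * 1000
    return relative_risk, extra_per_1000


for events_drug, events_no_drug in [(500, 300), (100, 60), (5, 3)]:
    rr, extra = risk_summary(events_drug, 1000, events_no_drug, 1000)
    print(f"{events_drug}/1000 vs {events_no_drug}/1000: "
          f"RR={rr:.2f}, {extra:.0f} more per 1000")
```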


Exercise: GUSTO Trial

- Calculate the number-needed-to-treat (NNT) to save 1 life
  - Using alteplase (t-PA) instead of streptokinase
  - Calculate the CI for this NNT
- What is the number-needed-to-harm (NNH) for t-PA instead of streptokinase for intracranial bleeds?
- NNT = 1 ÷ Absolute Risk Reduction (ARR)
  - Number of patients treated with t-PA instead of streptokinase to achieve one positive outcome
  - The smaller the number the better
- NNH = 1 ÷ Absolute Risk Increase [Attributable Risk]
  - Number treated that results in one negative outcome

GUSTO

What is an unsuccessful outcome?    death
How many got t-PA (alone)?          10,344
How many died?                      651
Death as a %                        6.3%
How many got streptokinase?         20,173
How many died?                      1473
Death as a %                        7.3%

The Gusto Investigators. NEJM 1993;329:673.

GUSTO

Relative Risk                    0.86
Relative Risk Reduction (CI)     13.6%* (5.9% to 21.3%)
Absolute Risk Reduction (CI)     1% (0.4% to 1.7%)†
NNT                              100 (250 to 59)

*Based on CI given in article, not calculated 14%
†Estimated based on CI of RRR

GUSTO

Hemorrhagic Stroke

- 104/20,023 = 0.52% for streptokinase
- 0.72% for t-PA
- Attributable risk = 0.72% - 0.52% = 0.2%
- NNH = 1 ÷ AR = 1 ÷ 0.002 = 500

The Gusto Investigators. NEJM 1993;329:673.
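The GUSTO exercise can be worked through directly from the counts and percentages on the slides. This sketch uses only those numbers; the NNT limits are obtained by inverting the slide's ARR limits (0.4% and 1.7%), and the variable names are illustrative. The unrounded NNT comes out just under 100; the slide's value of 100 follows from rounding the ARR to 1%.

```python
# Deaths, from the slide: t-PA (alone) vs streptokinase.
death_tpa = 651 / 10_344
death_sk = 1_473 / 20_173

relative_risk = death_tpa / death_sk
arr = death_sk - death_tpa   # absolute risk reduction
nnt = 1 / arr                # number needed to treat to save 1 life

# NNT limits from the ARR confidence limits given on the slide.
nnt_best, nnt_worst = 1 / 0.017, 1 / 0.004

# Hemorrhagic stroke, percentages from the slide.
attributable_risk = 0.0072 - 0.0052
nnh = 1 / attributable_risk  # number needed to harm

print(f"RR = {relative_risk:.2f}, ARR = {arr:.1%}, NNT = {nnt:.0f} "
      f"({nnt_worst:.0f} to {nnt_best:.0f})")
print(f"Attributable risk = {attributable_risk:.1%}, NNH = {nnh:.0f}")
```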

GUSTO

Balancing Benefits and Risks

- For every 1000 patients treated with t-PA instead of streptokinase
  - 10 more patients will survive
  - 2 more patients will have a hemorrhagic stroke

The Gusto Investigators. NEJM 1993;329:673.

Common Statistical Tests

- It is beyond the scope of a 1-hour statistics overview to review all of the most common statistical tests
- You can email me for a handout that will help you practice determining whether the appropriate tests are being used in the studies you read
- Using the wrong test used to be common… it is no longer common, but it still occurs occasionally


The Appropriate Test

- Study design
  - Parallel [independent] or cross-over [related]
  - Number of groups being compared
- Levels of measurement
- “Confounders”
  - Two or more independent variables
- Assumptions of the test
  - Use a test with different assumptions

Chi-square

- Determine the level of measurement for the dependent variable
  - Nominal
- Determine the number of groups being compared
  - Two or more outcomes
  - Contingency table method
- Determine whether the groups are independent or related
  - Independent = Chi-square; Related = McNemar’s

Chi-square

- Determine whether the data meet the assumptions of the test
  - Independent observations
  - Expected cell frequencies (ECF) = RT * CT/GT [row total * column total / grand total]
    - No ECF < 1
    - No more than 20% of cells have an ECF < 5
      - 1 cell < 5 in a 2x2 table
  - If not met…
    - Descriptive stats
    - Collapse categories
    - Fisher’s Exact Test

Chi-square

- Assumptions
  - For a 2x2 table, consider using the Yates’ correction
    - Most common
    - Considered conservative
    - More difficult to find a statistically significant difference
      - Type II error

Chi-square

                             Bleeding    No Bleeding    Row Totals
Individualized Enoxaparin    1 (4.9)     49 (45.1)      50
Conventional Enoxaparin      9 (5.1)     44 (47.9)      53
Column Totals                10          93             103

(Expected cell frequencies in parentheses)

Chi-Square p = 0.03
Fisher’s Exact p = 0.02

Barras MA. Clin Pharmacol Ther 2008;83:882-8.
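The expected cell frequencies in the enoxaparin table follow from ECF = RT * CT/GT, and the chi-square statistic can be built from them. The sketch below is one way to do this for a 2x2 table in plain Python; the `yates` flag applies the Yates' continuity correction. With the correction the p-value rounds to the 0.03 shown on the slide, which suggests (but does not confirm) that the corrected test was used there.

```python
from math import erfc, sqrt


def expected_counts(table):
    """Expected cell frequency = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square_2x2(table, yates=False):
    """Chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    statistic = 0.0
    for observed_row, expected_row in zip(table, expected_counts(table)):
        for observed, expected in zip(observed_row, expected_row):
            deviation = abs(observed - expected)
            if yates:
                deviation = max(deviation - 0.5, 0.0)  # continuity correction
            statistic += deviation ** 2 / expected
    # Upper-tail probability of a chi-square distribution with 1 df.
    return statistic, erfc(sqrt(statistic / 2))


# Rows: individualized, conventional. Columns: bleeding, no bleeding.
enoxaparin = [[1, 49], [9, 44]]
expected = expected_counts(enoxaparin)
cells_below_5 = sum(e < 5 for row in expected for e in row)
_, p_uncorrected = chi_square_2x2(enoxaparin)
_, p_yates = chi_square_2x2(enoxaparin, yates=True)

print([[round(e, 1) for e in row] for row in expected])
print(f"cells with ECF < 5: {cells_below_5} of 4")
print(f"p (uncorrected) = {p_uncorrected:.3f}, p (Yates) = {p_yates:.3f}")
```

One of the four cells has an expected frequency below 5, so the chi-square assumptions are borderline here and Fisher's exact test is a reasonable alternative.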


Chi-square

                 Relapsing Spasm    No Relapsing Spasm    Row Totals
Vigabatrin       0 (2)              16 (14)               16
ACTH             4 (2)              12 (14)               16
Column Totals    4                  28                    32

(Expected cell frequencies in parentheses)

Chi Square p = 0.03
Fisher’s Exact p = 0.10

Cossette P. Neurology 1999;52:1691-4.
Cossette P. Neurology 2000;54:539. [Erratum]
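For the vigabatrin table, where every expected count in the relapsing column is below 5, Fisher's exact test can be computed directly from the hypergeometric distribution. This sketch assumes the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x events in row 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = probability(a)
    smallest, largest = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        probability(x)
        for x in range(smallest, largest + 1)
        if probability(x) <= observed * (1 + 1e-9)
    )


# Rows: vigabatrin, ACTH. Columns: relapsing spasm, no relapsing spasm.
vigabatrin = [[0, 16], [4, 12]]
p_vigabatrin = fisher_exact_two_sided(vigabatrin)
print(f"Fisher's exact p = {p_vigabatrin:.2f}")
```

The exact test gives about 0.10, so the difference that looked significant by chi-square is not significant once the small expected counts are taken into account.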

Fisher’s Exact Test

- Nominal data
- Assumptions
  - Independent data
  - Two groups and two outcomes [only]
- More than 2x2?
  - Split into several 2x2 tables
    - Increases Type I error
  - Freeman-Halton extension
- Useful when a nominal outcome is rare
  - Or for small samples

Multivariate Statistics

[Figure: Exposure (DRUG) leading to Outcome, with other explanatory variables that could affect outcome]

Multivariate Statistics

- Detailed discussion beyond the scope of this presentation
- Control for confounders
  - Mantel-Haenszel Chi Square
  - Multi-way (eg, 2-way) ANOVA
  - Analysis of Covariance (ANCOVA)
  - Multiple linear regression
  - Multiple logistic regression
  - Cox proportional hazard models

Logistic Regression

- Odds Ratio (OR): controlling for other variables in the model, the odds of having the outcome of interest
  - Over-estimates the relative risk
    - Unless the outcome is rare
- Crude OR: only look at exposure and outcome with no adjustment
- Adjusted OR: adjusts for other extraneous variables
  - Confounders
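The statement that the odds ratio over-estimates the relative risk unless the outcome is rare can be checked with a crude (unadjusted) calculation. The counts below reuse the common and rare rows from the relative vs absolute risk table earlier in this handout; the function names are illustrative, and no adjustment for confounders is attempted.

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of risks: events / total in each group."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)


def crude_odds_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of odds: events / non-events in each group, no adjustment."""
    odds_exposed = events_exposed / (n_exposed - events_exposed)
    odds_unexposed = events_unexposed / (n_unexposed - events_unexposed)
    return odds_exposed / odds_unexposed


common = (500, 1000, 300, 1000)  # outcome is common
rare = (5, 1000, 3, 1000)        # outcome is rare

for label, counts in (("common", common), ("rare", rare)):
    print(f"{label}: RR = {relative_risk(*counts):.2f}, "
          f"OR = {crude_odds_ratio(*counts):.2f}")
```

When the outcome is common the OR is noticeably further from 1 than the RR; when it is rare the two nearly coincide.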


Survival Analysis

- Time to event analysis
  - Death
  - Clinical progression
- Kaplan-Meier Plot
- Log-Rank Test
- Cox Proportional Hazard Model
  - Hazard Ratio (HR)

Survival Analysis

[Figure: Kaplan-Meier curve; y-axis: Cumulative Survival (0 to 1.0), x-axis: Time]

Kaplan-Meier Plot

[Figure: Kaplan-Meier survival curves for two treatment groups]

640/850 = 75.3%
581/840 = 69.2%
ARR = 6.1%

Bernard, et al. N Engl J Med 2001;344(10):699

Summary

- Understanding commonly used biostatistical concepts enables you to interpret the literature
- Descriptive statistics represent the sample, and hopefully, the population of interest
- Confidence intervals can be used to make inferences about comparative treatments
- “Incorrect” statistical tests in published papers are rare
  - Interpreting the results of these tests is more important and identifies literature deficiencies

Questions? [email protected]