
Common Statistical Tests
October 11, 2019
Ninet Sinaii, PhD, MPH
Biostatistics & Clinical Epidemiology Service, NIH Clinical Center
[email protected]

Lecture Overview
Considerations for the choice of statistical tests
– Study design and hypothesis, type of data and their distributions
Brief review of important statistical features
Describe basic concepts for common statistical tests
– Chi-square, paired and two-sample t-tests, ANOVA, correlations, simple and multiple regression, logistic regression, and some non-parametric tests

Please Note
This lecture covers general concepts behind common statistical procedures to help you better:
– understand and prepare your data
– interpret your results and findings
– design and prepare your studies
– make sense of results in published literature
However, consult a statistician:
– During the planning/design stage of a study
– For data analysis and interpretation

Our Purpose ;)
Sample vs Target Population: Infer from Findings in the Study, to Truth in the Study, to Truth in the Universe
Research Question → Study Plan (Design) → Actual Study (Implement; Protocol → Action)
– Target population → Intended sample → Actual subjects
– Phenomena of interest → Intended variables → Actual data
Goal is simple: make the strongest possible conclusions from limited amounts of data. Statistics help us extrapolate from a set of data (sample) to make more general conclusions (population).

Sources: SG Hilsenbeck, Baylor College of Medicine; Motulsky “Intuitive Biostatistics”

Sample vs Target Population
We want to know about this: the Population
We have to work with this: the Sample (representative), obtained by Random Selection
Inference runs from the sample back to the population:
– Parameter µ (Population mean) vs Statistic x̄ (Sample mean)

Hypothesis Testing Paradigm
Design Study → Conduct Study → Collect Data → Examine Data → Compute test statistic → Compare to null → Compute p-value, CI → DECIDE → Interpret

Source: SG Hilsenbeck, Baylor College of Medicine

N. Sinaii (BCES/CC/NIH) 1

What do medical studies look for?
Does a specific factor increase or decrease the likelihood of an outcome?
– Interested in the etiology (cause or causes) of a specific outcome
– However, it is the association between the exposure and the outcome that is assessed
How much association is there? (the degree of association)
– Depends on magnitude and sample size
To understand. To compare. To predict. To intervene.

Measure of Association
Quantity that expresses the relationship between two (or more) variables
– Strength: strong, weak, none
– Direction: positive, negative, none

Measure of Association
Commonly used measures of association include:
– Differences between means, proportions, rates
– RR or OR (relative risk or odds ratio)
– Correlation coefficient
– Regression coefficient
– And many others

Lecture Overview
What contributes to choice of statistical test?
– Study design and hypothesis
– Type of data and their distributions
…and some reviews

Choice depends on the study design and hypothesis, the type of data, and their distributions and assumptions!

Types of Studies
Hypothesis Generating vs Hypothesis Testing
Descriptive: Case reports, Case series, Ecological, Cross-sectional
Analytic:
– Observational: Cohort, Case-control
– Experimental: Clinical Trials

Hypothesis Testing
Process of translating a research question to a hypothesis that can be tested
Involves conducting a test of statistical significance and quantifying the degree that chance (variability) may account for the results observed in the study (so we can make a conclusion about the population based on our study sample)

Hypothesis Testing
Statistical hypothesis testing automates decision making – the primary goal of analyzing data
– Examine the evidence provided by the data

Step 1: Requires making an explicit statement of the hypothesis to be tested
Null hypothesis (H0): there is no association between IV and DV
– eg, IE = IĒ or rp = 0 or µx = µy
Alternative hypothesis (Ha): there is an association between IV and DV
– eg, IE ≠ IĒ or rp ≠ 0 or µx ≠ µy (one-sided/directional: IE > IĒ or rp > 0 or µx > µy)

Question: Is there enough evidence to reject H0? We expect (hope) to reject H0 in favor of Ha.

Step 2: Once H0 and Ha are specified, a test of statistical significance can be performed
Choice of test depends on the hypothesis and type of data → construct a test statistic from our data
Tests lead to a probability statement or p-value
– P-value = the probability of obtaining a result as extreme or more extreme than the one observed, if H0 is actually true

P-values
How do we use p-values in relation to our hypothesis?
– if p-value ≤ α (alpha), reject H0 and conclude statistical compatibility
– if p-value > α, cannot reject H0 and conclude NO statistical compatibility
– Commonly used α values: 0.05, 0.01, or even 0.1 depending on study purpose
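As an illustration of this decision rule (not part of the original slides): a minimal sketch in Python, assuming SciPy is available, that converts a chi-square test statistic into a p-value and compares it with α. The values 3.84 and 6.65 echo the reference points used later in the chi-square slides.

```python
from scipy import stats

ALPHA = 0.05

def decide(chi2_stat, df=1, alpha=ALPHA):
    """Return (p_value, reject_H0) for an observed chi-square statistic."""
    p_value = stats.chi2.sf(chi2_stat, df)  # upper-tail probability
    return p_value, p_value <= alpha

# 3.84 is roughly the 95th percentile of chi-square with 1 df, so its
# p-value sits right at the alpha = 0.05 boundary; 6.65 gives p ~ 0.01.
p1, reject1 = decide(3.84)
p2, reject2 = decide(6.65)
```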

P-values: Cautions
if p-value ≤ α (alpha), reject H0 and conclude statistical compatibility
– means results are surprising and would not commonly occur if H0 were true
– means the number (p-value) we calculated from data is smaller than a threshold we had previously set → that's it!

if p-value > α, cannot reject H0 and conclude NO statistical compatibility
Important notes:
– High p-value does not prove the H0
– Deciding not to reject the H0 is not the same as believing that the H0 is definitely true: absence of evidence is not evidence of absence
– "Not statistically compatible" does not mean "no difference"!

P-values
Important reminders:
– P-values do not measure the probability that the study hypothesis is true
– Decisions should not be based only on whether a p-value passes a specific threshold
– A p-value or statistical "significance" does not measure the size of an effect or importance of a result
– By itself, a p-value is not sufficient evidence regarding a study, methodology, or hypothesis
– Statistical "significance" does not mean clinical importance

There is More…
P-values must be interpreted in context
– How firm are the data? Based on many studies?
– How many p-values were calculated? Collected multiple times per subject? Compared many times?
– How large is the discrepancy or difference? Is it clinically relevant?
– What is the sample size?

Sample Size and P-value
As sample size increases, so does the power of the significance test
– Larger sample sizes narrow the distribution of the test statistic (hypothesized and true hypothesis become more distinct from one another)
– Is the observed difference meaningful?
P-values are not enough to describe a result!
– Must always also assess the size of the observed difference (effect size)

Statistical Hypothesis Testing & Confidence Intervals
Hypothesis testing computes a range (95% sure if α=0.05) that would contain experimental results if H0 is true (so any result in this range is not statistically compatible; outside of it is)
Confidence intervals compute a range (eg, 95% sure) that contains the population value
Both are based on the same statistical theory and assumptions

Source: Motulsky “Intuitive Biostatistics”

Statistical Hypothesis Testing & Confidence Intervals
[Figure: a 95% CI plotted against values of the outcome, falling entirely outside the zone of p>0.05 (not statistically compatible) around the observed H0 value]
If the 95% CI does not contain the value of the H0 → statistically compatible (p<0.05)
If the 95% CI contains the value of the H0 → not statistically compatible (p>0.05)
What is the conclusion based on this?

Source: Motulsky "Intuitive Biostatistics"

Statistical Hypothesis Testing & Confidence Intervals
[Figure: a 95% CI overlapping the zone of p>0.05 (not statistically compatible) around the observed H0 value, plotted against values of the outcome]
What is the conclusion based on this?

Thought Experiment: Catching the real response rate
Suppose the real response rate for a new therapy is 0.3 (30%)
Suppose we run a small safety and efficacy trial, and calculate the response rate and a 95% CI for the response rate… over and over and over
How often will the interval capture the real value?

Source: Motulsky “Intuitive Biostatistics” Source: SG Hilsenbeck, Baylor College of Medicine

[Figures: simulated trials with True Rate=0.3, n=30. Each panel plots the observed rate and its CI for 100 trials. With 95% confidence, 95% of CIs contain the true rate; with 99.9% confidence, 100% of CIs contain the true rate.]

Source: SG Hilsenbeck, Baylor College of Medicine

[Figure: simulated trials with True Rate=0.3, n=120, Confidence=95% — 95% of CIs contain the true rate; the larger n gives narrower intervals.]

Combining Magnitude & Strength of Association
CIs combine magnitude and strength
Where applicable, reporting results with 95% CIs is preferable to p-values alone because they reveal strength, direction, and plausible range of an effect

Source: SG Hilsenbeck, Baylor College of Medicine
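The coverage behavior in these figures can be reproduced numerically. A minimal sketch (mine, not from the slides), assuming NumPy: simulate many trials of size n=30 with true rate 0.3, build a crude normal-approximation CI for each, and count how often the interval captures 0.3.

```python
import numpy as np

rng = np.random.default_rng(0)

def ci_coverage(true_rate=0.3, n=30, trials=10_000, z=1.96):
    """Fraction of normal-approximation 95% CIs that capture the true rate."""
    successes = rng.binomial(n, true_rate, size=trials)
    p_hat = successes / n
    half_width = z * np.sqrt(p_hat * (1 - p_hat) / n)
    covered = (p_hat - half_width <= true_rate) & (true_rate <= p_hat + half_width)
    return covered.mean()

coverage = ci_coverage()
# Coverage should come out close to the nominal 95%.
```

A Wilson or exact (Clopper-Pearson) interval would track the nominal level more faithfully at small n; the normal approximation is used here only because it is the shortest to write.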

Lecture Overview
What contributes to choice of statistical test?
– Study design and hypothesis
– Type of data and their distributions
…and some reviews

Types of Data
Scale | Examples | Summary Statistics
Continuous (continuum, scale) | age, VAS, HDL, BMI, anxiety score | mean, median, SD, etc
Nominal (binary or 2+ group categories, no ordering, discrete) | gender, race, response/no response, treatment | count & percentage, response rate
Ordinal (2+ categories, clear ordering, discrete) | stage, duration, frequency, severity, VAS, performance | count & percentage, median

Examples of Types of Reported Data
[Figure: bar chart of percent female vs male by year, 2009–2015 (eg, 78.7% female vs 21.3% male in 2009); n=349]

Distribution of Data
Approximately normal (parametric)
Not normal (non-parametric)

Examples of Data Distributions

Normal (Gaussian) Distribution Has Two Parameters
– Location = μ
– Spread = σ
[Figure: normal curves plotted over the range 60–140]

Source: SG Hilsenbeck, Baylor College of Medicine

Features to Summarize
Summary Statistics: Middle
Measures of location (in a sample):
– Mean = average (sum of all observations / number of observations)
– Median = middle value
– Mode = most frequently occurring value among all observations
– Geometric mean = average of log-transformed values, then taking the antilog (positive integers only)
[Figure: skewed distribution with Mean, Median, and Geometric Mean marked; n=349]
Source: SG Hilsenbeck, Baylor College of Medicine

Features to Summarize
Summary Statistics: Spread
Measures of spread (in a sample):
– Standard deviation (SD) = most common way to quantify variation
– Variance = square of standard deviation
– Quantiles = percentiles (quartiles, quintiles, deciles, etc; 50th percentile = median)
– Range = minimum and maximum values
[Figure: distribution with SD, IQR, and Range marked; n=349]
Source: SG Hilsenbeck, Baylor College of Medicine

Sample Data Approximately Normally Distributed
[Figure: normal curve over 60–140 with areas marked: 68.2% within ±1 SD (34.1% each side), 95.4% within ±2 SD, 13.6% between 1 and 2 SD, 2.3% in each tail]

Other Probability Distributions
Binomial, Geometric, Poisson, Exponential, Gamma, Chi-square, t, F
Lognormal (take log, get normal)
Normal

Source: SG Hilsenbeck, Baylor College of Medicine
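The tail areas in the figure can be checked directly; a quick sketch (my addition, assuming SciPy):

```python
from scipy import stats

# Probability mass of a standard normal within 1 and 2 SDs of the mean,
# and in one upper tail beyond 2 SDs.
within_1sd = stats.norm.cdf(1) - stats.norm.cdf(-1)   # ~0.682
within_2sd = stats.norm.cdf(2) - stats.norm.cdf(-2)   # ~0.954
upper_tail = stats.norm.sf(2)                         # ~0.023
```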

Degrees of Freedom
Simply, they are the number of independent pieces of information that go into calculating an estimate
– "the number of values that are free to vary"
– Not the same as the number of items [and vary by test, eg, df=n–1, or (N1+N2)–2]

Parametric vs Non-parametric
Parametric tests have certain conditions about the population parameters from which the sample is drawn:
– Independent observations
– Drawn from normally distributed populations
– Populations have the same variance
– Continuous data; focus on difference between means
Non-parametric tests do not have conditions about the population parameters:
– No stringent assumptions about parameters (distribution-free)
– Apply to data in an ordinal or nominal scale; continuous data changed to orders/ranks/signs
– Focus on difference between medians

Lecture Overview
Describe basic concepts for common statistical tests:
– Chi-square
– Paired and two-sample t-tests
– ANOVA
– Correlations
– Simple and multiple regression
– Logistic regression
(includes non-parametric tests)

Scenario
Study compared the proportion of new breast cancers in Tamoxifen treated and placebo treated women over 5 years.
– What is the outcome (dependent variable)?
– What is the independent variable?
– What types of data are these?

Classification of Common Tests – General Guide
Rows: Independent Variable; Columns: Dependent (Outcome) Variable
IV \ DV | Dichotomous | Nominal (>2) | Continuous (not ~normal) or ordinal (>2) | Continuous (~normal)
Dichotomous | Chi-square | Chi-square | Wilcoxon rank sum | t-test
Nominal (>2) | Chi-square | Chi-square | Kruskal-Wallis | ANOVA
Continuous (not ~normal) or ordinal (>2) | Wilcoxon rank sum | Kruskal-Wallis | Spearman rank correlation | Spearman rank correlation
Continuous (~normal) | Logistic regression | Logistic regression (Multinomial) | Spearman rank correlation | Correlation, Linear regression
Adapted and modified from Hulley and Cummings, 1988

Chi-Square (χ2) Test
Statistical test commonly applied to sets of categorical data
– Tests H0 that the frequency distribution of observed events is consistent with a certain theoretical distribution (expected); tests whether unpaired observations are independent of each other
Common method for 2x2 (rxc) tables
– Caveat: with small numbers (expected <5 per cell), better to use Fisher's exact test instead

2x2 Table for Categorical Results
Dependent Variable: Yes | No
Independent Variable Yes: a | b | (a+b)
Independent Variable No: c | d | (c+d)
Column totals: (a+c), (b+d), N

Chi-Square Example
Compare the proportion of new breast cancers in Tamoxifen treated and placebo treated subjects over 5 years.

          BRCA   Dis Free   Total
TAM         16        984    1000   (expected 25, 975; row % 1.6, 98.4; col % 32.0, 50.5)
Placebo     34        966    1000   (expected 25, 975; row % 3.4, 96.6; col % 68.0, 49.5)
Total       50       1950    2000

Test Statistic: Chi-Square, DF=1, Value=6.65, P-value=0.01
Hypothetical data representative of Fisher et al, 1998, JNCI 90:1371-1388
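The slide's chi-square can be reproduced in a few lines; a sketch assuming SciPy is available. `correction=False` requests the plain (uncorrected) Pearson chi-square, matching the reported statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the slide: rows = TAM, Placebo; cols = BRCA, disease free.
observed = np.array([[16, 984],
                     [34, 966]])

chi2, p, df, expected = chi2_contingency(observed, correction=False)
# chi2 ~ 6.65, p ~ 0.01, df = 1, expected = [[25, 975], [25, 975]]
```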

What if we double the sample?
Compare the proportion of new breast cancers in Tamoxifen treated and placebo treated subjects over 5 years.

          BRCA   Dis Free   Total
TAM         32       1968    2000   (expected 50, 1950; row % 1.6, 98.4)
Placebo     68       1932    2000   (expected 50, 1950; row % 3.4, 96.6)
Total      100       3900    4000

Test Statistic: Chi-Square, DF=1, Value=13.29, P-value=0.0003
Hypothetical data representative of Fisher et al, 1998, JNCI 90:1371-1388

Chi Square
[Figure: chi-square distribution. Observed data very close to expected give a small statistic; observed data very different from expected give a large one. Reference points: 3.84 (P=0.05), 6.65 (P=0.01), 13.29 (P=0.0003)]
Adapted from: SG Hilsenbeck, Baylor College of Medicine

Chi-Square Example
Smaller sample: BRCA in 1.6% of those on TAM; 95% CI: 0.8%, 2.4%
Larger sample: BRCA in 1.6% of those on TAM; 95% CI: 1.1%, 2.1%
What is the H0? Are these results statistically compatible at α=0.05?

Chi-Square Example

          BRCA   Dis Free   Total
TAM         25        975    1000   (expected 25, 975; row % 2.5, 97.5; col % 50.0, 50.0)
Placebo     25        975    1000   (expected 25, 975; row % 2.5, 97.5; col % 50.0, 50.0)
Total       50       1950    2000

Test Statistic: Chi-Square, DF=1, Value=0.00, P-value=1.00

Chi Square
[Figure: when observed data exactly match expected, the statistic is 0.00 (P=1.00); reference points 3.84 (P=0.05), 6.65 (P=0.01), 13.29 (P=0.0003)]
Adapted from: SG Hilsenbeck, Baylor College of Medicine

Chi-Square Example
Smaller sample: BRCA in 1.6% of those on TAM; 95% CI: 0.8%, 2.4%
Larger sample: BRCA in 1.6% of those on TAM; 95% CI: 1.1%, 2.1%
What is the H0? Are these results statistically compatible at α=0.05?

Classification of Common Tests – General Guide
(Table repeated as above, highlighting the chi-square cells)
*Fisher's exact test is used anywhere chi-square is used when expected cell counts <5
Adapted and modified from Hulley and Cummings, 1988

Paired Nominal Data
Paired:
– same subjects, different measures or intervals, pre-post, etc
– matched pairs
Use McNemar's test for paired data:
– Tests H0 of "marginal homogeneity" – that the two marginal probabilities for each outcome are the same (eg, no treatment effect)
– Tests for change between paired measurements
Is a non-parametric test; has a chi-square distribution; for 2x2 tables only
H0: pb = pc versus Ha: pb ≠ pc

2x2 Table for Paired Results
Test 2: Positive | Negative
Test 1 Positive: a | b | (a+b)
Test 1 Negative: c | d | (c+d)
Column totals: (a+c), (b+d), N
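A minimal sketch of McNemar's statistic as defined above, with invented discordant counts b and c (my assumption, not data from the slides). For small counts an exact or continuity-corrected version (eg, statsmodels' `mcnemar`) is usually preferable.

```python
from scipy.stats import chi2

# Hypothetical discordant cells of a paired 2x2 table:
# b = positive on test 1 only, c = positive on test 2 only.
b, c = 15, 5

# McNemar's statistic uses only the discordant cells and, under
# H0: pb = pc, follows a chi-square distribution with 1 df.
stat = (b - c) ** 2 / (b + c)
p_value = chi2.sf(stat, df=1)
```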

Scenario
Study compared golf scores for males and females in a PE class.
– What is the outcome (dependent variable)?
– What is the independent variable?
– What types of data are these?

Classification of Common Tests – General Guide
(Table repeated as above, highlighting the t-test cell: dichotomous IV, continuous ~normal DV)
Adapted and modified from Hulley and Cummings, 1988

Student's t-test
Used to test hypotheses about equality of means:
– One-sample: tests if the mean of the study sample equals a value specified in the null hypothesis (H0: µx=0)
– Two-sample: tests if the means of two study samples are equal (H0: µx=µy)
– Paired: tests if the difference between two responses in the same subject (or matched pairs) has a mean of 0 (H0: Δ=0)
Follows the t-distribution estimating the mean of a normally distributed population
Assumptions:
– Independent random samples from two approximately normally distributed populations
– Variances of the two populations are equal
Formulas vary by sample and variances

Two Sample t-test Example
Compares mean values from two different groups (if groups are independent, and data are normally/lognormally distributed).
Example: Study compared golf scores for males and females in a PE class. N = 7, equal in each group.
H0: µf = µm vs Ha: µf ≠ µm

The TTEST Procedure — Variable: Score
Gender       N   Mean      Std Dev   Std Err   Minimum   Maximum
f            7   76.8571   2.5448    0.9619    73.0000   80.0000
m            7   82.7143   3.1472    1.1895    78.0000   87.0000
Diff (1-2)       -5.8571   2.8619    1.5298

Gender       Method          Mean      95% CL Mean         Std Dev   95% CL Std Dev
f                            76.8571   74.5036, 79.2107    2.5448    1.6399, 5.6039
m                            82.7143   79.8036, 85.6249    3.1472    2.0280, 6.9303
Diff (1-2)   Pooled          -5.8571   -9.1902, -2.5241    2.8619    2.0522, 4.7242
Diff (1-2)   Satterthwaite   -5.8571   -9.2064, -2.5078

Equality of Variances
Method     Num DF   Den DF   F Value   Pr > F
Folded F   6        6        1.53      0.6189
Note: H0: variances are equal
Source: SAS Institute, Inc

Two Sample t-test Example
Method          Variances   DF       t Value   Pr > |t|
Pooled          Equal       12       -3.83     0.0024
Satterthwaite   Unequal     11.496   -3.83     0.0026

What is the H0? H0: µf = µm vs Ha: µf ≠ µm
What about the 95% CIs? Are the results statistically compatible based on them?
Source: SAS Institute, Inc
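The pooled t-test can be reproduced in Python; a sketch assuming SciPy. The individual golf scores below are an assumption, chosen to match the summary statistics shown (n = 7 per group, the group means, and the SDs); the slides print only the SAS summary output.

```python
from scipy import stats

# Hypothetical scores consistent with the SAS summary statistics above.
females = [75, 76, 80, 77, 80, 77, 73]
males = [82, 80, 85, 85, 78, 87, 82]

# Pooled (equal-variance) two-sample t-test, as in the "Pooled" row.
t_stat, p_value = stats.ttest_ind(females, males, equal_var=True)
# t ~ -3.83, p ~ 0.0024
```

`equal_var=False` would give the Satterthwaite (Welch) line instead.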

One Sample t-test Example
Compares a sample mean to a given value (not always 0).
Example: Study tested if the mean length of a certain type of court case was more than 80 days. N = 20 randomly selected cases.
H0: mean=80d vs Ha: mean>80d

The TTEST Procedure — Variable: time
N    Mean      Std Dev   Std Err   Minimum   Maximum
20   89.8500   19.1456   4.2811    43.0000   121.0

Mean      90% CL Mean      Std Dev   90% CL Std Dev
89.8500   84.1659, Infty   19.1456   15.2002, 26.2374

DF   t Value   Pr > t
19   2.30      0.0164

Source: SAS Institute, Inc; Ref: Huntsberger and Billingsley, 1989
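A one-sided, one-sample test like this can be sketched as follows (my illustration, assuming SciPy; the case-length values are invented, since the slide shows only summary output):

```python
from scipy import stats

# Hypothetical case lengths in days (made up for illustration).
times = [95, 70, 85, 110, 90, 78, 102, 88, 93, 80]

# Test H0: mean = 80 vs Ha: mean > 80.
t_stat, p_two_sided = stats.ttest_1samp(times, popmean=80)
# Halving the two-sided p-value gives the one-sided p when t > 0.
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
```

Recent SciPy versions also accept `alternative="greater"` directly, which avoids the manual halving.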

Paired t-test Example
Detects differences between means of paired observations. A natural pairing of the data exists, and we cannot assume the two groups of data are independent. Deltas are assumed to be normally distributed.
Example: Study aimed to determine effect on SBP before and after stimulus. N = 12; H0: Δ=0 vs Ha: Δ≠0

The TTEST Procedure — Variable: SBPbefore-SBPafter
N    Mean      Std Dev   Std Err   Minimum   Maximum
12   -1.8333   5.8284    1.6825    -9.0000   8.0000

Mean      95% CL Mean        Std Dev   95% CL Std Dev
-1.8333   -5.5365, 1.8698    5.8284    4.1288, 9.8958

DF   t Value   Pr > |t|
11   -1.09     0.2992
Source: SAS Institute, Inc

Scenario
Study examined the effect of 6 levels of bacteria on the nitrogen content of red clover plants.
– What is the outcome (dependent variable)?
– What is the independent variable?
– What types of data are these?
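Returning to the paired t-test above: a minimal sketch, assuming SciPy, with invented SBP readings (the slide shows only summary output). It also verifies the point made in the slides that a paired t-test is the same as a one-sample t-test on the differences.

```python
from scipy import stats

# Hypothetical paired SBP readings for 12 subjects (made up).
before = [120, 118, 125, 130, 115, 122, 128, 119, 124, 121, 117, 126]
after  = [118, 120, 121, 131, 112, 119, 125, 120, 121, 118, 116, 122]

# Paired t-test on the before/after measurements.
t_paired, p_paired = stats.ttest_rel(before, after)

# Equivalent: one-sample t-test on the differences against 0.
diffs = [b - a for b, a in zip(before, after)]
t_one, p_one = stats.ttest_1samp(diffs, 0)
```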

Analysis of Variance (ANOVA)
Statistical test to determine if the means of several groups are equal (thus, generalizes the t-test to 2+ groups)
– More power, less type I error than multiple two-sample t-tests
– Many consideration terms and classes of models
Assumptions: independence of observations, normality of residuals, equality of variances

Classification of Common Tests – General Guide
(Table repeated as above, highlighting the ANOVA cell: nominal (>2) IV, continuous ~normal DV)
Adapted and modified from Hulley and Cummings, 1988

ANOVA Example
Considers one treatment factor with 2+ treatment levels. Goal is to test for differences among the means of the levels and to quantify these differences.
Example: Study examined the effect of bacteria on the nitrogen content of red clover plants. Treatment factor is bacteria strain (6 levels: 3DOK1, 3DOK13, 3DOK4, 3DOK5, 3DOK7, COMPOS). Red clover plants are inoculated with the treatments; nitrogen content is measured at the end of the study. Number of observations read/used: 30.
Ref: Erdman (1946); Steel and Torrie (1980).

                      Sum of         Mean
Source           DF   Squares        Square        F Value   Pr > F
Model             5    847.046667    169.409333    14.37     <.0001
Error            24    282.928000     11.788667
Corrected Total  29   1129.974667

R-Square   Coeff Var   Root MSE   Nitrogen Mean
0.749616   17.26515    3.433463   19.88667

Source   DF   Anova SS      Mean Square   F Value   Pr > F
Strain    5   847.0466667   169.4093333   14.37     <.0001

The model as a whole accounts for a significant amount of variation in the dependent variable.
Note: the F-test is used for comparing the factors of total deviation. H0: µ1 = µ2 = … = µk; Ha: some difference between the means for the different strains is different from zero.
Source: SAS Institute, Inc

ANOVA Example (continued)
What is the H0? The F test suggests there are differences among the bacterial strains, but it does not reveal any information about the nature of the differences (post-hoc tests do). What about the 95% CI?
Source: SAS Institute, Inc

Analysis of Variance (ANOVA)
Terms explained:
– Balanced design: cells have similar number of observations
– Blocked design: schedule for conduct of experiment restricted to blocks
– Factor: independent variables, etc
Single vs Multiple:
– Example: two-way ANOVA has factors x and y as the main effects and their interaction (xy)
– Example: three-way ANOVA has factors x, y, z as the main effects and their interactions (xy, xz, yz, xyz)
– Termed factorial when all combinations of levels of each factor are involved
– Note: interactions are complicated and caution is advised!
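A one-way ANOVA like the clover example can be sketched with `scipy.stats.f_oneway`. The three groups below are illustrative values of my own, not the full Erdman (1946) dataset shown in the SAS output.

```python
from scipy import stats

# Hypothetical nitrogen measurements for three strains (made up).
strain_a = [19.4, 32.6, 27.0, 32.1, 33.0]
strain_b = [17.7, 24.8, 27.9, 25.2, 24.3]
strain_c = [17.0, 19.4, 9.1, 11.9, 15.8]

# One-way ANOVA: H0 is that all group means are equal.
f_stat, p_value = stats.f_oneway(strain_a, strain_b, strain_c)
```

As the slides note, a small p-value here says only that some means differ; a post-hoc procedure (eg, Tukey's HSD) is needed to say which.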

N. Sinaii (BCES/CC/NIH) 13 Scenario ANOVA Example Pain relief Example: Study used two treatment (outcome) randomized blocks with two variable factors treatment factors occurring in a Study examined the effects of codeine and factorial structure (4 distinct acupuncture on post-op dental pain. treatment combinations). Examined the effects of codeine – What is the outcome (dependent variable)? and acupuncture on post-op dental pain. The 32 subjects – What is/are the independent variable(s)? were assigned to eight blocks of – What types of data are these? four subjects each, based on pain tolerance. Both treatment factors had two levels (codeine: capsule or sugar capsule; acupuncture: two active or two inactive points). Codeine Ref: Neter et al (1990) Treatment Placebo Acupuncture Active Point Source: SAS Institute, Inc Inactive Point

ANOVA Example ANOVA Example Randomized Complete Block With Two Factors

Dependent Variable: Rlief Class Level Information Class Levels Values Source DF Anova SS Mean Square F Value Pr > F PainLevel 8 1 2 3 4 5 6 7 8 PainLevel 7 5.59875000 0.79982143 55.30 <.0001 Codeine 21 2Model as a whole Codeine 1 2.31125000 2.31125000 159.79 <.0001 Acupuncture 21 2accounts for a Acupuncture 1 3.38000000 3.38000000 233.68 <.0001 significant amount Codeine*Acupuncture 1 0.04500000 0.04500000 3.11 0.0923 Number of Observations Read 32 of variation in the Number of Observations Used 32 dependent variable Sum of Mean What is the H ? Source DF Squares Square F Value Pr > F 0 Model 10 11.33500000 1.13350000 78.37 <.0001 Note: Error 21 0.30375000 0.01446429 F-test is used for comparing the factors of total deviation. Corrected Total 31 11.63875000 H0: µ1 = µ2 = … = µk R-Square Coeff Var Root MSE Relief Mean 0.973902 10.40152 0.120268 1.156250 What about 95% CIs Note: F-test is used for comparing the factors of total deviation. Source: SAS Institute, Inc H0: µ1 = µ2 = … = µk Source: SAS Institute, Inc

FYI…
[Slide content not captured in the extraction]
Source: SAS Institute, Inc

Scenario
Study aimed to determine how well a child's height and weight are related.
– What is the outcome (dependent variable)?
– What is the independent variable?
– What types of data are these?

Classification of Common Tests – General Guide
(Table repeated as above, highlighting Correlation / Linear regression: continuous ~normal IV and DV)
Adapted and modified from Hulley and Cummings, 1988

Correlation
Used to test hypotheses about how two continuous variables are related
– How much of the variability in one is explained by the other
– Sample correlation coefficient (r) is estimated, ranging between -1 and 1, that quantifies the direction and strength of the linear association
– Assumes normality
– Pearson's correlation coefficient is computed between random variables x and y

Non-parametric Correlation
Tests:
– Spearman's rank correlation coefficient
– Kendall tau rank correlation coefficient
No linear relationship requirement; measures how one variable changes as the other changes
– If one increases and so does the other, the rank correlation is positive
– If one increases and the other decreases, the rank correlation is negative
Interpretation is essentially the same

Correlation Example
[Figure: scatterplots with r=0.85 (p<0.05), r=0.23 (p<0.05), r=0.06 (p>0.05), r=-0.85 (p<0.05)]
What is the H0? What about 95% CIs?
CIs are very helpful! For example:
– rp=0.87, 95% CI: 0.76, 0.94; p<0.0001
– rp=0.08, 95% CI: -0.12, 0.27; p=0.43

Correlation: Caution
Pictures are worth a thousand words ;)
[Figure: scatterplots illustrating why correlations should always be plotted]

Correlation and Regression
Correlation analysis is related to regression analysis (which assesses the relation between an outcome variable and one or more covariates)
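Pearson and Spearman coefficients from the slides can be computed side by side; a sketch with simulated data (my own, assuming SciPy and NumPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: y depends linearly on x plus noise.
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

r_pearson, p_pearson = stats.pearsonr(x, y)        # linear association
rho_spearman, p_spearman = stats.spearmanr(x, y)   # rank (monotonic) association
```

Because Spearman's coefficient works on ranks, it is unchanged by any monotonic transformation of x or y, which is why it carries no linearity or normality requirement.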

Regression
Relates a quantitative dependent variable y to one or more explanatory/independent variables x
Type depends on the number and types of dependent and independent variables
– Continuous DV: Simple, or multiple (2+ predictors/explanatory/independent variables)
– Categorical DV: Logistic
Assumptions:
– Linearity
– Constant variance
– Independent errors
– Lack of multicollinearity

Linear Regression
Other important features explained:
– Multiple regression is also known as multivariable linear regression
– Note: Multivariate linear regression refers to models where multiple correlated DVs are predicted
– Generalized linear models are used for dependent variables that are bounded or discrete (eg, skewed; Poisson distribution/counts; binary outcomes)

Scenario
Study aimed to determine how well a child's weight can be predicted if the child's height is known.
– What is the outcome (dependent variable)?
– What is the independent variable?
– What types of data are these?

Linear Regression Example
Example: Use regression analysis to determine how well a child's weight can be predicted if the child's height is known. N = 19 children; height and weight were measured.
Equation of interest: Weight = β0 + β1*height + ε
– Weight is the response/dependent variable; height is the regressor/independent variable; β are unknown parameters; ε is unknown error

Analysis of Variance
                  Sum of       Mean
Source       DF   Squares      Square       F Value   Pr > F
Model         1   7193.24912   7193.24912   57.08     <.0001
Error        17   2142.48772    126.02869
Corrected
Total        18   9335.73684

Root MSE          11.22625   R-Square   0.7705
Dependent Mean   100.02632   Adj R-Sq   0.7570
Coeff Var         11.22330
r2: 77% of the variability in Y is explained by variability in X

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   -143.02692           32.27459         -4.43     0.0004
Height       1      3.89903            0.51609          7.55     <.0001

From the parameter estimates, the fitted model is: Weight = -143.0 + 3.9*height
Source: SAS Institute, Inc
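The fit can be reproduced with `scipy.stats.linregress`. The 19 height/weight pairs below are from SAS's well-known Class dataset, which this output appears to come from (an assumption; note that the weight mean of 100.026 matches the "Dependent Mean" shown).

```python
from scipy import stats

# Heights (in) and weights (lb) of 19 children (SAS Class dataset).
height = [69, 56.5, 65.3, 62.8, 63.5, 57.3, 59.8, 62.5, 62.5, 59,
          51.3, 64.3, 56.3, 66.5, 72, 64.8, 67, 57.5, 66.5]
weight = [112.5, 84, 98, 102.5, 102.5, 83, 84.5, 112.5, 84, 99.5,
          50.5, 90, 77, 112, 150, 128, 133, 85, 112]

fit = stats.linregress(height, weight)
# fit.slope ~ 3.899, fit.intercept ~ -143.0, fit.rvalue**2 ~ 0.7705
```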

Linear Regression Example
[Figure: scatterplot of weight vs height with the fitted regression line]
What is the H0? What about 95% CIs?
Source: SAS Institute, Inc

Scenario
Study investigated the role of confusion and use of certain medications to predict falls in the elderly.
– What is the outcome (dependent variable)?
– What is/are the independent variable(s)?
– What types of data are these?

Classification of Common Tests – General Guide
(Table repeated as above, highlighting the logistic regression cells: dichotomous DV)
Adapted and modified from Hulley and Cummings, 1988

Logistic Regression

Regression model where the dependent variable is categorical (outcome achieved vs not: treatment successful vs failed; alive vs dead; recovered vs not recovered).
– Multinomial logistic regression (more than two outcome categories) or ordinal logistic regression (ordered outcomes)
– Conditional logistic regression (matched studies)
– 1 or more predictor variables (mixed types)

(The Classification of Common Tests: General Guide table is repeated on this slide, with logistic regression highlighted where it applies; adapted and modified from Hulley and Cummings, 1988.)

Logistic Regression Logistic Regression

Used to predict Assumptions violated whether a subject (residuals not normally 1.0 1.0 distributed) has a given outcome 0.8 0.8 Logit transformation: (ie, disease), based – Takes odds of event 0.6 on observed 0.6 happening at different characteristics of the levels of each IV 0.4 0.4 – Then takes ratio of those subject (eg, age, odds gender, clinical 0.2 0.2 – Then takes the log of those characteristics, etc) ratios P=Probability of falls, given x 0.0 Assume linear P=Probability of falls, given x 0.0 Creates continuous criterion as a transformed 0246810 regression 0246810 x x version of the DV

Source: SG Hilsenbeck, Baylor College of Medicine Source: SG Hilsenbeck, Baylor College of Medicine
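The logit steps described above reduce to a single formula, log(p / (1 - p)), and the difference in log-odds between two levels of an IV is the log of the odds ratio. A minimal sketch with hypothetical probabilities:

```python
import math

def logit(p):
    """Log-odds of a probability p: log(p / (1 - p))."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Logistic (sigmoid) function: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical probabilities of the outcome at two levels of an IV.
p_low, p_high = 0.20, 0.50
log_or = logit(p_high) - logit(p_low)   # difference in log-odds = log of the odds ratio
odds_ratio = math.exp(log_or)           # odds 1.0 vs odds 0.25 -> OR = 4.0
```

Because the logit is continuous and unbounded, it can be modeled as a linear function of the predictors even though the raw probability is confined to [0, 1].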

Logistic Regression (continued)

The logit of the outcome (success) is then fitted to the predictors using linear regression analysis.

[Figure: Log(Odds) plotted against X (0–10), ranging from about -4 to 4; the logit is linear in X]

Source: SG Hilsenbeck, Baylor College of Medicine

Logistic Regression Example

Study investigating:
– outcome of falls (yes vs no)
– predictors: confusion and use of certain medications (and other considerations)

Note: examples of univariable results are shown here.

Chi-Square Test for Independence

    Confusion     Cases            Controls         Total
    No            Obs   96         Obs  170         266 (76.00%)
                  Exp  114         Exp  152
                  Diff -18         Diff +18
    Yes           Obs   54         Obs   30          84 (24.00%)
                  Exp   36         Exp   48
                  Diff +18         Diff -18
    Total         150 (42.86%)     200 (57.14%)     350 (100.00%)

    Statistic     DF   Value     Prob
    Chi-Square     1   20.7237   <.0001

Source: SG Hilsenbeck, Baylor College of Medicine
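The 20.7237 in the output can be reproduced directly from the observed counts on the slide: each expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over the four cells.

```python
# Recompute the slide's chi-square statistic from its observed 2x2 counts.
observed = [[96, 170],   # confusion = No:  cases, controls
            [54, 30]]    # confusion = Yes: cases, controls

row_totals = [sum(row) for row in observed]          # [266, 84]
col_totals = [sum(col) for col in zip(*observed)]    # [150, 200]
n = sum(row_totals)                                  # 350

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n      # e.g. 266 * 150 / 350 = 114
        chi_sq += (obs - exp) ** 2 / exp

# chi_sq matches the SAS output: 20.7237 with 1 DF
```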

Logistic Regression Example

SAS logistic regression output for confusion predicting group (cases vs controls); the 2×2 chi-square table is repeated on the slide for comparison.

    Model Fitting Information and Testing Global Null Hypothesis BETA=0
    Criterion   Intercept Only   Intercept and Covariates   Chi-Square for Covariates
    AIC                480.036                    461.389    .
    SC                 483.894                    469.105    .
    -2 LOG L           478.036                    457.389    20.647 with 1 DF (p=0.0001)
    Score                    .                          .    20.724 with 1 DF (p=0.0001)

    Analysis of Maximum Likelihood Estimates
    Variable   DF   Parameter Estimate   Standard Error   Wald Chi-Square   Pr > Chi-Square   Standardized Estimate
    INTERCPT    1              -0.5715           0.1277           20.0353            0.0001    .
    CONFUSN     1               1.1592           0.2611           19.7185            0.0001    0.273348

    Conditional Odds Ratios and 95% Confidence Intervals (Wald)
    Variable   Unit     Odds Ratio   Lower   Upper
    CONFUSN    1.0000        3.187   1.911   5.317

OR = 3.19; 95% CI: 1.91, 5.32

– Is this consistent with the chi-square test results?
– Is it statistically compatible at α=0.05?
– What is the H0?
– What about the 95% CI?

Source: SG Hilsenbeck, Baylor College of Medicine
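The slide's odds ratio and Wald confidence limits follow directly from the CONFUSN coefficient and its standard error: exponentiate the estimate for the OR, and exponentiate estimate ± 1.96 × SE for the limits (1.96 is the approximate 97.5th percentile of the standard normal).

```python
import math

# CONFUSN estimate and standard error, taken from the SAS output above.
beta, se = 1.1592, 0.2611
z = 1.96  # approximate normal quantile for a 95% Wald interval

odds_ratio = math.exp(beta)            # ~3.19
ci_lower = math.exp(beta - z * se)     # ~1.91
ci_upper = math.exp(beta + z * se)     # ~5.32
```

These reproduce the slide's Wald limits (3.187; 1.911, 5.317) to rounding.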

Other Non-Parametric Tests

– Wilcoxon rank-sum test (same as the Mann-Whitney U test) ↔ two-sample t-test
– Wilcoxon signed-rank test ↔ paired t-test
– Kruskal-Wallis ↔ ANOVA (for singly-ordered data)
  – Jonckheere-Terpstra for doubly-ordered data
– Spearman's correlation / Kendall's tau ↔ Pearson's correlation

(The Classification of Common Tests: General Guide table is repeated on this slide; adapted and modified from Hulley and Cummings, 1988.)
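The statistic behind the Wilcoxon rank-sum / Mann-Whitney U test can be sketched by direct pair counting, which is equivalent to the rank-sum formulation: U counts the pairs where a value from one group exceeds a value from the other (ties count 1/2). This only computes the statistic; the actual test compares U against its null distribution (or a normal approximation).

```python
def mann_whitney_u(a, b):
    """U statistic for group a: number of (a, b) pairs with a > b, ties counting 1/2."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Illustrative data: groups that do not overlap give the extreme values 0 and n1*n2.
u_low = mann_whitney_u([1, 2, 3], [4, 5, 6])   # 0.0
u_high = mann_whitney_u([4, 5, 6], [1, 2, 3])  # 9.0
```

A useful check on the implementation: the two U values for a pair of groups always sum to len(a) * len(b).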

Summary

General concepts were covered in this lecture to help you understand your own and published data and results a little better, and to help interpret findings.

However, consult a statistician:
– During the planning/design stage of a study
– For data analysis and interpretation

(The Classification of Common Tests: General Guide table is repeated on this slide; adapted and modified from Hulley and Cummings, 1988.)

Take Home Message

"Focusing on whether or not a result is statistically compatible can get in the way of good science."

When reading the scientific literature, don't let a conclusion about statistical significance stop you from trying to understand what the data really show: the point of statistics is to quantify scientific evidence and uncertainty.

Source: Motulsky, "Intuitive Biostatistics"

Questions?
