Basic Biostatistics Part 2

1st March, 2017

Content

• Part 1 Summary
• Sampling
• Statistical Hypothesis Tests
• Errors in Hypothesis Tests
• Power and Sample Size
• Examples
• Correlation and Regression

Part 1 Summary

• What were the key learning points from Part 1?

− In groups, identify 3 key learning points from the first session

Sampling

• An investigation of a population is said to be a survey or study of the population
• A population is a group of individuals or objects that meets a set of pre-defined criteria; e.g.
  - All people with permanent residence in the UK
  - All patient records held in a database
  - All patients with schizophrenia
  - All staff members of an organisation
  - All patients registered to a particular specialist
  - All members of the population diagnosed with a particular health condition
• A survey or study that collects information from every member of a population is referred to as a census

• It is not always possible to collect information from every member of a population, because time and resources are limited
• A ‘good sample’ can be used to reliably estimate characteristics (e.g. the mean) of the population
• Sample – any subset of a population

[Figure: a Sample shown as a subset of the Population]

Sampling Error

• Errors in surveys can be divided into two categories: sampling error and non-sampling error
• Sampling error is error due to taking a sample rather than studying the whole population
  - e.g. if a psychiatrist randomly selects a sample of patients and records the duration of each appointment, the average appointment duration can be calculated
  - if the durations for all patients were recorded (i.e. the entire population), the population average would most likely differ from the sample average

Non-sampling error
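Looking back at sampling error (the gap between a sample average and the population average), the idea can be sketched with a small simulation. This is only an illustration: the population of 5,000 appointment durations, its mean of 30 minutes and its spread are invented values, not data from any study.

```python
import random
import statistics

random.seed(2017)  # fixed seed so the illustration is repeatable

# Hypothetical population: appointment durations in minutes (invented values).
population = [random.gauss(30, 8) for _ in range(5000)]
population_mean = statistics.mean(population)

# A simple random sample of 40 patients, drawn without replacement.
sample = random.sample(population, 40)
sample_mean = statistics.mean(sample)

# Sampling error: the gap between the sample estimate and the population value.
sampling_error = sample_mean - population_mean

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {sampling_error:+.2f}")
```

Changing the seed or the sample size gives a different sampling error each time; larger samples tend to give smaller errors.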

• Non-sampling error is error due to:
  - poor selection of strata or sample (coverage errors)
  - poor data entry (processing errors)
  - inaccurate responses (measurement errors)
  - non-response errors
• In surveys, non-sampling errors can be more of a problem than sampling errors

Statistical Hypothesis Tests

Hypothesis Testing

• Hypothesis Testing is a process used to quantify the evidence against a particular hypothesis about the population
• There are many different types of hypothesis tests
• Five stages for hypothesis testing can be defined:

5 Stages

1. Define the Null & Alternative Hypotheses
2. Collect data
3. Calculate the value of the test statistic
4. Compare the value of the test statistic to values from a known probability distribution
5. Interpret the P-value and results

The Null Hypothesis

• The Null Hypothesis (H0) is the hypothesis that is tested; it assumes no effect in the population (e.g. the difference in means equals zero)

• Example: Comparing the rates of hallucinations in men and women in the population

− Null Hypothesis (H0): rates of hallucinations are the same in men and women in the population

The Alternative Hypothesis

• The Alternative Hypothesis (H1) holds if the Null Hypothesis is not true

• Example

− Alternative Hypothesis (H1): rates of hallucinations are different in men and women in the population

The test statistic

• After data collection, the sample data is used to calculate a test statistic

• The test statistic is effectively a measure of the amount of evidence in the data against H0
• Generally, the larger the value (irrespective of sign), the greater the evidence against H0

The P-value

• The test statistic is compared to values from the relevant probability distribution to obtain a P-value
• The P-value is the probability of obtaining our results, or something more extreme, if the Null Hypothesis is true
• The smaller the P-value, the greater the evidence against H0

Rejecting H0
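A minimal sketch of how a test statistic is turned into a P-value and then compared with the 5% level. It assumes the relevant probability distribution is the standard Normal (a z-test); the three z values are hypothetical, and `two_sided_p_value` is an illustrative helper name rather than part of any package.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard Normal Z: results at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

ALPHA = 0.05  # conventional 5% significance level

for z in (0.5, 1.5, 2.5):  # hypothetical test statistics
    p = two_sided_p_value(z)
    decision = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"z = {z:.1f}  P-value = {p:.4f}  -> {decision}")
```

The sign of the statistic is ignored because the test is two-sided: large values in either direction count as evidence against H0.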

• Conventionally, if the P-value < 0.05, there is sufficient evidence to reject H0

• There is only a small chance of results at least this extreme occurring if H0 is true

– H0 is rejected; the results are statistically significant at the 5% level

Not rejecting H0

• If the P-value ≥ 0.05, there is insufficient evidence to reject H0

– H0 is not rejected; the results are not statistically significant at the 5% level

• NB: This does not mean that the null hypothesis is true, simply that we do not have enough evidence to reject it!

Parametric vs. Non-Parametric tests

• Tests which are based on the assumption that the data follows a known probability distribution (often the Normal) are known as parametric tests

• When the data do not conform to this assumption, non-parametric tests can be used

• Non-parametric tests do not assume a specific probability distribution for the data

Non-parametric tests

• Useful when:

− sample size is small
− data is measured on a categorical scale (though they are used for numerical data as well)

• However:

− they have less power to detect a real difference than the equivalent parametric tests
− they lead to decisions rather than generating a true understanding of the data

Statistical tests

• Numerical data (Parametric tests)

– One-sample t-test
– Independent t-test
– Paired t-test
– One-way ANOVA

• Numerical data (non-parametric tests)

– Sign test
– Wilcoxon signed rank test
– Wilcoxon rank sum test
– Kruskal-Wallis test

• Categorical data

– z-test for a proportion
– Sign test
– McNemar’s test
– Chi-squared test
– Chi-squared trend test
– Fisher’s exact test

Choosing a statistical test

• Many medical statistics textbooks contain a flowchart to guide the choice of test

• Considerations include:

– what is the data type?
– how many groups of data are there?
– can a probability distribution be assumed?

Errors in Hypothesis Testing

Making a wrong decision

• There is the possibility of making a wrong decision when conducting a Hypothesis test

• A wrong decision may be made when rejecting or not rejecting the Null Hypothesis

• The possible mistakes that can be made are:
– Type I error
– Type II error

Type I error

• Rejecting the Null Hypothesis when it is true

• Concluding that there is an effect (difference) when in reality there is none

• The maximum chance of making a Type I error is denoted by alpha (α)

• α is the significance level of the test; we reject the null hypothesis if the P-value is less than the significance level

Type II error

• Not rejecting the Null Hypothesis when it is false

• Concluding that there is no effect (difference) when one really exists

• The chance of making a Type II error is denoted by beta (β)

• Its complement, 1 − β, is the Power of the test

Power and Sample Size

Power of the test
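The quantities α, β and the Power (1 − β) can be illustrated by simulation. The sketch below is built on assumptions chosen only for illustration: a two-sided z-test of H0: mean = 0 with a known population SD of 1, 25 observations per study, and a true mean of 0.5 when H0 is false.

```python
import random
from statistics import NormalDist, mean

random.seed(1)   # fixed seed so the illustration is repeatable

ALPHA = 0.05     # significance level
N = 25           # observations per simulated study (hypothetical)
SIGMA = 1.0      # population SD, assumed known
TRIALS = 2000    # number of simulated studies per scenario
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value

def rejects_h0(true_mean):
    """Simulate one study and test H0: mean = 0 with a two-sided z-test."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = mean(sample) / (SIGMA / N ** 0.5)
    return abs(z) > Z_CRIT

# H0 true (the mean really is 0): every rejection is a Type I error.
type_i_rate = sum(rejects_h0(0.0) for _ in range(TRIALS)) / TRIALS

# H0 false (the mean really is 0.5): every rejection is a correct decision.
power = sum(rejects_h0(0.5) for _ in range(TRIALS)) / TRIALS
type_ii_rate = 1 - power

print(f"Type I error rate  (alpha) ~ {type_i_rate:.3f}")
print(f"Type II error rate (beta)  ~ {type_ii_rate:.3f}")
print(f"Power (1 - beta)           ~ {power:.3f}")
```

The simulated Type I error rate should sit close to the chosen α, while the Power depends on the sample size, the variability and the size of the true effect.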

• The Power is the probability of rejecting the Null Hypothesis when it is false

– i.e. the probability of making a correct decision

• The ideal power of the test is 100%

• However, there is always a possibility of making a Type II error

Sample size

• If the number of patients/samples in the study is small, there may be inadequate power to detect an important existing effect, which wastes resources

• If the sample is too large, the study may be unnecessarily time-consuming, expensive or unethical

• An optimal sample size is needed: one that strikes a balance between the implications of making a Type I error and a Type II error

Calculating an optimal sample size for a test

• The following quantities need to be specified at the design stage of the investigation in order to calculate an optimal sample size:

– The Power
– Significance Level
– Variability
– Smallest effect of interest

Recall: 5 stages
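As a sketch of how the four design quantities (Power, Significance Level, Variability, Smallest effect of interest) combine into a sample size, the block below uses the widely taught Normal-approximation formula for comparing two means, n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². The formula choice and the input values (SD 0.27, effect 0.15) are assumptions for illustration, and `n_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def n_per_group(power, alpha, sd, effect):
    """Approximate sample size per group for a two-sided comparison of two means.

    Normal approximation: n = 2 * (z_(1-alpha/2) + z_power)^2 * sd^2 / effect^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # from the significance level
    z_beta = NormalDist().inv_cdf(power)           # from the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / effect ** 2
    return math.ceil(n)  # round up to a whole number of subjects

# Hypothetical design values, chosen only for illustration.
n = n_per_group(power=0.80, alpha=0.05, sd=0.27, effect=0.15)
print(f"required sample size per group: {n}")
```

Asking for more power, a stricter significance level or a smaller detectable effect each increases the required sample size, as does greater variability.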

• A randomised double-blind trial to determine the effect of inhaled corticosteroids on wheezing episodes in children
• Inhaled beclomethasone dipropionate was compared with a placebo
• Response variable was average forced expiratory volume (FEV) over a 6 month period
• Sample sizes: Treatment group = 50, Placebo group = 48

Stages 1 and 2

• Stage 1: Define H0 and H1:

H0: the mean FEV in the population of children is the same in the two groups

H1: the mean FEV in the population of children is different in the two groups

• Stage 2: Collect data

Graphical Analysis

[Figure: Boxplots comparing treated group to control group; vertical axis: FEV, scale 1.00 to 2.50; horizontal axis: Treated Group, Control Group]

Selecting a test

• What is the data type? Numerical
• How many groups are there? 2
• Are the groups Paired or Independent? Independent
• Are Normality and equal variances of the data assumed? Yes

→ Unpaired (Independent) t-test

Analysis Output

Stages 3 and 4: Calculate the test statistic and compare to values from a known probability distribution

Sample   N    Mean   StDev   SE Mean
1       50   1.640   0.286     0.040
2       48   1.537   0.246     0.035

Difference = mu (1) - mu (2)
Estimate for difference: 0.1033
95% CI for difference: (-0.0038, 0.2104)
T-Test of difference = 0 (vs not =): T-Value = 1.91  P-Value = 0.059  DF = 96
Both use Pooled StDev = 0.2670

Stage 5: Interpret the results
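The pooled standard deviation, T-value and P-value in the output can be re-derived from the summary statistics alone. The sketch below is one way to do so, not the statistics package's own method: it uses the rounded means from the table (so the difference is 0.103 rather than 0.1033) and obtains the P-value by numerically integrating the t density with Simpson's rule; `t_pdf` and `two_sided_p` are illustrative helper names.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: 2 * P(T >= |t|), using Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 2 * (0.5 - central_area)

# Summary statistics copied from the analysis output.
n1, mean1, sd1 = 50, 1.640, 0.286   # sample 1
n2, mean2, sd2 = 48, 1.537, 0.246   # sample 2

df = n1 + n2 - 2
pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
t_value = (mean1 - mean2) / se_diff
p_value = two_sided_p(t_value, df)

print(f"DF = {df}, pooled StDev = {pooled_sd:.4f}")
print(f"T-Value = {t_value:.2f}, P-Value = {p_value:.3f}")
```

Small differences from the printed output in the last decimal place are expected, because the table reports rounded means and standard deviations.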

• The P-value is 0.059

• There is insufficient evidence (just!) to reject H0 at the 5% level
• There is insufficient statistical evidence of a difference between the 2 groups
• The Power of the Test should be checked
• A Type II error may have been made in not rejecting H0

Scenario 2

• A study was conducted to determine if a heart condition influences the age at which children start to walk
• Response variable was the age at which the children started to walk
• 30 children with a specific heart condition were analysed in the study
• Children (in general) are known to start walking at a mean age of 11.4 months
• Does the heart condition influence the age at which children start to walk?

Stages 1 and 2

• Stage 1: Define H0 and H1

H0: the mean walking age of the children with the heart condition = 11.4 months

H1: the mean walking age of the children with the heart condition ≠ 11.4 months

• Stage 2: Collect data

Graphical Analysis

[Figure: Histogram showing walking age of children; horizontal axis: Months, 10 to 18; vertical axis: Frequency]

Selecting a test

• What is the data type? Numerical
• How many groups are there? 1
• Is Normality of the data assumed? Yes

→ One-sample t-test

Analysis Output

Stages 3 and 4: Calculate the test statistic and compare to values from a known probability distribution

One-Sample T

Test of mu = 11.4 vs not = 11.4

 N    Mean   StDev  SE Mean            95% CI     T      P
30  13.158   2.583    0.472  (12.193, 14.123)  3.73  0.001

Stage 5: Interpret the results
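The SE Mean, T value and 95% CI in the one-sample output follow directly from N, Mean and StDev. A short check, assuming the usual formulas SE = StDev/√N and T = (Mean − 11.4)/SE; the critical value 2.045 is the standard t-table entry for 29 degrees of freedom and is typed in by hand rather than computed.

```python
import math

# Summary statistics copied from the analysis output.
n, sample_mean, sample_sd = 30, 13.158, 2.583
mu0 = 11.4                      # hypothesised mean walking age in months

df = n - 1
se_mean = sample_sd / math.sqrt(n)
t_value = (sample_mean - mu0) / se_mean

# 97.5th percentile of t with 29 df, taken from standard t tables.
T_CRIT_29 = 2.045
ci_lower = sample_mean - T_CRIT_29 * se_mean
ci_upper = sample_mean + T_CRIT_29 * se_mean

print(f"SE Mean = {se_mean:.3f}, T = {t_value:.2f}, DF = {df}")
print(f"95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
```

The interval excludes 11.4 months, which agrees with the small P-value reported in the output.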

• The P-value is 0.001

• There is strong evidence to reject H0
• There is statistical evidence that the heart condition influences the age at which children start to walk
• If H0 were true, results at least this extreme would occur with a probability of about 0.1%; this is not the probability that a Type I error has been made, although such an error remains possible

Correlation and Regression

• Correlation – measures the strength of association between two variables

• Regression – models a relationship between two or more variables

Correlation

• The degree of association between two variables is called their correlation

• Positive correlation - when the points appear in a band running from lower left to upper right (when x increases, y increases)

• Negative correlation - when the points appear in a band from upper left to lower right (when x increases, y decreases)

• No correlation - when the points are randomly scattered about the graph

Correlation and “Line of best fit”
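The three patterns (positive, negative and no correlation) can be put into numbers with Pearson's correlation coefficient r, which ranges from −1 to +1. The slides do not name a specific coefficient, so using Pearson's r here is an assumption, and the three small data sets are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data sets showing the three patterns.
x = [1, 2, 3, 4, 5, 6]
y_up = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]     # band from lower left to upper right
y_down = [12.0, 10.1, 7.9, 6.2, 3.8, 2.1]  # band from upper left to lower right
y_none = [5.0, 9.0, 4.0, 4.0, 9.0, 5.0]    # no linear pattern

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_none = pearson_r(x, y_none)

print(f"positive correlation: r = {r_up:.3f}")
print(f"negative correlation: r = {r_down:.3f}")
print(f"no correlation:       r = {r_none:.3f}")
```

Values of r near +1 or −1 indicate a tight band of points; values near 0 indicate no linear association.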

Here are some examples

[Figure: example scatter plots]

Be Careful!

"Correlation does not imply causality"

• In other words, the scatter plot may show that a relationship exists, but it does not and cannot prove that one factor is causing the other

• The scatter plot can only provide a clue that two factors may be related as “cause and effect”

Correlation - example

• Driving test scores – written paper

• Outcome compared by plotting scores against number of lessons (1-10)

– does score improve as the number of lessons increases?

Scatter plot for learner drivers

[Figure: scatter plot of mark (vertical axis, 90 to 170) against number of classes (horizontal axis, 0 to 10)]

Linear Regression

• Investigates a straight line (linear) association between variables

• The straight line fitted to the scatter diagram is the regression line; its equation is known as the regression equation

• Least squares – the line is chosen so that the sum of the squared differences between the observed and predicted values is minimised

Medical example

• Does increasing hardness improve abrasion resistance for composites?

• Does increasing etch time improve bond strength to enamel?

• Both questions require a regression approach

– using just two or three materials of different hardness is not acceptable

– using just two etch times would not provide answers

Data

Composite   Hardness   Wear rate
1                120          56
2                168          46
3                290          21
4                 42          98
5                 78          80
6                 90          65
7                130          32

Regression equation 1
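The least-squares line can be computed by hand from the seven hardness and wear-rate pairs in the data table. The sketch below uses the textbook formulas slope = Sxy/Sxx and intercept = ȳ − slope·x̄; the variable names are illustrative, and the results should agree with the fitted line reported on this slide up to rounding.

```python
import math

# Data copied from the table of seven composites.
hardness = [120, 168, 290, 42, 78, 90, 130]
wear = [56, 46, 21, 98, 80, 65, 32]

n = len(hardness)
mean_x = sum(hardness) / n
mean_y = sum(wear) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in hardness)
syy = sum((y - mean_y) ** 2 for y in wear)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hardness, wear))

# Least-squares estimates minimise the sum of squared residuals.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Goodness of fit: residual sum of squares, residual SD (S) and R-squared.
predicted = [intercept + slope * x for x in hardness]
sse = sum((y - p) ** 2 for y, p in zip(wear, predicted))
residual_sd = math.sqrt(sse / (n - 2))
r_squared = 1 - sse / syy

print(f"wear = {intercept:.2f} + ({slope:.4f}) * hardness")
print(f"S = {residual_sd:.4f}, R-Sq = {100 * r_squared:.1f}%")
```

The negative slope says that, in these seven composites, higher hardness goes with a lower wear rate.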

A regression equation is: wear = 94.6 - 0.288 hardness

[Figure: Fitted Line Plot of wear (vertical axis, 0 to 100) against hardness (horizontal axis, 0 to 300)]

wear = 94.65 - 0.2882 hardness
S = 14.5829   R-Sq = 75.4%   R-Sq(adj) = 70.4%

Regression equation 2

• Etch time 5 to 60 s
• Bond strength 15 to 26 MPa

Regression equation: bond strength = 17.3 + 0.110 etch time

[Figure: Fitted Line Plot of bond strength (vertical axis, 15.0 to 27.5) against etch time (horizontal axis, 0 to 60)]

bond strength = 17.31 + 0.1103 etch time
S = 2.51095   R-Sq = 35.2%   R-Sq(adj) = 32.2%

Summary

• Part 2 Summary
• Sampling
• Statistical Hypothesis Tests
• Errors in Hypothesis Tests
• Power and Sample Size
• Examples
• Correlation and Regression