Basic Biostatistics Part 2
1st March, 2017

Content
• Part 1 Summary
• Sampling
• Statistical Hypothesis Tests
• Errors in Hypothesis Tests
• Power and Sample Size
• Examples
• Correlation and Regression

Part 1 Summary
• What were the key learning points from Part 1?
− In groups, identify 3 key learning points from the first session

Sampling
• An investigation of a population is said to be a survey or study of the population
• A population is a group of individuals or objects that meets a set of pre-defined criteria; e.g.
- All people with permanent residence in the UK
- All patient records held in a database
- All patients with schizophrenia
- All staff members of an organisation
- All patients registered to a particular specialist
- All members of the population diagnosed with a particular health condition
• A survey or study that collects information from every member of a population is referred to as a census
• It is not always possible to collect information from every member of a population because of limited time and resources
• A ‘good sample’ can be used to reliably estimate characteristics (e.g. the mean) of the population
• Sample – any subset of a population
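The idea that a sample estimates a population characteristic can be illustrated with a small simulation. This is only a sketch: the population below (10,000 appointment durations with a mean of about 30 minutes) and the sample size of 50 are invented for illustration and do not come from the course data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: appointment durations (minutes) for 10,000 patients.
population = [random.gauss(30, 8) for _ in range(10_000)]
population_mean = statistics.mean(population)

# A simple random sample of 50 patients, drawn without replacement.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

# The gap between the sample estimate and the true population value.
sampling_error = sample_mean - population_mean

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Sampling error:  {sampling_error:+.2f}")
```

The sample mean is close to, but not exactly equal to, the population mean. Re-running with a different seed gives a different sample and a different gap.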
[Figure: diagram labelled "Sample" and "Population"]

Sampling Error
• Errors in surveys can be divided into two categories: sampling error and non-sampling error
• Sampling error - error due to taking a sample rather than studying the whole population
- e.g. if a psychiatrist randomly selects a sample of patients and records the duration of each appointment, the average appointment duration can be calculated
- if the times for all patients were recorded (i.e. the entire population) then the population average would most likely differ from the sample average

Non-sampling error
• Non-sampling error is error due to:
- poor selection of strata or sample (coverage errors)
- poor data entry (processing errors)
- inaccurate responses (measurement errors)
- non-response errors
• In surveys, non-sampling errors can be more of a problem than sampling errors

Statistical Hypothesis Tests

Hypothesis Testing
• A process called Hypothesis Testing is used to quantify the evidence against a particular hypothesis about the population
• There are many different types of hypothesis tests
• Five stages for hypothesis testing can be defined:

5 Stages
1. Define the Null & Alternative Hypotheses
2. Collect data
3. Calculate the value of the test statistic
4. Compare the value of the test statistic to values from a known probability distribution
5. Interpret the P-value and results

The Null Hypothesis
• The Null Hypothesis, which assumes no effect in the population (e.g. the difference in means equals zero), is the hypothesis that is tested
• Example: Comparing the rates of hallucinations in men and women in the population
− Null Hypothesis (H0): rates of hallucinations are the same in men and women in the population

The Alternative Hypothesis
• The Alternative Hypothesis holds if the Null Hypothesis is not true
• Example
− Alternative Hypothesis (H1): rates of hallucinations are different in men and women in the population

The test statistic
• After data collection, the sample data is used to calculate a test statistic
• The test statistic is effectively the amount of evidence in the data against H0
• Generally, the larger the value (irrespective of sign), the greater the evidence against H0

The P-value
• The test statistic is compared to values from the relevant probability distribution to obtain a P-value
• The P-value is the probability of obtaining our results, or something more extreme, if the Null Hypothesis is true
• The smaller the P-value, the greater the evidence against H0

Rejecting H0
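As a concrete illustration of how a test statistic is turned into a P-value, the sketch below applies a two-sided z-test for the difference between two proportions to the hallucination example. The choice of this particular test and the counts (30 of 200 men, 45 of 200 women) are assumptions made purely for illustration. They are not data from the course.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se  # test statistic
    # P-value: probability of a |z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, invented for illustration only.
z, p_value = two_proportion_z_test(x1=30, n1=200, x2=45, n2=200)
print(f"z = {z:.2f}, P-value = {p_value:.3f}")
```

With these invented counts the P-value falls just above 0.05, so the conventional rule would not reject H0.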
• Conventionally, if the P-value < 0.05, there is sufficient evidence to reject H0
• There is only a small chance of the results occurring if H0 is true
– H0 is rejected; the results are statistically significant at the 5% level

Not rejecting H0
• If the P-value ≥ 0.05, there is insufficient evidence to reject H0
– H0 is not rejected; the results are not statistically significant at the 5% level
• NB: This does not mean that the null hypothesis is true, simply that we do not have enough evidence to reject it!

Parametric vs. Non-Parametric tests
• Tests which are based on the assumption that the data follows a known probability distribution (often the Normal) are known as parametric tests
• Sometimes the data do not conform to this assumption, so non-parametric tests can be used instead
• Non-parametric tests make no assumption about the form of the probability distribution

Non-parametric tests
• Useful when:
− sample size is small
− data is measured on a categorical scale (though they are used for numerical data as well)
• However:
− they have less power to detect a real difference than the equivalent parametric tests
− they lead to decisions rather than generating a true understanding of the data

Statistical tests
• Numerical data (Parametric tests)
– One-sample t-test
– Independent t-test
– Paired t-test
– One-way ANOVA

Statistical tests
• Numerical data (non-parametric tests)
– Sign test
– Wilcoxon signed rank test
– Wilcoxon rank sum test
– Kruskal-Wallis test

Statistical tests
• Categorical data
– z-test for a proportion
– Sign test
– McNemar’s test
– Chi-squared test
– Chi-squared trend test
– Fisher’s exact test

Choosing a statistical test
• Many medical statistics textbooks contain a flowchart to guide the choice of test
• Considerations include:
– what is the data type?
– how many groups of data are there?
– can a probability distribution be assumed?

Errors in Hypothesis Testing

Making a wrong decision
• There is the possibility of making a wrong decision when conducting a Hypothesis test
• A wrong decision may be made when rejecting or not rejecting the Null Hypothesis
• The possible mistakes that can be made are a:
– Type I error
– Type II error

Type I error
• Rejecting the Null Hypothesis when it is true
• Concluding that there is an effect (difference) when in reality there is none
• The maximum chance of making a Type I error is denoted by alpha (α)
• α is the significance level of the test; we reject the null hypothesis if the P-value is less than the significance level

Type II error
• Not rejecting the Null Hypothesis when it is false
• Concluding that there is no effect (difference) when one really exists
• The chance of making a Type II error is denoted by beta (β)
• Its complement, 1 - β, is the Power of the test

Power and Sample Size

Power of the test
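The Type I error rate, the Type II error rate and the power can all be estimated by simulating many studies. The sketch below is an illustration only. It assumes a two-sided one-sample z-test of H0: mean = 0 with a known standard deviation of 1, and the sample size (25), true effect (0.5) and number of simulated studies (4000) are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejects_h0(true_mean, n=25, sigma=1.0, alpha=0.05):
    """Simulate one study and test H0: mean = 0 with a two-sided z-test."""
    data = [random.gauss(true_mean, sigma) for _ in range(n)]
    z = (sum(data) / n) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


random.seed(2017)
n_studies = 4000

# H0 is true (mean = 0): every rejection is a Type I error.
type_1_rate = sum(rejects_h0(0.0) for _ in range(n_studies)) / n_studies

# H0 is false (mean = 0.5): every rejection is a correct decision.
power = sum(rejects_h0(0.5) for _ in range(n_studies)) / n_studies
type_2_rate = 1 - power

print(f"Type I error rate (alpha):  {type_1_rate:.3f}")
print(f"Power (1 - beta):           {power:.3f}")
print(f"Type II error rate (beta):  {type_2_rate:.3f}")
```

The estimated Type I error rate should sit near the chosen significance level of 0.05, and the estimated power near 0.7 for these settings.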
• The Power is the probability of rejecting the Null Hypothesis when it is false
– i.e. the probability of making a correct decision
• The ideal power of the test is 100%
• However, there is always a possibility of making a Type II error

Sample size
• If the number of patients/samples in the study is small, there may be inadequate power to detect an important existing effect – wasted resources
• If the sample is too large, the study may be unnecessarily time-consuming, expensive or unethical
• Need to choose an optimal sample size that strikes a balance between the implications of making a Type I or Type II error

Calculating an optimal sample size for a test
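One common textbook approximation for comparing two means with equal group sizes is n per group = 2 × ((z for the significance level + z for the power) × standard deviation / smallest effect of interest)². The sketch below implements that normal-approximation formula. It is not necessarily the method used on this course, and the function name and input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(power, alpha, sd, effect):
    """Approximate n per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


# Hypothetical design values: 80% power, 5% significance level,
# standard deviation 1.0, smallest effect of interest 0.5.
n = sample_size_per_group(power=0.80, alpha=0.05, sd=1.0, effect=0.5)
print(f"Patients needed per group: {n}")
```

Asking for more power, a smaller significance level, or a smaller effect of interest all increase the required sample size, as does greater variability.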
• The following quantities need to be specified at the design stage of the investigation in order to calculate an optimal sample size:
– The Power
– Significance Level
– Variability
– Smallest effect of interest

Recall: 5 stages
1. Define the Null & Alternative Hypotheses
2. Collect data
3. Calculate the value of the test statistic
4. Compare the value of the test statistic to values from a known probability distribution
5. Interpret the P-value and results

Examples

Scenario 1
• A randomised double-blind trial to determine the effect of inhaled corticosteroids on wheezing episodes in children
• Inhaled beclomethasone dipropionate was compared to a placebo
• Response variable was average forced expiratory volume (FEV) over a 6-month period
• Sample sizes: Treatment group = 50, Placebo group = 48

Stages 1 and 2
• Stage 1: Define H0 and H1:
H0: the mean FEV in the population of children is the same in the two groups
H1: the mean FEV in the population of children is different in the two groups
• Stage 2: Collect data

Graphical Analysis
[Figure: Boxplots comparing treated group to control group. Vertical axis: Forced Expiratory Volume (FEV), 1.00 to 2.50; horizontal axis: Treated Group, Control Group]

Selecting a test
• What is the data type? Numerical
• How many groups are there? 2
• Are the groups Paired or Independent? Independent
• Are Normality and equal variances of the data assumed? Yes
→ Unpaired (Independent) t-test

Analysis Output
Stages 3 and 4: Calculate the test statistic and compare to values from a known probability distribution

Sample   N    Mean    StDev   SE Mean
1        50   1.640   0.286   0.040
2        48   1.537   0.246   0.035

Difference = mu (1) - mu (2)
Estimate for difference: 0.1033
95% CI for difference: (-0.0038, 0.2104)
T-Test of difference = 0 (vs not =): T-Value = 1.91  P-Value = 0.059  DF = 96
Both use Pooled StDev = 0.2670

Stage 5: Interpret the results
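The analysis output for Scenario 1 can be checked from the summary statistics alone. The sketch below recomputes the pooled standard deviation, the T-value and the two-sided P-value. It works from the rounded means and standard deviations in the output, so the difference is 0.103 rather than 0.1033 and results agree only to about two decimal places. The helper functions `t_pdf` and `two_sided_p` are written for this illustration and use simple numerical integration in place of a statistics library.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * (log_df_pi(df))
    return exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def log_df_pi(df):
    """Natural log of df * pi, used in the normalising constant."""
    from math import log
    return log(df * pi)


def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


# Summary statistics copied from the analysis output.
n1, mean1, sd1 = 50, 1.640, 0.286  # treated group
n2, mean2, sd2 = 48, 1.537, 0.246  # control group

df = n1 + n2 - 2
pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
se_diff = pooled_sd * sqrt(1 / n1 + 1 / n2)
t_value = (mean1 - mean2) / se_diff
p_value = two_sided_p(t_value, df)

print(f"Pooled StDev = {pooled_sd:.4f}, DF = {df}")
print(f"T-Value = {t_value:.2f}, P-Value = {p_value:.3f}")
```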
• The P-value is 0.059
• There is insufficient evidence (just!) to reject H0 at the 5% level
• There is insufficient statistical evidence of a difference between the 2 groups
• The Power of the Test should be checked
• A Type II error may have been made in not rejecting H0

Scenario 2
• A study was conducted to determine if a heart condition influences the age at which children start to walk
• Response variable was the age at which the children started to walk
• 30 children with a specific heart condition were analysed in the study
• Children (in general) are known to start walking at an age of 11.4 months
• Does the heart condition influence the age at which children start to walk?

Stages 1 and 2
• Stage 1: Define H0 and H1
H0: the mean walking age of the children with the heart condition = 11.4 months
H1: the mean walking age of the children with the heart condition ≠ 11.4 months
• Stage 2: Collect data

Graphical Analysis
[Figure: Histogram showing walking age of children. Vertical axis: Frequency, 0 to 6; horizontal axis: Months, 10 to 18]

Selecting a test
• What is the data type? Numerical
• How many groups are there? 1
• Is Normality of the data assumed? Yes
→ One-sample t-test

Analysis Output
Stages 3 and 4: Calculate the test statistic and compare to values from a known probability distribution

One-Sample T
Test of mu = 11.4 vs not = 11.4

N    Mean     StDev   SE Mean   95% CI             T      P
30   13.158   2.583   0.472     (12.193, 14.123)   3.73   0.001

Stage 5: Interpret the results
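The standard error, T-value and 95% confidence interval in the Scenario 2 output can be reproduced by hand from the summary statistics. In the sketch below the critical value 2.045 is the standard tabulated two-sided 5% point of the t distribution with 29 degrees of freedom. It is supplied here as an assumption because it does not appear in the output itself.

```python
from math import sqrt

# Summary statistics copied from the analysis output.
n, sample_mean, sd = 30, 13.158, 2.583
mu0 = 11.4  # hypothesised mean walking age under H0 (months)

se = sd / sqrt(n)                   # standard error of the mean
t_value = (sample_mean - mu0) / se  # one-sample t statistic

t_crit = 2.045  # tabulated t value for 95% confidence with n - 1 = 29 df
ci_low = sample_mean - t_crit * se
ci_high = sample_mean + t_crit * se

print(f"SE Mean = {se:.3f}, T = {t_value:.2f}")
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print("11.4 lies inside the CI:", ci_low <= mu0 <= ci_high)
```

The hypothesised value of 11.4 months lies outside the 95% confidence interval, which matches a two-sided P-value below 0.05.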
• The P-value is 0.001
• There is strong evidence to reject H0
• There is statistical evidence that the heart condition influences the age at which children start to walk
• If H0 were true, the probability of obtaining results at least this extreme would be about 0.1%, so the risk that rejecting H0 is a Type I error is small

Correlation and Regression
• Correlation – measures the strength of association between two variables
• Regression – models a relationship between two or more variables

Correlation
• The degree of association between two variables is called their correlation
• Positive correlation - when the points appear in a band running from lower left to upper right (when x increases, y increases)
• Negative correlation - when the points appear in a band from upper left to lower right (when x increases, y decreases)
• No correlation - when the points are randomly scattered about the graph

Correlation and “Line of best fit”
Here are some examples

Be Careful!
"Correlation does not imply causality"
• In other words, the scatter plot may show that a relationship exists, but it does not and cannot prove that one factor is causing the other
• The scatter plot can only provide a clue that two factors may be related as “cause and effect”

Correlation - example
• Driving test scores – written paper
• Outcome compared by plotting scores against number of lessons (1-10)
– does score improve as the number of lessons increases?

Scatter plot for learner drivers
[Figure: scatter plot of marks (vertical axis, 90 to 170) against classes (horizontal axis, 0 to 10)]

Linear Regression
• Investigates a straight line (linear) association between variables
• The straight line fitted to the scatter diagram is the regression line; its equation is known as the regression equation
• Least squares – the line is chosen so that the sum of the squared differences between the observed and predicted values is minimised

Medical example
• Does increasing hardness improve abrasion resistance for composites?
• Does increasing etch time improve bond strength to enamel?
• Both questions require a regression approach
– using just two or three materials of different hardness is not acceptable
– using just two etch times would not provide answers

Data
Composite   Hardness   Wear rate
1           120        56
2           168        46
3           290        21
4           42         98
5           78         80
6           90         65
7           130        32

Regression equation 1
A regression equation is: wear = 94.6 - 0.288 hardness
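The regression equation above can be reproduced from the seven data points with the standard least squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). The sketch below is a plain-Python illustration with arbitrary variable names. It also computes R-squared, the proportion of the variation in wear explained by the fitted line.

```python
# Data from the table above.
hardness = [120, 168, 290, 42, 78, 90, 130]
wear = [56, 46, 21, 98, 80, 65, 32]

n = len(hardness)
x_bar = sum(hardness) / n
y_bar = sum(wear) / n

# Sums of squares and cross-products about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hardness, wear))
s_xx = sum((x - x_bar) ** 2 for x in hardness)
s_yy = sum((y - y_bar) ** 2 for y in wear)

# Least squares estimates: these minimise the sum of squared residuals.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r_squared = s_xy**2 / (s_xx * s_yy)

print(f"wear = {intercept:.2f} + ({slope:.4f}) hardness")
print(f"R-Sq = {100 * r_squared:.1f}%")
```

The negative slope indicates that wear rate falls as hardness increases across these seven composites.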
[Figure: Fitted Line Plot, wear = 94.65 - 0.2882 hardness. Vertical axis: wear, 0 to 100; horizontal axis: hardness, 50 to 300. S = 14.5829, R-Sq = 75.4%, R-Sq(adj) = 70.4%]

Regression equation 2
• Etch time 5 to 60 s
• Bond strength 15 to 26 MPa
Regression equation: bond strength = 17.3 + 0.110 etch time
[Figure: Fitted Line Plot, bond strength = 17.31 + 0.1103 etch time. Vertical axis: bond strength, 15.0 to 27.5; horizontal axis: etch time, 0 to 60. S = 2.51095, R-Sq = 35.2%, R-Sq(adj) = 32.2%]

Summary
• Part 2 Summary
• Sampling
• Statistical Hypothesis Tests
• Errors in Hypothesis Tests
• Power and Sample Size
• Examples
• Correlation and Regression