Quantitative Methods in STEM Education Research
Topic 4: Inferential statistics
Judy Sheard, Faculty of Information Technology, Monash University, Australia
QM STEM Ed 2018
Overview of topic 4
Hypothesis testing
Central Limit Theorem
Level of significance
Z-scores
Confidence intervals
Categories of statistical tests
Descriptive vs. inferential statistics
Descriptive statistics — used to describe sets of quantitative data. This involves descriptions of distributions of data and relationships between variables. Inferential statistics — used to make inferences about populations from analysis of subsets (samples) of the population.
Inferential statistics
“In inferential statistics, statistics are measures of the sample and parameters are measures of the population. Inferences are made about the parameters from the statistics”. (Wiersma, 1995, p.363) Inferences are made about a population based on a subset or random sample of that population.
Note that in educational research it is often not possible to have a random sample – instead we attempt to show that the sample is typical of the population by comparing demographics, e.g. gender, age, educational background.
Hypothesis testing
In inferential statistics, a hypothesis is used to determine whether an observation has an underlying cause or whether it was due to random fluctuation or error in a sample. The researcher tests whether the hypothesis is consistent with the sample data; if it is not, the hypothesis is rejected. There are two different ways of stating a hypothesis: looking for a difference between groups, or looking for a relationship between variables.
Hypothesis testing
On what basis do we accept or reject a hypothesis? Consider this example:
A set of exercises was designed to encourage reflection on program design. It was hypothesized that these exercises would improve students’ skills in program design. The exercises were used with a class of 30 students. In a test on program design, the class scored a mean of 60% with a standard deviation of 10. The same test given to another class that had not used these exercises resulted in a mean score of 55% with a standard deviation of 12. Does the hypothesis seem reasonable? What if the class mean was 70%? What about 57%?
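To make the intuition concrete, the sketch below computes a large-sample z statistic for the difference between the two class means, for each of the three class means mentioned. It rests on a loud assumption: the slide does not give the size of the comparison class, so it is taken (hypothetically) to be 30 students as well. Because the research hypothesis predicts an improvement, a one-tailed probability is reported. This is an illustration of the reasoning only; a t-test would be the more usual choice for groups of this size.

```python
import math

# Hypothetical assumption: the comparison class also had 30 students
# (the example gives only its mean and standard deviation).
N_EXERCISE, N_COMPARISON = 30, 30
SD_EXERCISE, SD_COMPARISON = 10.0, 12.0
MEAN_COMPARISON = 55.0


def z_for_difference(mean_exercise: float) -> float:
    """Large-sample z statistic for the difference between two independent means."""
    standard_error = math.sqrt(
        SD_EXERCISE**2 / N_EXERCISE + SD_COMPARISON**2 / N_COMPARISON
    )
    return (mean_exercise - MEAN_COMPARISON) / standard_error


def one_tailed_p(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for class_mean in (60.0, 70.0, 57.0):
    z = z_for_difference(class_mean)
    print(f"class mean {class_mean:.0f}%: z = {z:.2f}, one-tailed p = {one_tailed_p(z):.4f}")
```

Under these assumptions, the 60% result lies close to the conventional 0.05 boundary, the 70% result lies far beyond it, and the 57% result is well within what chance variation could produce.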
Null hypothesis
In inferential statistics we test the opposite of a research hypothesis using the null hypothesis. For example:
Research hypothesis: Skills in program design will be improved with the use of exercises to encourage reflection on program design. Null hypothesis: There will be no difference in skill levels in program design between students who have completed exercises to encourage reflection on program design and those who have not.
Research hypothesis: The performance of introductory programming students is related to prior programming experience. Null hypothesis: There is no relationship between programming performance and prior programming experience.
If your study finds a statistically significant difference or relationship, you can reject the null hypothesis (H0) and state that there is support for your research hypothesis (H1).
Sampling distribution
We need more than intuition here. We will connect probability with a statistic — using the concept of a sampling distribution of the statistic. A sampling distribution consists of the values of a statistic computed from all possible samples of a given size. (Wiersma, 2005, p.375). Note that the sampling distribution is not the sample distribution.
What does this mean?
We have a population. We can take a sample of size n from the population and compute a statistic of this sample, e.g. the mean. If we take all possible samples of size n and compute the statistic for each of them, we obtain a distribution of the statistic: its sampling distribution.
Central limit theorem
The shape, location (central tendency) and variability (dispersion) of the sampling distribution of the mean are described by the central limit theorem. The central limit theorem (CLT) states: given any population with a finite variance, the distribution of the sample mean is approximately normal, provided the sample size is large. This is the key theorem in statistics!
Central limit theorem
The central limit theorem specifies that the sampling distribution of the mean has a mean equal to the population mean (μ) and a standard deviation equal to σ/√n, where σ is the standard deviation of the population, and that it is approximately normally distributed when n is large. Some simulations to illustrate this:
http://www.stat.sc.edu/~west/javahtml/CLT.html
http://www.rand.org/statistics/applets/clt.html
http://en.wikipedia.org/wiki/Concrete_illustration_of_the_central_limit_theorem
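A small simulation in the same spirit as the applets listed above is sketched below. As an assumption for illustration, it samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1), draws 10,000 samples of size 30, and checks the three CLT predictions: the sample means centre on μ, their spread is close to σ/√n, and about 95% of them fall within 1.96 standard errors of μ, as a normal distribution would imply.

```python
import random
from statistics import fmean, pstdev

random.seed(2018)  # fixed seed so the run is reproducible

# Assumption: a skewed population, exponential with rate 1,
# whose mean (MU) and standard deviation (SIGMA) are both 1.
MU, SIGMA = 1.0, 1.0
n = 30                # size of each sample
replicates = 10_000   # number of samples used to approximate the sampling distribution

sample_means = [
    fmean([random.expovariate(1.0) for _ in range(n)]) for _ in range(replicates)
]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)
predicted_sd = SIGMA / n ** 0.5

# Share of sample means within 1.96 predicted standard errors of MU.
within = sum(abs(m - MU) <= 1.96 * predicted_sd for m in sample_means) / replicates

print(f"mean of sample means: {mean_of_means:.3f} (CLT predicts {MU})")
print(f"sd of sample means:   {sd_of_means:.3f} (CLT predicts {predicted_sd:.3f})")
print(f"share within 1.96 standard errors: {within:.3f} (normal predicts 0.95)")
```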
Level of significance
The level of significance is a probability used in testing hypotheses: it is the criterion used in deciding whether to reject the null hypothesis. The level most commonly used in educational research is 0.05; other levels occasionally used are 0.01, 0.001 and 0.1. A level of 0.05 means that the null hypothesis is rejected when the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true (the p-value), is lower than 0.05. It then follows that, if the null hypothesis is true, it will be rejected only 5% of the time.
We now connect the sampling distribution with the level of significance.
The “68.3 - 95.5 - 99.7” rule
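The percentages in this rule are the shares of a normal distribution lying within 1, 2 and 3 standard deviations of the mean. As a sketch, they can be computed from the error function, since P(|Z| < k) = erf(k/√2) for a standard normal Z.

```python
import math


def share_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.2f}%")
```

The exact two-standard-deviation share is about 95.45%, so it appears as either 95.4% or 95.5% depending on how it is rounded.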
Z-score
The z-score (also called standard score) indicates how far, and in what direction, a score deviates from its distribution's mean, expressed in units of the distribution's standard deviation. The formula for creating z-scores is:
$z = \frac{x - \mu}{\sigma}$
where:
x is a raw score to be standardized
μ is the mean of the population
σ is the standard deviation of the population
Standard z-score
The z-score indicates if a score was above or below the distribution mean. A z-score of +1 indicates one standard deviation above the population mean. A z-score of -1 indicates one standard deviation below the population mean. For example, a mark of 53 on a test where the mean of all marks was 67 and the standard deviation of marks was 7 would give a standard score of -2.0.
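The worked example translates directly into a one-line function. This is a minimal sketch of the formula z = (x − μ)/σ; the function name is chosen for illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standard score: signed distance of x from mu, in units of sigma."""
    return (x - mu) / sigma


print(z_score(53, 67, 7))  # -2.0: two standard deviations below the mean
print(z_score(74, 67, 7))  # 1.0: one standard deviation above the mean
```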
Properties of standard scores
A z-score makes it possible to compare scores from different distributions. z-scores have the following properties:
The mean of any set of z-scores is zero.
The standard deviation of any set of z-scores is always equal to 1.
The distribution of z-scores has the same shape as the distribution of raw scores from which they were derived.
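These properties can be checked numerically. The sketch below standardizes the set of marks that appears in the "Why is measurement important?" example, treating that set as the whole distribution (so the population standard deviation, `pstdev`, is used). The mean of the z-scores comes out as 0 and their standard deviation as 1, up to floating-point rounding, and the ordering of the scores is unchanged.

```python
from statistics import fmean, pstdev

# Marks treated here as a complete distribution, for illustration.
marks = [83, 47, 34, 23, 85, 33, 84, 83, 72, 94, 30]

mu, sigma = fmean(marks), pstdev(marks)
z_scores = [(x - mu) / sigma for x in marks]

print(f"mean of z-scores: {fmean(z_scores):.6f}")
print(f"SD of z-scores:   {pstdev(z_scores):.6f}")

# Standardizing is a linear transformation, so the ranking of scores is unchanged.
order_raw = sorted(range(len(marks)), key=lambda i: marks[i])
order_z = sorted(range(len(marks)), key=lambda i: z_scores[i])
print(order_raw == order_z)
```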
Confidence intervals
A confidence interval specifies a range of values within which we can have a stated degree of confidence of finding an unknown value, usually the population mean. To construct a confidence interval based on the normal distribution we need:
a random sample of size n
the sample mean
the standard deviation of the population
a level of confidence
Defining confidence intervals
To find the lower (L) and upper (U) limits for a confidence interval we use the following:
$L = \bar{x} - z\frac{\sigma}{\sqrt{n}}$
$U = \bar{x} + z\frac{\sigma}{\sqrt{n}}$
where $\bar{x}$ is the sample mean, σ is the standard deviation, n is the sample size and z is a z-score indicating the confidence level.
Confidence intervals
Increasing the confidence level widens the confidence interval. Increasing the sample size narrows the confidence interval. Increasing the standard deviation makes the interval wider. Common confidence levels are 90%, 95%, 99% - but we can specify any level below 100%.
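The three effects above follow from the width of the interval, 2zσ/√n. The sketch below computes that width for a hypothetical baseline (95% confidence, σ = 10, n = 25) and then changes one factor at a time. It uses `statistics.NormalDist` from the Python standard library to obtain z for any confidence level.

```python
from statistics import NormalDist


def interval_width(confidence: float, sigma: float, n: int) -> float:
    """Width (U - L) of a normal-based confidence interval for a mean, sigma known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return 2 * z * sigma / n ** 0.5


# Hypothetical baseline: 95% confidence, sigma = 10, n = 25.
base = interval_width(0.95, 10, 25)
print(f"baseline          {base:.2f}")
print(f"99% confidence    {interval_width(0.99, 10, 25):.2f}")   # wider
print(f"n = 100           {interval_width(0.95, 10, 100):.2f}")  # narrower
print(f"sigma = 20        {interval_width(0.95, 20, 25):.2f}")   # wider
```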
Choosing the z-score
For 95% confidence we choose a central area of 0.95 under the standard normal curve, which lies between z = -1.96 and z = +1.96.
For 90% confidence we choose a central area of 0.90 under the standard normal curve, which lies between z = -1.645 and z = +1.645.
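Rather than reading z from a table, it can be computed from the inverse of the standard normal cumulative distribution function. In the sketch below, a central area of c leaves (1 − c)/2 in each tail, so z is the quantile at 1 − (1 − c)/2.

```python
from statistics import NormalDist


def critical_z(confidence: float) -> float:
    """z such that the central `confidence` area of N(0, 1) lies between -z and +z."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {critical_z(level):.3f}")
```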
The “68.3-95.5-99.7” rule
Example
The numbers below were randomly drawn from a normal population with σ = 10.
56.87, 73.96, 59.77, 75.89, 71.60, 81.94, 69.11, 80.07, 74.70, 63.32
The sample mean is 70.72 and we want a 95% confidence interval. So,
$L = 70.72 - 1.96\frac{10}{\sqrt{10}} = 64.52$
$U = 70.72 + 1.96\frac{10}{\sqrt{10}} = 76.92$
Example (cont.)
So we are 95% confident that the population mean is between 64.52 and 76.92.
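The calculation in this example can be reproduced in a few lines. The sketch below uses the ten values and σ = 10 given above, with z = 1.96 for 95% confidence.

```python
import math
from statistics import fmean

SIGMA = 10.0  # population standard deviation, given in the example
Z_95 = 1.96   # z-score for 95% confidence

sample = [56.87, 73.96, 59.77, 75.89, 71.60, 81.94, 69.11, 80.07, 74.70, 63.32]
n = len(sample)

x_bar = fmean(sample)
margin = Z_95 * SIGMA / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# mean = 70.72, 95% CI = (64.52, 76.92)
```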
What does this really mean? Would you get the same result from another random sample of size 10? What if you took another 100 samples and constructed 100 confidence intervals? They would all be different, and about 5% of them would not even contain the population mean.
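This interpretation can be checked by simulation. The sketch below assumes a hypothetical normal population with μ = 70 and σ = 10 (values chosen only because they resemble the example), draws 10,000 samples of size 10, builds a 95% interval from each, and counts how many intervals contain μ. The proportion should be close to 95%, even though each individual interval either contains μ or does not.

```python
import math
import random
from statistics import fmean

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: normal with MU = 70 and SIGMA = 10 (illustrative values).
MU, SIGMA, N, Z_95 = 70.0, 10.0, 10, 1.96
REPEATS = 10_000


def interval_from_new_sample():
    """Draw a fresh sample and return its 95% confidence interval (lower, upper)."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = fmean(sample)
    margin = Z_95 * SIGMA / math.sqrt(N)
    return x_bar - margin, x_bar + margin


intervals = [interval_from_new_sample() for _ in range(REPEATS)]
coverage = sum(lo <= MU <= hi for lo, hi in intervals) / REPEATS
print(f"{coverage:.1%} of the {REPEATS} intervals contain mu")
```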
The standard error
The standard error of the sample mean is:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
You can see that the standard error gets smaller as the sample size increases. The standard error also shows up in the confidence interval formula:
$\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$
This is why the interval gets smaller as n increases.
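Because n appears under a square root, the standard error shrinks more slowly than the sample grows. The sketch below, with a hypothetical σ = 10, shows that each fourfold increase in n only halves the standard error.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


# Hypothetical sigma = 10; each fourfold increase in n halves the standard error.
for n in (10, 40, 160, 640):
    print(n, round(standard_error(10, n), 3))
```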
Null hypothesis
The null hypothesis H0 is the hypothesis of no actual difference or no relationship. But there is a possibility of a wrong decision, and, for a given sample size, reducing the risk of one error increases the risk of the other.
Researcher's decision against the state of the world:
Accept H0 when H0 is true: correct decision.
Accept H0 when H0 is false: error (Type II error, p = β).
Reject H0 when H0 is true: error (Type I error, p = α).
Reject H0 when H0 is false: correct decision (p = 1 - β, the power of the test).
Type I and Type II errors
Type I error occurs when the decision is to reject the null hypothesis when it is actually true. The probability of a Type I error equals the significance level and is symbolized by α.
Type II error occurs when the decision is not to reject the null hypothesis when it is actually false. The probability of a Type II error is symbolized by β.
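Both error rates can be estimated by simulation. The sketch below uses a hypothetical set-up: a two-tailed one-sample z-test of H0: μ = 50 with known σ = 10, n = 25 and α = 0.05. When the data really come from μ = 50, every rejection is a Type I error and the rejection rate should be close to α. When the data come from μ = 54 (an arbitrary illustrative alternative), every failure to reject is a Type II error and the rate estimates β.

```python
import math
import random
from statistics import fmean

random.seed(7)  # fixed seed so the run is reproducible

# Hypothetical set-up, chosen for illustration only.
MU_0, SIGMA, N, ALPHA, Z_CRIT = 50.0, 10.0, 25, 0.05, 1.96
TRIALS = 10_000


def rejects_h0(true_mu: float) -> bool:
    """Draw one sample from N(true_mu, SIGMA) and run a two-tailed z-test of H0: mu = MU_0."""
    sample = [random.gauss(true_mu, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU_0) / (SIGMA / math.sqrt(N))
    return abs(z) > Z_CRIT


# H0 true: each rejection is a Type I error.
type_i_rate = sum(rejects_h0(MU_0) for _ in range(TRIALS)) / TRIALS
# H0 false (true mean 54): each non-rejection is a Type II error.
type_ii_rate = sum(not rejects_h0(54.0) for _ in range(TRIALS)) / TRIALS

print(f"estimated alpha: {type_i_rate:.3f} (nominal {ALPHA})")
print(f"estimated beta:  {type_ii_rate:.3f}, power: {1 - type_ii_rate:.3f}")
```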
Power of a statistical test
The power of a statistical test is the probability that the test will lead to a decision to reject the null hypothesis when the null hypothesis is indeed false.
How to increase the power of a test:
Increase the significance level
Increase the sample size
Reduce variability, e.g. use homogeneous groups, exclude outliers
http://www.cas.buffalo.edu/classes/psy/segal/2072001/Hyptest/Hyptsting.htm
Power
[Figure: predicted distribution and actual distribution]
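For the simplest case the power can be calculated directly. The sketch below assumes a two-tailed one-sample z-test with known σ, where the true mean differs from the null value by a hypothetical amount (4 points, with σ = 10, n = 25, α = 0.05 as the baseline). Changing one factor at a time shows each of the three levers listed above raising the power.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_tailed_z(effect: float, sigma: float, n: int, alpha: float) -> float:
    """Power of a two-tailed one-sample z-test when the true mean differs
    from the null value by `effect` (sigma known)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical baseline: true difference 4, sigma 10, n 25, alpha 0.05.
baseline = power_two_tailed_z(4, 10, 25, 0.05)
print(f"baseline        {baseline:.2f}")
print(f"alpha = 0.10    {power_two_tailed_z(4, 10, 25, 0.10):.2f}")   # larger significance level
print(f"n = 100         {power_two_tailed_z(4, 10, 100, 0.05):.2f}")  # larger sample
print(f"sigma = 5       {power_two_tailed_z(4, 5, 25, 0.05):.2f}")    # less variability
```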
Review of inferential statistics reasoning
We have a population whose measures (parameters) we wish to know. We select a random sample and compute measures of the sample (statistics). The statistics reflect both the corresponding parameters and the sampling distribution. We observe the statistics and infer back to the parameters in the light of the sampling distribution and probability.
Analysis using inferential statistics
Data may be analysed using inferential statistics. A common process is hypothesis testing, whose role is to determine whether the result obtained from analysis could plausibly have occurred by chance. The null hypothesis is tested through statistical tests. There are numerous tests, based on many different sampling distributions.
Tests using inferential statistics
Broadly two categories:
Parametric analyses – interval scale measurement and assumptions about the population.
Nonparametric analyses – typically nominal and ordinal scale measurement and generally no assumptions about the population.
Deciding upon which test to use
Factors to consider:
Number of independent and dependent variables.
Measurement levels of independent and dependent variables.
Related vs. non-related variables (only relevant when comparing groups).
Number of categories for the independent variables.
Why is measurement important?
Compute the average mark of these results.
83 47 34 23 85 33 84 83 72 94 30
Compute the average hair colour.
black brown red blonde blonde blonde black red
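The point of this exercise can be shown with Python's `statistics` module. In the sketch below, the mean of the marks (interval-scale data) is well defined, while asking for the mean of the hair colours (nominal data) fails because there is nothing to add up; the mode, the most frequent category, is the appropriate summary for nominal data. The helper `try_mean` is a hypothetical name introduced for the illustration.

```python
from statistics import mean, mode

marks = [83, 47, 34, 23, 85, 33, 84, 83, 72, 94, 30]
hair = ["black", "brown", "red", "blonde", "blonde", "blonde", "black", "red"]


def try_mean(values):
    """Return the arithmetic mean, or None when the values are not numbers."""
    try:
        return mean(values)
    except TypeError:
        return None


print(try_mean(marks))  # interval-scale data: a mean makes sense
print(try_mean(hair))   # None: nominal data has no mean
print(mode(hair))       # blonde: the mode is meaningful for nominal data
```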
Statistical vs. practical significance
We may be able to demonstrate statistical significance but the effects may be very small. Hence, the effects may be statistically significant but not practically significant. Perhaps the cost of the new teaching method or technology may outweigh the benefits gained.
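With a large enough sample, even a trivial difference becomes statistically significant. The sketch below uses invented numbers: two groups of 5,000 students whose mean marks differ by 1 point on a test with a standard deviation of 15 in both groups. It computes a large-sample z-test and, as one common measure of practical size, Cohen's d (the mean difference in standard-deviation units).

```python
import math

# Hypothetical large study (illustrative numbers only).
n_per_group = 5_000
mean_difference = 1.0
sd = 15.0  # assumed standard deviation in both groups

standard_error = sd * math.sqrt(2 / n_per_group)
z = mean_difference / standard_error
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))

# Cohen's d: mean difference divided by the (common) standard deviation.
cohens_d = mean_difference / sd

print(f"z = {z:.2f}, p = {p_two_tailed:.4f}")  # statistically significant at 0.05
print(f"Cohen's d = {cohens_d:.3f}")           # a very small effect
```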
Meta analysis
Quantitative methods are typically applied to individual studies. Quantitative methods may also be used to review results across studies – this is called meta analysis.
Vote counting – this technique has poor power
p-values – uses the size of the p-value
Lost studies – estimates of the number of studies not published because of non-significant results.
Effect sizes – estimates of the effect size of studies are compared
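One common way of combining effect sizes, the fixed-effect inverse-variance method, is sketched below; it is named here as one option, not as the only approach. Each study's effect size is weighted by the inverse of its sampling variance, so more precise studies count for more. The three effect sizes and variances are invented for illustration.

```python
import math

# Hypothetical effect sizes (e.g. standardized mean differences) and their
# sampling variances from three studies; invented for illustration.
effects = [0.20, 0.50, 0.35]
variances = [0.04, 0.01, 0.02]

# Fixed-effect model: weight each study by the inverse of its variance.
weights = [1 / v for v in variances]
pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"pooled effect = {pooled:.3f} (SE {pooled_se:.3f})")
```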
Categories of statistical tests
Differences between groups
Degree of relationship between variables
Clustering of variables or individuals
Analyses across time
In the remaining lectures we will review some of the tests most commonly used in educational research.