Quantitative Methods in STEM Education Research

Topic 4: Inferential statistics

Judy Sheard, Faculty of Information Technology, Monash University, Australia

QM STEM Ed 2018

Overview of topic 4

• Hypothesis testing
• Level of significance
• Z-scores
• Confidence intervals
• Categories of statistical tests

Descriptive vs. inferential statistics

• Descriptive statistics: used to describe sets of quantitative data. This involves descriptions of distributions of data and relationships between variables.
• Inferential statistics: used to make inferences about populations from analysis of subsets (samples) of the population.

Inferential statistics

“In inferential statistics, statistics are measures of the sample and parameters are measures of the population. Inferences are made about the parameters from the statistics” (Wiersma, 1995, p.363).

Inferences are made about a population based on a subset or random sample of that population.

• Note that in educational research it is often not possible to have a random sample. Instead we attempt to show that the sample is typical of the population by comparing demographics, e.g. gender, age, educational background.

Hypothesis testing

In inferential statistics, a hypothesis is used to determine whether an observation has an underlying cause or whether it was due to some random fluctuation or error in a sample. The researcher will test to see if the hypothesis is consistent with the sample data. If not, the hypothesis is rejected.

Two different ways of stating a hypothesis:
• Looking for a difference between groups;
• Looking for relationships between groups.

Hypothesis testing

On what basis do we accept or reject a hypothesis? Consider this example:

A set of exercises was designed to encourage reflection on program design. It was hypothesized that these exercises improved students’ skills in program design. This method was used on a class of 30 students. In a test on program design, the class scored a mean of 60% with a standard deviation of 10. The same test on another class that had not used these exercises resulted in a mean of 55% with a standard deviation of 12. Does the hypothesis seem reasonable? What if the class mean was 70%? What about 57%?

Null hypothesis

In inferential statistics we test the opposite of a research hypothesis using the null hypothesis. For example:

Research hypothesis: Skills in program design will be improved with the use of exercises to encourage reflection on program design. Null hypothesis: There will be no difference in skill levels in program design between students who have completed exercises to encourage reflection on program design and those who have not.

Research hypothesis: The performance of introductory programming students is related to prior programming experience. Null hypothesis: There is no relationship between programming performance and prior programming experience.

If your study finds there is a difference or some relationship then you can reject the null hypothesis (H0) and you can state that there is support for your research hypothesis (H1).

Sampling distribution

We need more than intuition here. We will connect probability with a statistic, using the concept of a sampling distribution of the statistic. A sampling distribution consists of the values of a statistic computed from all possible samples of a given size (Wiersma, 2005, p.375). Note that the sampling distribution is not the sample distribution.

What does this mean?

• We have a population.
• We can take a sample of size n from the population and compute a statistic of this sample, e.g. the mean.
• We take all possible samples of size n and compute the statistic of these samples.
• We now have a distribution of the statistic.

Central limit theorem

The shape, location (central tendency) and variability (dispersion) of the sampling distribution are described by the central limit theorem. The central limit theorem (CLT) states: Given any population, the distribution of the sample mean is approximately a normal distribution, provided the sample size is large. This is the key theorem in statistics!

Central limit theorem

The central limit theorem specifies that the sampling distribution of the mean has a mean equal to the population mean (μ), a standard deviation equal to σ/√n, and is normally distributed. (σ is the standard deviation of the population.)

Some simulations to illustrate this:
http://www.stat.sc.edu/~west/javahtml/CLT.html
http://www.rand.org/statistics/applets/clt.html
http://en.wikipedia.org/wiki/Concrete_illustration_of_the_central_limit_theorem
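The same idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the skewed (exponential) population, the sample size of 30 and the 2,000 repeated samples are arbitrary choices, and the code draws many random samples rather than all possible samples.

```python
import random
import statistics

def sample_means(population, n, num_samples, rng):
    """Draw num_samples random samples of size n and return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)]

rng = random.Random(42)
# A deliberately skewed, non-normal population (exponential with mean 10).
population = [rng.expovariate(1 / 10) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 30
means = sample_means(population, n, 2_000, rng)

# The mean of the sample means should sit close to mu,
# and their standard deviation close to sigma / sqrt(n).
print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {statistics.fmean(means):.2f}")
print(f"sigma / sqrt(n)      = {sigma / n ** 0.5:.2f}")
print(f"SD of sample means   = {statistics.stdev(means):.2f}")
```

Plotting a histogram of `means` would also show the roughly normal shape, even though the population itself is strongly skewed.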

Level of significance

The level of significance is a probability used in testing hypotheses. It is a criterion used in making a decision about the hypothesis. The common level used in educational research is 0.05. Occasionally other levels are used: 0.01, 0.001 and 0.1. A level of 0.05 means that when the probability is lower than 0.05, the null hypothesis is rejected. It then follows that if the null hypothesis is true it will only be rejected 5% of the time.

We now connect the sampling distribution with the level of significance.

The “68.3 - 95.5 - 99.7” rule
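As a minimal sketch, the three percentages in the rule can be recovered from the standard normal distribution using Python's `statistics.NormalDist`. The helper name `central_area` is introduced here for illustration only. To two decimal places the middle value is 95.45%.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def central_area(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {central_area(k):.2%}")
```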

Z-score

The z-score (also called standard score) indicates how far, and in what direction, a score deviates from its distribution's mean, expressed in units of the distribution's standard deviation. The formula for creating z-scores is:

$$z = \frac{x - \mu}{\sigma}$$

Where:
• x is a raw score to be standardized
• μ is the mean of the population
• σ is the standard deviation of the population

Standard z-score

• The z-score indicates if a score was above or below the distribution mean.
• A z-score of +1 indicates one standard deviation above the population mean.
• A z-score of -1 indicates one standard deviation below the population mean.

For example, a mark of 53 on a test where the mean of all marks was 67 and the standard deviation of marks was 7 would give a standard score of (53 - 67)/7 = -2.0.

Properties of standard scores

A z-score makes it possible to compare scores from different distributions. z-scores have the following properties:
• The mean of any set of z-scores is zero.
• The standard deviation of any set of z-scores is always equal to 1.
• The distribution of z-scores has the same shape as the distribution of raw scores from which they were derived.
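The z-score formula and the first two properties can be checked with a short sketch. The set of marks is illustrative only, and `z_scores` is a helper name introduced here. It standardizes a set of values against its own mean and standard deviation.

```python
import statistics

def z_scores(values):
    """Standardize values using the mean and population SD of the set itself."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

# The worked example: mark 53, mean 67, standard deviation 7.
example_z = (53 - 67) / 7
print(f"z for a mark of 53: {example_z:+.1f}")

marks = [83, 47, 34, 23, 85, 33, 84, 83, 72, 94, 30]  # illustrative set of marks
z = z_scores(marks)

# Up to floating-point rounding, the mean is 0 and the SD is 1.
print(f"mean of z-scores = {abs(statistics.fmean(z)):.3f}")
print(f"SD of z-scores   = {statistics.pstdev(z):.3f}")
```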

Confidence intervals

A confidence interval specifies a range within which we can have some degree of confidence of finding another value, usually the population mean. To construct a confidence interval based on the normal distribution we need:
• a random sample of size n
• the sample mean
• the standard deviation of the population
• a level of confidence

Defining confidence intervals

To find the lower (L) and upper (U) limits for a confidence interval we use the following:

$$L = \bar{x} - z\,\frac{\sigma}{\sqrt{n}} \qquad U = \bar{x} + z\,\frac{\sigma}{\sqrt{n}}$$

Where:
• x̄ is the sample mean
• σ is the standard deviation
• n is the sample size
• z is a z-score indicating the confidence level

Confidence intervals

• Increasing the confidence level widens the confidence interval.
• Increasing the sample size narrows the confidence interval.
• Increasing the standard deviation makes the interval wider.
• Common confidence levels are 90%, 95% and 99%, but we can specify any level below 100%.

Choosing the z-score

• For 95% confidence we choose a central area of 0.95 on the standard normal curve. This area lies between z = -1.96 and z = 1.96.

• For 90% confidence we choose a central area of 0.90 on the standard normal curve. This area lies between z = -1.645 and z = 1.645.
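These z-scores do not have to be read from a table. As a sketch, they can be obtained from the inverse of the standard normal cumulative distribution, here using `statistics.NormalDist.inv_cdf`. The function name `critical_z` is an illustrative choice.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """z such that the central area of the standard normal equals confidence_level."""
    tail = (1 - confidence_level) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

The central area is split so that each tail holds half of the remaining probability, which is why the 95% level corresponds to the 97.5th percentile.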

The “68.3-95.5-99.7” rule

Example

The numbers below were randomly drawn from a normal population with σ = 10.

56.87, 73.96, 59.77, 75.89, 71.60, 81.94, 69.11, 80.07, 74.70, 63.32

The sample mean = 70.72 and we want a 95% confidence interval. So,

$$L = 70.72 - 1.96 \times \frac{10}{\sqrt{10}} = 64.52$$

$$U = 70.72 + 1.96 \times \frac{10}{\sqrt{10}} = 76.92$$

Example (cont.)

So we are 95% confident that the population mean is between 64.52 and 76.92.
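The calculation can be reproduced with a short sketch. The function name `z_confidence_interval` is introduced here for illustration. It assumes, as in the example, that the population standard deviation is known.

```python
import statistics

def z_confidence_interval(sample, sigma, z):
    """Return (lower, upper) limits: sample mean -/+ z * sigma / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    margin = z * sigma / n ** 0.5
    return mean - margin, mean + margin

sample = [56.87, 73.96, 59.77, 75.89, 71.60, 81.94, 69.11, 80.07, 74.70, 63.32]
lower, upper = z_confidence_interval(sample, sigma=10, z=1.96)

print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"95% CI      = ({lower:.2f}, {upper:.2f})")
```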

What does this really mean?
• Would you get the same result from another random sample of size 10?
• What if you took another 100 samples and constructed 100 confidence intervals?
• They would all be different and about 5% of them would not even contain the population mean.
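A small simulation makes this interpretation concrete. It is a sketch under assumptions: the population mean of 70 is an arbitrary choice for the simulation, while σ = 10 and n = 10 match the example. Many samples are drawn, an interval is built from each, and the fraction of intervals that contain the population mean is counted.

```python
import random
import statistics

def coverage(mu, sigma, n, z, num_intervals, rng):
    """Fraction of z-based confidence intervals that contain the true mean mu."""
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(num_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        if mean - margin <= mu <= mean + margin:
            hits += 1
    return hits / num_intervals

rng = random.Random(1)
# mu = 70 is an arbitrary value for the simulation; sigma and n match the example.
rate = coverage(mu=70, sigma=10, n=10, z=1.96, num_intervals=10_000, rng=rng)
print(f"intervals containing the population mean: {rate:.1%}")
```

The fraction should come out close to 95%, with the remaining intervals missing the population mean.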

The standard error

The standard error of the sample mean is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

You can see that the standard error gets smaller as the sample size increases. The standard error also shows up in the confidence interval formula:

$$\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$$

This is why the interval gets smaller as n increases.

Null hypothesis

The null hypothesis H0 is the hypothesis of no difference or no relationship.

But there is a possibility of a wrong decision.

Reducing the risk of one error increases the risk of another error.

                          “State of the world” – the actual situation
Researcher’s decision     H0 True                          H0 False
Accept H0                 Correct                          Error (Type II error, p = β)
Reject H0                 Error (Type I error, p = α)      Correct (power, p = 1 - β)

Type I and Type II errors

• Type I error occurs when the decision is to reject the null hypothesis when it is actually true. This probability equals the significance level. Symbolized by α.

• Type II error occurs when the decision is not to reject the null hypothesis when it is actually false. Symbolized by β.
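The link between the significance level and the Type I error rate can be illustrated by simulation. This is a sketch under assumptions: a two-sided z-test with known σ is used as the test, and the values of `MU0`, `SIGMA`, `N` and `TRIALS` are arbitrary. Every sample is drawn with the null hypothesis true, so each rejection is a Type I error.

```python
import random
import statistics
from statistics import NormalDist

def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided z-test: True if H0 (population mean == mu0) is rejected."""
    z = (statistics.fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

rng = random.Random(7)
ALPHA, MU0, SIGMA, N, TRIALS = 0.05, 55, 12, 30, 5_000

# H0 is true here: every sample really is drawn from a population with mean MU0.
false_rejections = sum(
    rejects_h0([rng.gauss(MU0, SIGMA) for _ in range(N)], MU0, SIGMA, ALPHA)
    for _ in range(TRIALS)
)
type_i_rate = false_rejections / TRIALS
print(f"Type I error rate over {TRIALS} trials: {type_i_rate:.3f}")
```

The observed rate should be close to the chosen α of 0.05.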

Power of a statistical test

The power of a statistical test is the probability that the test will lead to a decision to reject the null hypothesis when the null hypothesis is indeed false.

How to increase the power:
• Increase the significance level
• Increase the sample size
• Reduce variability, e.g. use homogeneous groups, exclude outliers

http://www.cas.buffalo.edu/classes/psy/segal/2072001/Hyptest/Hyptsting.htm
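The effect of each factor can be sketched for one simple case, a one-sided z-test with known σ. The function `z_test_power` and all the numbers passed to it are illustrative assumptions, not values from a real study.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha):
    """Power of a one-sided z-test when the true mean exceeds the null mean by `effect`."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)
    shift = effect / (sigma / n ** 0.5)  # true difference in standard-error units
    return 1 - std_normal.cdf(critical_z - shift)

baseline = z_test_power(effect=5, sigma=12, n=30, alpha=0.05)
higher_alpha = z_test_power(effect=5, sigma=12, n=30, alpha=0.10)
larger_sample = z_test_power(effect=5, sigma=12, n=60, alpha=0.05)
less_variability = z_test_power(effect=5, sigma=8, n=30, alpha=0.05)

print(f"baseline (n=30, alpha=0.05, sigma=12): {baseline:.2f}")
print(f"significance level raised to 0.10:     {higher_alpha:.2f}")
print(f"sample size raised to 60:              {larger_sample:.2f}")
print(f"standard deviation reduced to 8:       {less_variability:.2f}")
```

Each of the three changes gives a power above the baseline. Raising the significance level also raises the Type I error rate, which is the trade-off between the two kinds of error.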

Power

[Figure: predicted distribution and actual distribution]

Review of inferential statistics reasoning

• We have a population that we wish to make measures of – parameters.
• We select a random sample and compute measures of the sample – statistics.
• The statistics reflect the corresponding parameters and sampling distribution.
• We observe the statistics, and infer back to the parameters in the light of the sampling distribution and probability.

Analysis using inferential statistics

Data may be analysed using inferential statistics. A common process is through hypothesis testing. The role of hypothesis testing is to determine whether the result obtained from analysis occurred by chance. The null hypothesis is tested through these statistical tests. There are numerous tests. There are many different sampling distributions.
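As a sketch of this process, the earlier program design example can be run as a one-sided z-test. This rests on a loud assumption: the comparison class's mean (55%) and standard deviation (12) are treated as if they were known population values, which the example itself does not state. The function name `one_sided_p_value` is illustrative.

```python
from statistics import NormalDist

def one_sided_p_value(sample_mean, mu0, sigma, n):
    """Return (z, p): p is P(sample mean >= observed) if the true mean is mu0."""
    standard_error = sigma / n ** 0.5
    z = (sample_mean - mu0) / standard_error
    return z, 1 - NormalDist().cdf(z)

ALPHA = 0.05
results = {}
for observed in (60, 70, 57):
    # mu0 = 55 and sigma = 12 are assumed population values (see the lead-in).
    z, p = one_sided_p_value(observed, mu0=55, sigma=12, n=30)
    results[observed] = p
    decision = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"class mean {observed}%: z = {z:.2f}, p = {p:.4f} -> {decision}")
```

Under these assumptions, class means of 60% and 70% lead to rejecting the null hypothesis at the 0.05 level, while a class mean of 57% does not.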

Tests using inferential statistics

Broadly two categories:

• Parametric analyses – interval scale measurement and assumptions about the population.

• Nonparametric analyses – typically nominal and ordinal scale measurement and generally no assumptions about the population.

Deciding upon which test to use

Factors to consider:
• Number of independent and dependent variables.
• Measurement levels of independent and dependent variables.
• Related vs. non-related variables (only relevant when comparing groups).
• Number of categories for the independent variables.

Why is measurement important?

• Compute the average mark of these results.

83 47 34 23 85 33 84 83 72 94 30

• Compute the average hair colour.

black brown red blonde blonde blonde black red
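The contrast can be made concrete with a short sketch. It assumes the marks are treated as numeric data and the hair colours as nominal categories.

```python
import statistics
from collections import Counter

marks = [83, 47, 34, 23, 85, 33, 84, 83, 72, 94, 30]
hair_colours = ["black", "brown", "red", "blonde", "blonde", "blonde", "black", "red"]

# Marks are numeric, so an arithmetic mean is meaningful.
average_mark = statistics.fmean(marks)
print(f"average mark = {average_mark:.1f}")

# Hair colour is nominal: categories can be counted but not averaged,
# so the mode (most frequent category) is the usual summary.
most_common_colour = statistics.mode(hair_colours)
print(f"counts = {dict(Counter(hair_colours))}")
print(f"most common hair colour = {most_common_colour}")
```

The measurement level of a variable therefore limits which statistics, and which tests, make sense for it.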

Statistical vs. practical significance

We may be able to demonstrate statistical significance but the effects may be very small. Hence, the effects may be statistically significant but not practically significant. Perhaps the cost of the new teaching method or technology may outweigh the benefits gained.
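One common way to express the size of an effect separately from its statistical significance is a standardized effect size. The sketch below uses Cohen's d, which is a choice made here for illustration rather than a measure named in these slides. The means and standard deviations come from the earlier program design example. The comparison class size of 30 is an assumption, because that figure is not given.

```python
def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_variance = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / pooled_variance ** 0.5

# n_b = 30 is an assumption: the size of the comparison class is not stated.
d = cohens_d(mean_a=60, sd_a=10, n_a=30, mean_b=55, sd_b=12, n_b=30)
print(f"Cohen's d = {d:.2f}")
```

Whether an effect of this size justifies the cost of a new teaching method is a practical judgement that the statistic alone cannot settle.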

Meta analysis

Quantitative methods are typically applied to individual studies. Quantitative methods may also be used to review results across studies. This is called meta analysis.

• Vote counting – this technique has poor power

• p-values – uses the size of the p-value

• Lost studies – estimates of numbers of studies not published due to non-significant results

• Effect sizes – estimates of the effect sizes of studies are compared

Categories of statistical tests

• Differences between groups
• Degree of relationship between variables
• Clustering of variables or individuals
• Analyses across time

In the remaining lectures we will review some of the most commonly used statistical tests in educational research.
