Cashmere High School
Mathematics and Statistics Faculty

Topic: Statistics 91264 – Use statistical methods to make an inference (2.9)
Year Level: Level 2
Credits: 4 (internal)
Curriculum Strands: Statistics

Achievement Objectives

In a range of meaningful contexts, students will be engaged in thinking mathematically and statistically.

They will solve problems and model situations that require them to:

Level 7
S7-1: Carry out investigations of phenomena, using the statistical enquiry cycle:
• using existing data sets
• evaluating the choice of sampling and data collection methods used
• using relevant contextual knowledge, exploratory data analysis, and statistical inference.
S7-2: Make inferences from surveys:
• using sample statistics to make point estimates of population parameters
• recognising the effect of sample size on the variability of an estimate.

Students need to be familiar with the process of using the statistical enquiry cycle to make an inference, which involves:
- posing an appropriate comparison question from a given set of population data
- selecting random samples
- selecting and using appropriate displays and measures
- discussing sample distributions
- discussing sampling variability, including the variability of estimates
- making an inference
- communicating findings in a conclusion.

Key Competencies

Capabilities for living and lifelong learning

Thinking
Using language, symbols & text
Relating to Others
Managing Self
Participating & Contributing

Vocabulary: estimate, point estimate, parameter, sample, population, sampling variability, statistical inference
Equipment: Kiwi data cards, Fathom software
Key Resources: Pip Arnold resources (Kiwi kapers), Lindsay Smith resources

91264 – Use statistical methods to make an inference UNIT PLAN 1

Key ideas:

Statistical literacy
• Using correct vocabulary: estimate, point estimate, parameter, sample
• Developing critical thinking about media reports that use sampling to make an inference
• Applying the PPDAC cycle

Sampling variability
• Every sample contains sampling error due to the sampling process
• Non-sampling errors (bias) may also be present because of the sampling method applied
• Developing an understanding that confidence in the estimate will vary depending on factors such as sample size, sampling method, the nature of the underlying population, and sources of bias
• Experiencing evidence for the central limit theorem by simulating samples and comparing the distribution of sample medians for samples of different sizes

Informal confidence intervals
• Using the Level 7 guideline for constructing informal confidence intervals for population medians
• Informal development of the formula

Sample statistics
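The simulation of sample medians described in the key ideas is usually done in Fathom; the Python sketch below is one way to reproduce it, assuming a made-up right-skewed population rather than the Kiwi data. The function name `median_spreads` and all the numbers are hypothetical. It draws repeated random samples of each size and reports how spread out the resulting sample medians are.

```python
import random
import statistics


def median_spreads(population, sizes, reps, rng):
    """For each sample size n, draw `reps` random samples (without replacement)
    and return the standard deviation of their medians, keyed by n."""
    spreads = {}
    for n in sizes:
        medians = [statistics.median(rng.sample(population, n)) for _ in range(reps)]
        spreads[n] = statistics.stdev(medians)
    return spreads


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so the run is repeatable
    # Hypothetical right-skewed population of 2000 values (e.g. travel times
    # in minutes); invented for illustration only.
    population = [rng.expovariate(1 / 20) for _ in range(2000)]
    print("population median:", round(statistics.median(population), 1))
    for n, sd in median_spreads(population, (10, 30, 100), reps=500, rng=rng).items():
        print(f"n = {n:3d}: sd of sample medians = {sd:.2f}")
```

The standard deviation of the sample medians should shrink as n increases, which is the pattern students look for when comparing the simulated distributions for samples of different sizes.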

Content:

Problem
• Recap from Level 1: posing good investigative questions
• Assessment: pose a comparison question

Plan
• Teaching & Learning: sampling methods; sample size, including sample size vs sampling variability
• Assessment: describe and justify the sampling method and the sample size

Data
• Data types

Analysis
• Recap from Level 1: data displays; describing distributions (centre, spread, shape, unusual/interesting features)
• Teaching & Learning: informal confidence intervals; sampling variability, including the variability of estimates; Level 7 guide: median ± 1.5 × IQR / √n
• Assessment: data displays; summary statistics, including the informal CI; description (overlap, shift, middle 50%)

Conclusion
• Recap from Level 1: making the call (Level 5)
• Teaching & Learning: making the call (Level 7 inference)
• Assessment: supported correct inference; conclusions in context
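The Level 7 guide, sample median ± 1.5 × IQR / √n, and the "making the call" step can be sketched in a few lines of Python. This is a minimal illustration, assuming Python's "inclusive" quartile method (a graphics calculator or Fathom may use a different quartile convention and give slightly different intervals); the function names and the two samples are hypothetical. The call is made only when the two intervals do not overlap.

```python
import math
import statistics


def informal_ci(sample):
    """Level 7 informal confidence interval for a population median:
    sample median ± 1.5 × IQR / √n."""
    n = len(sample)
    # Quartiles via Python's "inclusive" method; other conventions can differ slightly.
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    half_width = 1.5 * (q3 - q1) / math.sqrt(n)
    median = statistics.median(sample)
    return (median - half_width, median + half_width)


def make_the_call(ci_a, ci_b):
    """Overlap rule: only make a call if the two intervals do not overlap."""
    if ci_a[0] > ci_b[1]:
        return "A"  # population median of group A is likely to be higher
    if ci_b[0] > ci_a[1]:
        return "B"  # population median of group B is likely to be higher
    return None  # intervals overlap: no call is made


if __name__ == "__main__":
    # Hypothetical samples (e.g. heights in cm of two groups); invented numbers.
    group_a = [162, 165, 168, 170, 171, 173, 175, 176, 178, 180, 182, 185]
    group_b = [150, 152, 155, 157, 158, 160, 161, 163, 165, 167, 170, 172]
    ci_a, ci_b = informal_ci(group_a), informal_ci(group_b)
    print("A:", [round(x, 1) for x in ci_a])
    print("B:", [round(x, 1) for x in ci_b])
    print("call:", make_the_call(ci_a, ci_b))
```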

Suggested teaching order

Adapted from suggestions from Pip Arnold & Lindsay Smith 2011

1. Introduction to making an inference [LS]

2. Sampling methods [LS]

3. Using a sample to make a point estimate & sampling variability [LS]

4. Using a sample to make a point estimate & sampling variability {Kiwi kapers 1} [PA]

5. Sampling variability: effect of sample size [LS]

6. Sampling variability: effect of sample size {Kiwi kapers 2} [PA]

7. Sampling variability: effect of spread of population [LS]

8. Developing the formula for the informal confidence interval for the population median {Kiwi kapers 3} [PA]

9. PPDAC for summary & checking how well our intervals capture the population median

10. PPDAC for summary

11. PPDAC for comparison (clear difference) – Auckland stats data

12. PPDAC for comparison (clear difference) – Kiwi data

13. PPDAC for comparison (not a clear difference) – Facebook and cellphones [LS]
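Item 9, checking how well our intervals capture the population median, can also be explored by simulation. The sketch below assumes an invented population in place of a real data set; `capture_rate` is a hypothetical helper name. It repeatedly samples, builds the Level 7 informal interval (sample median ± 1.5 × IQR / √n, with Python's "inclusive" quartiles) and counts how often the interval contains the population median. It reports the fraction for this particular population only and makes no claim about a guaranteed capture rate.

```python
import math
import random
import statistics


def informal_ci(sample):
    """Level 7 informal interval: sample median ± 1.5 × IQR / √n."""
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    half_width = 1.5 * (q3 - q1) / math.sqrt(len(sample))
    median = statistics.median(sample)
    return median - half_width, median + half_width


def capture_rate(population, n, reps, rng):
    """Fraction of `reps` random samples of size n whose informal interval
    contains the population median."""
    target = statistics.median(population)
    hits = 0
    for _ in range(reps):
        lo, hi = informal_ci(rng.sample(population, n))
        if lo <= target <= hi:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical population; in class this would be a real data set.
    population = [rng.gauss(50, 12) for _ in range(1000)]
    for n in (10, 30, 100):
        rate = capture_rate(population, n, 500, rng)
        print(f"n = {n:3d}: interval captured the population median {rate:.0%} of the time")
```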

Resources available

From the Senior Secondary Curriculum Guide (http://seniorsecondary.tki.org.nz/Mathematics-and-statistics/Achievement-objectives/AO-S7-1)

Please see separate document 91264 Sequence of Learning Experiences

Notes and Definitions

Exploratory data analysis notes  Exploratory data analysis starts with multivariate data. Investigative questions that can be asked of the data should be posed: such as o wondering whether there is a connection between two variables, o wondering whether other variables should be taken into account when possible patterns are observed, o exploring multiple representations of the data into order to unlock the stories in the sample data.  Technology such as a graphics calculator can draw a modified box plot, which shows whether extreme data values are outliers. Outliers are not simply the greatest or least data values. Outliers are more than 1.5 times the standard deviation above the upper quartile or below the lower quartile.  If the sample box plot is approximately symmetrical and has no outliers it can be assumed the population has a similar distribution.  If the sample data is skewed, then the median will be more reliable than the mean as an estimate of the population central value. However, if the distribution of the sample data is skewed this does not imply that the population is skewed. The skewness may be an artefact of sampling variability.  A statistical estimate is not a guess but an inference or prediction of the true population parameter based on sample statistics. The sample median is used to infer (used as a point estimate of) the population median. Similarly the sample mean, quartiles, standard deviation can be used as estimates of the corresponding population parameters. A sample proportion can be used to estimate a population proportion, for example, the fraction or percentage of students who travel more than 30 minutes to and from school each day.  Evaluation of sampling and data collection methods must be based on identifying features of good sample design or good experimental design. 
Appropriate considerations are those that would make the inference more reliable or less variable, such as:
  o further (described) strata,
  o repeated sampling and averaging statistics,
  o context factors,
  o the relative size of the mean and standard deviation, i.e. if the standard deviation is small in relation to the mean, then the population is likely to be closely clustered about the population mean.
  o If the sample contains at least 30 items, it may be trivial at Level 7 to suggest that a larger sample would improve the inference of a measurement.

Measure
• An amount or quantity that is determined by measurement or calculation. The term ‘measure’ is used in two different ways in the curriculum.
• One use is in the terms measure of centre, measure of spread, and measure of proportion, where these measures are calculated quantities that represent characteristics of a distribution. The phrase ‘using displays and measures’ in the level 6 (statistical investigation thread) achievement objective refers to measures of centre, spread, and proportion.
• The other use applies to a statistical investigation. The investigator decides on a subject of interest and then decides which aspects of it can be observed. These aspects are the ‘measures’.
Example
  o An investigator decides that ‘well-being’ is a subject of interest and chooses ‘happiness’ to be one aspect of well-being. Happiness could be measured by the variable ‘the average number of times a person laughs in a day’.
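The point estimates and the modified box plot rule from the exploratory data analysis notes can be computed directly. This Python sketch assumes Python's "inclusive" quartile method, so its fences may differ slightly from a graphics calculator's; the helper names and the travel-time sample are hypothetical. Outliers are values more than 1.5 × IQR beyond the quartiles, and the sample proportion is the fraction of the sample meeting a condition.

```python
import statistics


def quartiles(sample):
    """Lower and upper quartiles using Python's "inclusive" method
    (calculators and Fathom may use other conventions)."""
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    return q1, q3


def point_estimates(sample):
    """Sample statistics used as point estimates of population parameters."""
    q1, q3 = quartiles(sample)
    return {
        "median": statistics.median(sample),
        "mean": statistics.mean(sample),
        "lower quartile": q1,
        "upper quartile": q3,
        "standard deviation": statistics.stdev(sample),
    }


def outliers(sample):
    """Values more than 1.5 × IQR below the lower quartile or above the
    upper quartile, as shown on a modified box plot."""
    q1, q3 = quartiles(sample)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sample if x < low or x > high]


def sample_proportion(sample, condition):
    """Fraction of the sample satisfying `condition`; a point estimate of
    the population proportion."""
    return sum(1 for x in sample if condition(x)) / len(sample)


if __name__ == "__main__":
    # Hypothetical daily travel times (minutes) for a sample of 12 students.
    times = [5, 8, 10, 12, 12, 15, 18, 20, 25, 35, 40, 95]
    print(point_estimates(times))
    print("outliers:", outliers(times))
    print("proportion over 30 min:", sample_proportion(times, lambda t: t > 30))
```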

Non-sampling error
One of the two reasons for the difference between an estimate (from a sample) and the true value of a population parameter; the other reason is the error caused because data are collected from a sample rather than the whole population (sampling error). Non-sampling errors have the potential to cause bias in surveys or samples. There are many types of non-sampling errors, and the names used for them are not consistent. Some examples of non-sampling errors are:
• The sampling process is such that a specific group is excluded or under-represented in the sample, deliberately or inadvertently. If the excluded or under-represented group is different with respect to the survey issues, then bias will occur.
• The sampling process allows individuals to select themselves. Individuals with strong opinions or those with substantial knowledge will tend to be over-represented, creating bias.
• Bias will occur if people who refuse to answer have different views on the survey issues from those who respond. This can also happen with people who are never contacted and people who have yet to make up their minds.
• If the response rate (the proportion of the sample that takes part in a survey) is low, bias can occur because respondents may consistently tend to have views that are more extreme than those of the population in general.
• The wording of questions, the order in which they are asked, and the number and type of options offered can influence survey results.
• Answers given by respondents do not always reflect their true beliefs because they may feel under social pressure not to give an unpopular or socially undesirable answer.
• Answers given by respondents may be influenced by the desire to impress an interviewer.

Sampling error
• The error caused because data are collected from part of a population rather than the whole population.
• An estimate of a population parameter, such as a sample median or sample proportion, is different for different samples (of the same size) taken from the population. Sampling error is one of two reasons for the difference between an estimate and the true, but unknown, value of the population parameter. The other reason is non-sampling error.
• The error for a given sample is unknown, but when sampling is random, the size of the sampling error can be estimated by calculating the margin of error.

Sampling variation
• The variation in a sample statistic from sample to sample.
• Suppose a sample is taken and a sample statistic, such as a sample median, is found. If a second sample of the same size is taken from the same population, it is almost certain that the sample median found from this sample will be different from that found from the first sample. If further sample medians are found, by repeatedly taking samples of the same size from the same population, then the differences in these sample medians illustrate sampling variation.

Sample size
• The number of objects, individuals, or values in a sample.
• Typically, a larger sample size leads to an increase in the precision of a statistic as an estimate of a population parameter.
• The most common symbol for sample size is n.
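The difference between sampling error and non-sampling error can be seen in a small simulation. This Python sketch assumes an invented population of "town" and "rural" students (all numbers hypothetical); `summarise_estimates` is a hypothetical helper. Random samples from the whole population give medians centred near the population median, with a spread that shrinks as n grows. Samples that exclude the rural students stay biased however large n becomes, because a bigger sample reduces sampling error but not bias.

```python
import random
import statistics


def summarise_estimates(frame, n, reps, rng):
    """Draw `reps` random samples of size n from `frame` (the list actually
    sampled from) and return the mean and sd of the sample medians."""
    medians = [statistics.median(rng.sample(frame, n)) for _ in range(reps)]
    return statistics.mean(medians), statistics.stdev(medians)


if __name__ == "__main__":
    rng = random.Random(5)
    # Hypothetical population: 1500 "town" students with short travel times
    # and 500 "rural" students with long ones (minutes). Invented numbers.
    town = [max(0.0, rng.gauss(15, 5)) for _ in range(1500)]
    rural = [max(0.0, rng.gauss(45, 10)) for _ in range(500)]
    population = town + rural
    print("population median:", round(statistics.median(population), 1))
    for n in (30, 200):
        fair = summarise_estimates(population, n, 300, rng)  # everyone can be chosen
        biased = summarise_estimates(town, n, 300, rng)      # rural students excluded
        print(f"n = {n}: random mean/sd = {fair[0]:.1f}/{fair[1]:.2f}, "
              f"town-only mean/sd = {biased[0]:.1f}/{biased[1]:.2f}")
```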

Sampling Notes
• Reasons for sampling include time and cost considerations, lack of access to the entire population, and the nature of the data collection or test: for example, a blood test does not require all of a person's blood to be taken, and testing the breaking strain of fishing line destroys the line.
• Features of a good sampling technique include that the sample is sufficiently large, randomly chosen and representative of the population.
• Sample size affects the variability of an inference. If a sample is too small, it is more likely to be unusual and less likely to be representative. As a common rule of thumb, the Central Limit Theorem for sample means (a Level 8 objective) is taken to apply to samples of at least 30 items, so random samples of this size are acceptable. There is no statistical requirement that a sample be a particular fraction of the population. For an inference of a population proportion, however, a much larger sample size is needed, at least 250. This size comes from margin of error considerations (a Level 8 objective), but at Level 7 an intuitive understanding is sufficient.
• Sampling techniques include simple random, systematic, stratified and cluster sampling, which all use random selection, and quota sampling, which does not.
• It is important to identify the positive features of each method and to be able to carry out each method correctly so that the sample is as representative as possible. Students must be able to provide evidence that they have carried out their chosen sampling methods correctly. Random selection makes a sample likely to be representative of the population, although any single random sample can still be unusual by chance.

Sources of variation
• The reasons for differences seen in the values of a variable. Some of these reasons are summarised in the following paragraphs.
• Variation is present everywhere and in everything. When the same variable is measured for different individuals, there will be differences in the measurements, simply because individuals are different.
This can be thought of as individual-to-individual variation and is often described as natural or real variation.
• Repeated measurements on the same individual may vary because of changes in the variable being measured. For example, an individual’s blood pressure is not exactly the same throughout the day. This can be thought of as occasion-to-occasion variation.
• Repeated measurements on the same individual may vary because of some unreliability in the measurement device, such as a slightly different placement of a ruler when measuring. This is often described as measurement variation.
• The difference in measurements of the same quantity for different individuals, apart from natural variation, could be due to the effect of one or more other factors. For example, the difference in growth of two tomato plants from the same packet of seeds planted in two different places could be due to differences in the growing conditions at those places, such as soil fertility or exposure to sun or wind. Even if the two seeds were planted in the same garden, there could be differences in the growth of the plants due to differences in soil conditions within the garden. This is often described as induced variation.
• Variation occurs in all sampling situations. Suppose a sample is taken and a sample statistic, such as a sample median, is found. If a second sample of the same size is taken from the same population, it is almost certain that the sample median found from this sample will be different from that found from the first sample. If further sample medians are found, by repeatedly taking samples of the same size from the same population, then the differences in these sample medians illustrate sampling variation.

Statistical inference
• The process of drawing conclusions about population parameters based on a sample taken from the population.
Example 1
  o Using a sample median calculated from a random sample taken from a population to estimate the population median is an example of statistical inference.
Example 2
  o Using data from a random sample taken from a population to obtain a 95% confidence interval for the population proportion is an example of statistical inference.
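The randomised sampling methods listed in the Sampling Notes (simple random, systematic, stratified and cluster) can be sketched in Python as below. This assumes an invented school of 400 students stored as dictionaries; every function, field and class name is hypothetical. Quota sampling is left out because it does not select units at random. The stratified version uses proportional allocation with rounding, so for awkward stratum sizes its total can differ slightly from n.

```python
import random
from collections import defaultdict


def simple_random(population, n, rng):
    """Every group of n units has the same chance of being chosen."""
    return rng.sample(population, n)


def systematic(population, n, rng):
    """Random start, then every k-th unit, where k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


def group_by(population, key):
    """Collect units into lists keyed by the value of unit[key]."""
    groups = defaultdict(list)
    for unit in population:
        groups[unit[key]].append(unit)
    return groups


def stratified(population, n, key, rng):
    """Simple random sample within each stratum, sized in proportion to the
    stratum (rounding can make the total differ slightly from n)."""
    total = len(population)
    sample = []
    for units in group_by(population, key).values():
        sample.extend(rng.sample(units, round(n * len(units) / total)))
    return sample


def cluster(population, num_clusters, key, rng):
    """Randomly choose whole clusters and take every unit in them."""
    groups = group_by(population, key)
    chosen = rng.sample(sorted(groups), num_clusters)
    return [unit for name in chosen for unit in groups[name]]


if __name__ == "__main__":
    rng = random.Random(91264)
    # Hypothetical school: years 9-13 with 80 students each, split into
    # 4 form classes of 20. Invented structure for illustration.
    students = [{"id": i, "year": 9 + i // 80,
                 "form_class": f"{9 + i // 80}{'ABCD'[i % 80 // 20]}"}
                for i in range(400)]
    print(len(simple_random(students, 40, rng)),
          len(systematic(students, 40, rng)),
          len(stratified(students, 40, "year", rng)),
          len(cluster(students, 2, "form_class", rng)))
```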

Unit Review:
