Statistics 1
Revision Notes
June 2012
14/04/2013 Statistics 1 SDB 1
Contents
1 Statistical modelling 6 Statistical modelling ...... 6 Definition ...... 6 Advantages ...... 6 Disadvantages ...... 6
2 Representation of sample data 7 Variables ...... 7 Qualitative variables...... 7 Quantitative variables ...... 7 Continuous variables ...... 7 Discrete variables ...... 7 Frequency distributions ...... 7 Frequency tables ...... 7 Cumulative frequency ...... 7 Stem and leaf & back-to-back stem and leaf diagrams ...... 8 Comparing two distributions from a back to back stem and leaf diagram...... 8 Grouped frequency distributions ...... 8 Class boundaries and widths ...... 8 Cumulative frequency curves for grouped data ...... 9 Histograms ...... 9
3 Mode, mean (and median) 11 Mode ...... 11 Mean ...... 11 Coding ...... 12 Coding and calculating the mean ...... 12 Median ...... 13 When to use mode, median and mean ...... 13 Mode ...... 13 Median ...... 13 Mean ...... 13
2 14/04/2013 Statistics 1 SDB
4 Median (Q2), quartiles (Q1, Q3) and percentiles 14 Discrete lists and discrete frequency tables ...... 14 Interquartile range ...... 14 Discrete lists ...... 14 Discrete frequency tables ...... 14 Grouped frequency tables, continuous and discrete data ...... 15 Grouped frequency tables, continuous data ...... 15 Grouped frequency tables, discrete data ...... 16 Percentiles ...... 17 Box Plots ...... 17 Outliers ...... 17 Skewness ...... 18 Positive skew ...... 18 Negative skew ...... 19
5 Measures of spread 20 Range & interquartile range ...... 20 Range ...... 20 Interquartile range ...... 20 Variance and standard deviation ...... 20 Proof of the alternative formula for variance ...... 20 Rough checks, m ± s, m ± 2s ...... 21 Coding and variance ...... 21
6 Probability 23 Relative frequency ...... 23 Sample spaces, events and equally likely outcomes ...... 23 Probability rules and Venn diagrams ...... 23 Diagrams for two dice etc...... 24 Tree diagrams ...... 25 Independent events ...... 26 To prove that A and B are independent ...... 26 Exclusive events ...... 27 Number of arrangements ...... 27
14/04/2013 Statistics 1 SDB 3
7 Correlation 28 Scatter diagrams ...... 28 Positive, negative, no correlation & line of best fit...... 28 Product moment correlation coefficient, PMCC ...... 28 Formulae ...... 28 Coding and the PMCC ...... 29 Interpretation of the PMCC ...... 30
8 Regression 31 Explanatory and response variables ...... 31 Regression line ...... 31 Least squares regression line ...... 31 Interpretation ...... 32
9 Discrete Random Variables 33 Random Variables ...... 33 Continuous and discrete random variables ...... 33 Continuous random variables ...... 33 Discrete random variables ...... 33 Probability distributions ...... 33 Cumulative probability distribution ...... 34 Expectation or expected values ...... 34 Expected mean or expected value of X...... 34 Expected value of a function ...... 34 Expected variance ...... 34 Expectation algebra ...... 35 The discrete uniform distribution ...... 36 Conditions for a discrete uniform distribution ...... 36 Expected mean and variance ...... 36 Non-standard uniform distribution ...... 37
2 10 The Normal Distribution N(μ, σ ) 38 The standard normal distribution N(0, 12) ...... 38 The general normal distribution N(μ, σ2) ...... 39 Use of tables ...... 39
4 14/04/2013 Statistics 1 SDB
11 Context questions and answers 42 Accuracy ...... 42 Statistical models ...... 42 Histograms ...... 43 Averages ...... 43 Skewness ...... 44 Correlation ...... 45 Regression ...... 46 Discrete uniform distribution ...... 47 Normal distribution ...... 47
12 Appendix 49 –1 ≤ P.M.C.C. ≤ 1 ...... 49 Cauchy-Schwartz inequality ...... 49 P.M.C.C. between –1 and +1 ...... 49 Regression line and coding ...... 50 Proof ...... 50 Normal Distribution, Z = ...... 51
14/04/2013 Statistics 1 SDB 5
1 Statistical modelling
Statistical modelling
Example: When a die is rolled, we say that the probability of each number is . This is a statistical model, but the assumption that each face is equally likely might not be true. Suppose the die is weighted to increase the chance of a six. We might then find, after experimenting, that the probability of a six is and the probability of a one is , with the probability of other faces remaining at . In this case we have refined, or improved, the model to give a truer picture.
Example: The heights of a large group of adults are measured. The mean is 172⋅3 cm and the standard deviation is 12⋅4 cm. It is thought that the general shape of the histogram can be modelled by the curve