Univariate Statistics Summary

Univariate Statistics Summary Further Maths Univariate Statistics Summary Types of Data Data can be classified as categorical or numerical. Categorical data are observations or records that are arranged according to category. For example: the favourite colour of a class of students; the mode of transport that each student uses to get to school; the rating of a TV program, either “a great program”, “average program” or “poor program”. Postal codes such as “3011”, “3015” etc. Numerical data are observations based on counting or measurement. Calculations can be performed on numerical data. There are two main types of numerical data Discrete data, which takes only fixed values, usually whole numbers. Discrete data often arises as the result of counting items. For example: the number of siblings each student has, the number of pets a set of randomly chosen people have or the number of passengers in cars that pass an intersection. Continuous data can take any value in a given range. It is usually a measurement. For example: the weights of students in a class. The weight of each student could be measured to the nearest tenth of a kg. Weights of 84.8kg and 67.5kg would be recorded. Other examples of continuous data include the time taken to complete a task or the heights of a group of people. Exercise 1 Decide whether the following data is categorical or numerical. If numerical decide if the data is discrete or continuous. 1. 2. Page 1 of 21 Univariate Statistics Summary 3. 4. Solutions 1a Numerical-discrete b. Categorical c. Categorical d. Numerical-Continuous 2a. Numerical b. Categorical c. Numerical d. Categorical e. Categorical 3a Continuous b. Discrete c. Continuous d. Continuous e. Discrete 4. D Representing Data Data is often represented in frequency charts, column charts, histograms and dot plots. Remember: the frequency of an observation is the number of times that observation occurs. Page 2 of 21 Univariate Statistics Summary Example 1: The following frequency distribution table gives the number of days of each weather type for the month of January. Represent the information using a column chart. Weather Type 20 15 10 Frequency 5 0 Hot Warm Mild Cool Weather Example 2 Represent the data in a. a histogram. b. a frequency polygon a. b. Notice that for a histogram there is no gap between the bars and the number of visits are positioned at the centre of each bar. Page 3 of 21 Univariate Statistics Summary Example 3 Represent the data in a. a histogram b. a frequency polygon a. b. Notice that the numbers are placed at the edges of the bars along the x-axis for grouped data. Types of Average There are three types of average which can represent a set of data. An average is a measure of central tendency. The Mean The most common average is the mean. 푥̅ is used to denote the mean. 푠푢푚 표푓 푎푙푙 푠푐표푟푒푠 푥̅ = 푛푢푚푏푒푟 표푓 푠푐표푟푒푠 Example 1: The following data gives the number of pets kept in each of 10 different households. 3, 5, 4, 4, 2, 3, 0, 1, 4, 5 The mean number of pets is given by: 3 + 5 + 4 + 4 + 2 + 3 + 0 + 1 + 4 + 5 = 3.1 10 The mean is sometimes not the best average to use as it is affected by extreme scores or outliers. Page 4 of 21 Univariate Statistics Summary The Median The median of a set of scores is the middle score when the data are arranged in order of size. 푛+1 The median’s position is given by th score, where n is the number of scores. 2 10+1 In example 1 the median’s position is given by the th score. This is the 5.5th score or halfway 2 between the 5th and 6th score, after the scores have been arranged in order of size. Arranging the data in order of size: 0, 1, 2, 3, 3, 4, 4, 4, 5, 5 3+4 Median number of pets is: = 3.5 (as there are two middle scores we take their mean. 2 The median is not affected by extreme values or outliers. The Mode The mode of a group of scores is the score that occurs most often. That is the score with the highest frequency. In example 1 the modal number of pets is 4. More than one mode is possible. Frequency Tables Example 4 The table indicates that 6 students made 0 cinema visits, 7 students made 1 cinema visit, 4 students made 2 cinema visits etc. The mean number of visits can be found by adding an extra column to the table and multiplying the number of visits by the frequency. Number of visits (풙) Frequency (풇) 풇 × 풙 0 6 0 1 7 7 2 4 8 3 2 6 4 1 4 Total 20 25 푡표푡푎푙 표푓 푓×푥 25 a. mean number of visits = = = 1.25 푡표푡푎푙 표푓 푓 20 Page 5 of 21 Univariate Statistics Summary b. the median number of visits can be found by finding the position of the median as the number of visits are in order of size in the table. 푛+1 20+1 The median’s position is the th score = th = 10.5th position. Halfway between the 10th 2 2 and 11th scores. The median’s position falls within the second row and is therefore 1. c. The mode is the score with the highest frequency. The mode is 1. Alternatively, the mean and median can be found using Lists and Spreadsheet in the calculator. 1: Enter the data into Lists and Spreadsheet 2: Hit Menu, Statistics, Stat Calculations, One view Variable Statistics… 3. Click OK when number of lists appears. 6. The statistical data appears. The mean is given by 푥̅ = 1.25 4. In the pop up, click in the X1 List box and The median is 1. select visits from the drop down list. Hit the Tab key to move to the next box and select freq from n is useful as it gives the frequency total. the drop down list in the Frequency List box There is no need to enter data into the other boxes. 5. Click OK. Page 6 of 21 Univariate Statistics Summary Grouped Data When data is presented in a frequency table within class intervals, and we do not know the actual values within each class interval, we assume that all values are equal to the midpoint of the class interval in order to find the mean. Example 5: The ages of a group of 30 people attending a superannuation seminar are recorded in the frequency table below, calculate the mean age. Age (Class Intervals) Frequency 20 - 29 1 30 - 39 6 40 - 49 13 50 - 59 6 60 - 69 3 70 - 79 1 Total 30 To find the mean age, assume all people in the class interval 20 - 29 are 24.5 years of age (This value is obtained by finding the midpoint of 20 - 29), all people in the class interval 30 - 39 are 34.5 years of age and so on. The mean age can be found from the table below: Age (Class Intervals) Frequency 풇 Midpoint of 풇 × 풎 Class Interval 풎 20 - 29 1 24.5 24.5 30 - 39 6 34.5 207 40 - 49 13 44.5 578.5 50 - 59 6 54.5 327 60 - 69 3 64.5 193.5 70 - 79 1 74.5 74.5 Total: 30 1405 total of f×m 1405 The mean age 푥̅ = = = 46.8 푦푒푎푟푠 (푐표푟푟푒푐푡 푡표 1 푑푒푐푚푎푙 푝푙푎푐푒) total of f 30 The above can be more easily done using Lists and Spreadsheet on the calculator. Page 7 of 21 Univariate Statistics Summary 1: In Lists and Spreadsheets view enter the data for the midpoints and the frequency into the first two columns. Label the columns as shown. 2. Press Menu, Statistics, Stat Calculations, One Variable Statistics. 3. Leave the Number of Lists as 1 and select OK. 4. In the pop up box, click in the X1 List box and select midpoint from the drop down list. Press the Tab key to move to the Frequency List box and select freq from the drop down list. 5. Press the TAB key to move to the OK button. The mean 푥̅ = 46.833 The mean age = 46.8 years Page 8 of 21 Univariate Statistics Summary Measures of Variability or Spread It is useful to be able to measure the spread or variability of the data. How dispersed is the data? The Range The simplest measure of spread is the range. The range is the difference between the smallest score and the largest. Example 1 The set of data 3, 5, 4, 4, 2, 3, 0, 1, 4, 5 (which gave the number of pets in each of 10 households) has a range of 5 − 0 = 5 The Interquartile Range (IQR) 1 The lower quartile, 푄 is of the way through the set of data. 1 4 3 The upper quartile, 푄 is of the way through the set of data. 3 4 The 퐼푄푅 = 푄3 − 푄1 Page 9 of 21 Univariate Statistics Summary Example 6 The Standard Deviation The standard deviation gives a measure of the spread of the data about the mean. The formula to find the standard deviation is complex and we usually find it directly from the calculator. On the calculator it is denoted by the symbol 푠푥 . The bigger the standard deviation, the greater the spread of data. Example 7 Find the standard deviation of the set of data: 12, 9, 4, 6, 5, 8, 9, 4, 10, 2.

Univariate Statistics Summary

Efficient Estimation of Parameters of the Negative Binomial Distribution

Univariate and Multivariate Skewness and Kurtosis 1

Characterization of the Bivariate Negative Binomial Distribution James E

Understanding PROC UNIVARIATE Statistics Wendell F

Univariate Probability Distributions

Univariate Analysis and Normality Test Using SAS, STATA, and SPSS

Student Activity 2: Basic Statistical Analysis with SAS

Stat 5101 Notes: Brand Name Distributions

Statistics for Data Science

Univariate Outliers Q3+1.5×IQR)

Univariate Continuous Distribution Theory

Chapter 2 Univariate Probability