
Statistics Intermediate: Normal Distribution and Standard Scores, Session 3

Oscar BARRERA [email protected]

February 6, 2018

Grades’ distribution

1. The best (N-1) scores on problem sets 30%, the midterm 30%, the final exam 30%, and participation 10%.
2. Three controls 20% each, participation 10%, final exam 30%.
3. Two controls 20% each, problem sets 20%, participation 10%, final exam 30%.

Oscar BARRERA Statistics Intermediate Previous lecture Standard Scores

Outline

1. Previous Lecture: Variability

2. Standard Scores: Definition, Characteristics, Normal Distribution

Central tendency Measures of Central Tendency

In a normal distribution the most frequent scores cluster near the center and less frequent scores fall into the tails.

Central tendency: most scores (about 68%) in a normally distributed set of data cluster in the central area, within one standard deviation of the mean. [Come back to characteristics!]

Central tendency Central tendency and dispersion


Central tendency Measures of Central Tendency

There are at least three characteristics you look for in a descriptive statistic that represents a set of data:
1. Representative: a good descriptive statistic should be similar to many scores in a distribution (high frequency).
2. Well balanced: neither the greater-than nor the less-than scores are overrepresented.
3. Inclusive: it should take individual values from the distribution into account, so no value is left out.

Central tendency Central tendency and Dispersion

Central tendency measures:
The mode (Mo) is the most frequently occurring score in a distribution.
The median (Md) is the middle score of a distribution.
Quartiles

Dispersion measures:
Sum of squares
Standard deviation

Central tendency Central tendency measures The mode

The mode (Mo) is the most frequently occurring score in a distribution.

Limitations:
Multi-modal: there can be more than one mode.
Lack of representativeness: it may not be a good representative of all values.

Central tendency Central tendency measures The median

The median (Md) is the middle score of a distribution: half of the scores lie to its left and half to its right (the 50th percentile). It is a better measure of central tendency than the mode, since it balances the distribution perfectly.

How to find it? Two simple steps:
1. Determine the median’s location.
2. Find the value at that location.
The procedure differs depending on whether you have an even or an odd number of scores.

Limitation: the median does not take into account the actual values of the scores in a set of data.
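The two steps for finding the median can be sketched in Python. This is only an illustration, not the lecture's own code: the function name `median_by_location` and the scores are made up, and the location rule (n + 1) / 2 is a common textbook convention assumed here.

```python
def median_by_location(scores):
    """Find the median in two steps: locate the middle, then read the value there."""
    ordered = sorted(scores)
    n = len(ordered)
    # Step 1: the median's location, counting from 1 (assumed rule: (n + 1) / 2).
    location = (n + 1) / 2
    # Step 2: the value at that location.
    if n % 2 == 1:
        # Odd n: the location is a whole number, so take that score.
        return ordered[int(location) - 1]
    # Even n: the location falls between two scores, so average them.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_scores = [12, 15, 9, 18, 14]        # hypothetical scores, n = 5
even_scores = [12, 15, 9, 18, 14, 20]   # hypothetical scores, n = 6

print(median_by_location(odd_scores))   # 14
print(median_by_location(even_scores))  # 14.5
```

With five scores the location is 3, so the third ordered score is the median; with six scores the location is 3.5, so the median is the average of the third and fourth ordered scores.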

Central tendency Central tendency measures Quartiles

Quartiles split a distribution into fourths (quarters). There are three quartile points in a distribution: the 1st quartile (Q1) separates the lower 25% of the scores from the upper 75%; the 2nd quartile (Q2) is the median and separates the lower 50% of the scores from the upper 50%; and the 3rd quartile (Q3) separates the upper 25% of the scores from the lower 75%.

Steps: determining a quartile value is like determining the median.

Central tendency Central tendency measures Understanding Sigma

The symbol Σ in maths means summation: add all the values to the right of Σ (say, of a variable X).

ΣX means to sum up all the values that belong to variable X.

IMPORTANT!! Σ is a grouping symbol, like a set of parentheses: everything to the right of Σ must be completed before summing the resulting values.
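A short Python sketch may help show why the order of operations around Σ matters. The scores in `X` are hypothetical; the point is only that squaring before summing and summing before squaring give different results.

```python
X = [1, 2, 3, 4]  # hypothetical scores

sum_x = sum(X)                               # ΣX: add the raw scores
sum_x_squared = sum(x ** 2 for x in X)       # ΣX²: square each score first, then add
square_of_sum = sum(X) ** 2                  # (ΣX)²: add first, then square the total
sum_x_minus_1 = sum(x - 1 for x in X)        # Σ(X − 1): subtract inside Σ, then add

print(sum_x, sum_x_squared, square_of_sum, sum_x_minus_1)  # 10 30 100 6
```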

Central tendency Central tendency measures The mean

The mean is the arithmetic average of all the scores in a distribution. It is the most often used measure of central tendency:
1. It evenly balances a distribution, so both the large and small values are equally represented.
2. It takes into account all individual values.

To compute it, in two steps:
1. Add together all the scores in a distribution: $\sum X$.
2. Divide that sum by the total number of scores in the distribution.

Sample mean:

$$\bar{X} = M = \frac{\sum X}{n}$$

Population mean:

$$\mu = \frac{\sum X}{N}$$

Central tendency Central tendency measures Mean -Limitations

One problem: by taking individual values into account, the mean can be influenced by extremely large or extremely small values (outliers). Specifically, extremely small values pull the mean down and extremely large values pull the mean up. This happens in skewed distributions, where it is better to use the median as the measure of central tendency. (Why?)
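The effect of an outlier on the mean, compared with the median, can be checked with a small Python sketch. The income figures below are invented for illustration; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

incomes = [20.0, 22.0, 25.0, 27.0, 30.0]   # hypothetical incomes, in thousands
with_outlier = incomes + [500.0]           # add one extremely large value

print(mean(incomes), median(incomes))            # 24.8 25.0
print(mean(with_outlier), median(with_outlier))  # 104.0 26.0
```

One extreme value moves the mean from 24.8 to 104.0, while the median barely changes (25.0 to 26.0), which is one way to answer the "Why?" above.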

Central tendency What to use, when?

Nominal data ⇒ the mode is the best measure of central tendency (e.g., sex).
Ordinal scale ⇒ the median may be more appropriate, since the individual values on an ordinal scale are meaningless.
Interval or ratio scale ⇒ the mean is generally preferred (it accounts for all observations).

Central tendency The Mean Median and Mode in Normal and Skewed Distributions

The positions of the mean, median, and mode depend on whether a distribution is normal or skewed.

In normally distributed data: mean = median = mode. 50% of the scores must lie above the center point, which is also the mode, and the other 50% must lie below. Because the distribution is perfectly symmetrical, the deviations of the values above the mean and of the values below the mean must cancel out.

Negatively skewed: the tail of the distribution is to the left and the hump is to the right.

Positively skewed: the tail of the distribution is to the right and the hump is to the left.

Central tendency The Mean Median and Mode in Normal and Skewed Distributions

So if the mean differs from the median/mode, the distribution is skewed. The median is the better measure of central tendency when the distribution is positively or negatively skewed. The best example: income.

Central tendency Box Plots

A box plot is a way to present the dispersion of scores in a distribution by using five pieces of statistical information: the minimum value, the first quartile, the median, the third quartile, and the maximum value.
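The five values behind a box plot can be computed with Python's standard library. This is a sketch on invented scores; note that several quartile conventions exist, and the `method="inclusive"` option of `statistics.quantiles` used here is an assumption that may not match the rule used in class.

```python
from statistics import quantiles

scores = [7, 9, 10, 12, 13, 15, 16, 18, 20]  # hypothetical scores

# method="inclusive" treats the data as the whole distribution; other
# quartile conventions exist and can give slightly different Q1 and Q3.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

five_number_summary = {
    "minimum": min(scores),
    "Q1": q1,
    "median": q2,
    "Q3": q3,
    "maximum": max(scores),
}
print(five_number_summary)
```

The box of the plot spans Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.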

Variability Variability

Variability is simply the differences among items, which could be differences in eye color, hair color, height, weight, sex, intelligence, etc. There are several measures of variability. I will go from the easiest to the more complex ones:
The range: the largest value minus the smallest value.
The sum of squares: the sum of the squared deviation scores from the mean; it measures the summed or total variation in a set of data.
The variance: the average sum of squares.
The standard deviation: the square root of the variance.

Variability Variability The range

The range is the largest value minus the smallest value. It provides information about the area a distribution covers, BUT does not say anything about individual values.


Variability Variability Sum of squares

The sum of squares (SS) is the sum of the squared deviation scores from the mean and measures the summed or total variation in a set of data.

$$SS = \sum (X - \bar{X})^2$$

It can be equal to zero, if there is no variability and all the scores are equal.
It cannot be negative (mathematically impossible).

Limitations: the sum of squares measures the total variation among scores in a distribution; it does not measure average variability. We want a measure of variability that takes into account both the variation of the scores and the number of scores in a distribution.

Variability Variability Variance

The sample variance ($S^2$) is the average of the squared deviation scores from the mean. It measures the average variability among scores in a distribution.

$$S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{SS}{n}$$

The one issue with the sample variance is that it is in squared units, not the original unit of measurement.

Variability Variability Standard Deviation

Once the sample variance has been calculated, calculating the sample standard deviation ($S$, also written $s$) is as simple as taking the square root of the variance.

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{SS}{n}} = \sqrt{S^2}$$

The standard deviation measures the average deviation between a score and the mean of a distribution.
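The chain from sum of squares to variance to standard deviation can be traced in a few lines of Python. The scores are hypothetical, and the sketch divides by n as in the formulas of this lecture (many textbooks divide by n − 1 for a sample, which gives slightly larger values).

```python
from math import sqrt
from statistics import pvariance, pstdev

X = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(X)
mean_x = sum(X) / n                                  # X-bar = ΣX / n

sum_of_squares = sum((x - mean_x) ** 2 for x in X)   # SS = Σ(X − X-bar)²
variance = sum_of_squares / n                        # S² = SS / n
std_dev = sqrt(variance)                             # S = √S²

print(mean_x, sum_of_squares, variance, std_dev)     # 5.0 32.0 4.0 2.0

# The standard library's "population" functions also divide by n.
assert variance == pvariance(X)
assert std_dev == pstdev(X)
```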


Definition Comparing Scores across Distributions

To understand why comparing across distributions is problematic, let’s think about an example. You have a friend, say Dennis, who thinks he is smarter than everybody. Both you and Dennis take Statistics in the same semester but with different professors. You each have a test in that course on the same day, and the content of the test is exactly the same. On the test, Dennis obtains 18 out of 20 points and you obtain 15 out of 20 points.

Of course, Dennis starts bragging and boasting about his statistical intelligence.

But who really did better on this test?

Say each test was on the same material and included the same content, but the make-up of each test was slightly different (e.g., more multiple-choice questions in one section, slightly differently worded questions).

So, let’s assume that your class’ mean test grade was M = 13 points and Dennis’ class mean was M = 16 points. Now, who did better?

It seems you and Dennis performed equally, because each of your scores was 2 points above your respective class mean. However, because the class means differ, the underlying distributions in each class are probably different: a 2-point difference in one class may not mean the same as a 2-point difference in the other. So, who did better, you or Dennis?

What you need is a standard measure. Fortunately, we already know of such a measure: the standard deviation.

Remember: the standard deviation is the average difference between a score (X) and the mean (M) of a distribution.

Although the value of the standard deviation changes depending on the data in the distribution, a standard deviation means the same thing in any distribution.

When you measure the number of standard deviations between your test score and the mean of your class, you can compare that difference, expressed in standard deviations, across classes. But how do you compute your score in standard deviations? The measure of the number of standard deviations between any raw score and the mean of a distribution is a standard score, or z-score.

Definition Standard (z) Scores

A standard score (z-score) measures the number of standard deviations between a score ($X$) and the mean ($\bar{X}$ or $\mu$).

In the population:

$$z = \frac{X - \mu}{\sigma}$$

and in a sample:

$$z = \frac{X - \bar{X}}{s}$$

Let’s go back to you and Dennis.

Assume the standard deviation for your class is s = 1 and the standard deviation for Dennis’ class is s = 2. Remember, your score was X = 15 and Dennis’ was X = 18; your class’ mean was M = 13 and Dennis’ class mean was M = 16. So, we have:

$$z_{you} = \frac{15 - 13}{1} = 2$$

and

$$z_{Dennis} = \frac{18 - 16}{2} = 1$$

Because most scores in your class were close to the mean, a large deviation from your class’ mean is unlikely; because there is more variability in Dennis’ class, deviations from the mean there are not surprising.
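The comparison between you and Dennis can be reproduced with a one-line function. The helper name `z_score` is made up for this sketch; the numbers are the ones given in the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations between a raw score and the mean."""
    return (x - mean) / sd


z_you = z_score(15, mean=13, sd=1)      # your class: M = 13, s = 1
z_dennis = z_score(18, mean=16, sd=2)   # Dennis' class: M = 16, s = 2

print(z_you, z_dennis)  # 2.0 1.0
```

Both raw scores are 2 points above their class mean, but only the z-scores show that your score is twice as far from the mean in standard-deviation units.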

Of course, Dennis starts bragging and boasting about his statistical intelligence.

But who really did better on this test? YOU =)

Characteristics Characteristics of Standard (z) Scores

1. Standard scores can be used to compare across distributions, as was done in the example. Raw scores cannot be compared directly unless the mean and standard deviation are identical in each distribution or the raw scores come from the same distribution.
2. Larger standard scores indicate larger deviations (distances) from the mean.
3. Standard scores are defined as the distance between a score and its mean in standard deviations. You are 2 sd above your M; Dennis is 1 sd above his M.
4. Standard scores can be positive or negative.
5. What would be the mean, variance and standard deviation of a distribution of z-scores?

What would be the mean, variance and standard deviation of a distribution of z-scores? The mean of the z-distribution will be zero, and the variance and standard deviation will both be equal to one.
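The claim about the mean and spread of z-scores can be checked numerically. The raw scores below are invented; the sketch uses `pstdev` and `pvariance` from Python's `statistics` module, which divide by n like the formulas in this lecture.

```python
from statistics import mean, pstdev, pvariance

raw = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical raw scores
m = mean(raw)
s = pstdev(raw)                   # divides by n, as in the lecture's formula

# Convert every raw score to a z-score.
z = [(x - m) / s for x in raw]

# Rounding only guards against tiny floating-point noise.
print(round(mean(z), 10), round(pvariance(z), 10), round(pstdev(z), 10))
```

Whatever raw scores you start from, subtracting the mean centers the z-scores at zero and dividing by the standard deviation rescales their spread to one.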

Normal distribution Standard Scores and the Normal Distribution

Two characteristics of a normal distribution:
1. Mean = Median = Mode. This implies further characteristics, such as a single hump and perfect symmetry.
2. Central tendency: the proportion of scores within one standard deviation above or below the mean is approximately 0.6826.

Remember, in a normal distribution most scores lie at the hump in the middle and fewer scores lie in the tails.

Normal distribution Characteristics

1. The height of the curve reflects the probability of observing that particular score (X). As the distance between a score (X) and the mean increases, the height of the curve and the probability of observing that score decrease.
2. As you move farther and farther away from the mean in standard deviations, or z-scores, the probability of being that far away becomes smaller and smaller.
3. Regardless of the underlying data, as long as it is normally distributed, these proportions/probabilities will be found for any z-score. For example, 0.68 of the data falls within the range mean +/- 1 sd.
4. With the mean and standard deviation of a distribution, we can calculate a z-score for any raw score (the graph applies to any data).
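These properties can be verified with `statistics.NormalDist` from Python's standard library. This is an illustrative sketch: the second distribution (mean 100, standard deviation 15) is an arbitrary choice made here to show that the proportion does not depend on the data.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # standard normal distribution

# Proportion of scores within one standard deviation of the mean.
within_one_sd = z.cdf(1) - z.cdf(-1)
print(round(within_one_sd, 4))   # 0.6827

# The same proportion holds for any normal distribution, e.g. this
# arbitrary one with mean 100 and standard deviation 15.
other = NormalDist(mu=100, sigma=15)
print(round(other.cdf(115) - other.cdf(85), 4))   # 0.6827

# The height of the curve falls as the distance from the mean grows.
heights = [z.pdf(k) for k in (0, 1, 2, 3)]
print([round(h, 4) for h in heights])   # [0.3989, 0.242, 0.054, 0.0044]
```

The exact value is about 0.6827; a printed table that rounds each half to 0.3413 gives 0.6826.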

Normal distribution Probabilities and Proportions in the Normal Distribution

This is a fragment of a Standard Normal (z) Table, which you will find in the appendix of any statistics book. (Useful? What for?)

It can be used to look up the proportion of scores above or below a z-score within a normal distribution. Remember the example of Dennis and YOU? What share of the scores is below/above YOURS? What share is below/above Dennis’?

We want the share of scores between the mean and each score: for you, $P(\mu \le X \le 15)$, and for Dennis, $P(\mu \le X \le 18)$.

We also want the share of scores below each score: for you, $P(X \le 15)$, and for Dennis, $P(X \le 18)$.

Let’s come back to the table and review some properties:
Column 1 lists z-scores that increase from 0.00 to 4.00, mainly in increments of 0.01.
Column 2 gives the proportion of scores under the normal curve that lie between the z-score and the mean.
Column 3 gives the proportion of scores beyond that z-score, in the tail of the distribution.
There are no negative z-scores in the table: because the curve is symmetric, the sign only tells you the direction of the z-score relative to the mean.

The share of scores between the mean and each score:

$P(\mu \le X \le 15) = P(0 \le Z \le 2) = 0.47725$

And for Dennis:

$P(\mu \le X \le 18) = P(0 \le Z \le 1) = 0.34134$

And the cumulative share?

$P(X \le 15) = P(Z \le 2) = 0.5 + 0.47725 = 0.97725 \approx 0.9772$

And for Dennis:

$P(X \le 18) = P(Z \le 1) = 0.5 + 0.34134 = 0.84134 \approx 0.84$
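The table lookups for z = 2 and z = 1 can be reproduced with `statistics.NormalDist`, assuming (as the example does) that scores in each class are normally distributed. The variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

for name, z in (("you", 2.0), ("Dennis", 1.0)):
    below = std_normal.cdf(z)          # P(Z <= z), the cumulative share
    mean_to_z = below - 0.5            # P(0 <= Z <= z), "column 2" of the table
    print(name, round(mean_to_z, 5), round(below, 5))
```

Half of the distribution lies below the mean, which is why the cumulative share is 0.5 plus the share between the mean and z.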

More examples:

$P(X \ge 15) = 1 - P(X \le 15) = 1 - P(Z \le 2) = 1 - 0.9772 = 0.0228$

And for Dennis:

$P(X \ge 18) = 1 - P(X \le 18) = 1 - P(Z \le 1) = 1 - 0.84 = 0.16$
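As a rough sketch, one row of a standard normal table (z, proportion between the mean and z, proportion beyond z) can be generated in Python. The function name `z_table_row` is hypothetical, and the column layout is assumed from the description of the table in this lecture.

```python
from statistics import NormalDist


def z_table_row(z):
    """Return (z, proportion between mean and z, proportion beyond z)."""
    z = abs(z)                            # the table lists only non-negative z
    below = NormalDist().cdf(z)
    return z, below - 0.5, 1 - below


for z in (1.0, 2.0):
    _, between, beyond = z_table_row(z)
    print(z, round(between, 4), round(beyond, 4))
```

The "beyond" column is the upper-tail share: about 0.0228 of scores lie above yours and about 0.1587 (roughly 0.16) lie above Dennis'.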
