Statistical Analysis: Spearman’s Rank Correlation Coefficient

The Rank Correlation test is a method of comparing two sets of data. The data you compare are called VARIABLES: quantities that can change from one case to the next. In the example used here, the two variables are students' term marks and the number of hours of homework students do each week. The Rank Correlation test attempts to show whether there is a RELATIONSHIP between the two variables.

Example:

STEP 1: Develop a hypothesis.

The more hours of homework a student does each week, the higher his/her term mark will be.

Data:

Student    Term Mark    # of Hours of Homework/Week
Andrea     63           3
Ben        79           11
Rebecca    53           8
Jamie      77           10
Emily      89           13
Darryl     66           5
Joseph     64           4
Eleanor    91           14
Jacob      71           5
Sharon     74           9

STEP 2: Rank the variables. (*Always give the highest number rank #1 in both sets of data.)

Student    Term Mark    Rank    Hours of Homework/wk    Rank
Andrea     63           9       3                       10
Ben        79           3       11                      3
Rebecca    53           10      8                       6
Jamie      77           4       10                      4
Emily      89           2       13                      2
Darryl     66           7       5                       7
Joseph     64           8       4                       9
Eleanor    91           1       14                      1
Jacob      71           6       5                       7
Sharon     74           5       9                       5

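Note: When a TIE occurs (such as the Homework hours for Darryl and Jacob), both values are given the same rank. In this worksheet a two-way tie takes the higher of the two ranks, that is, the lower number (7), and the next rank is 9 (Joseph). If THREE values are tied, the MIDDLE rank is given to all three of them. For example, if three values were the same and came fifth, sixth and seventh on your list, they would all be ranked 6 and the next rank would be 8. (Many statistics texts treat every tie the way the three-way rule does, giving each tied value the average of the ranks it occupies, which would be 7.5 for Darryl and Jacob. The tables below keep this worksheet's simpler rule.)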
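The ranking in Step 2 can be sketched in a few lines of Python. This is only an illustration: the function name `rank_descending` and its `tie` parameter are assumptions made here and do not come from the worksheet. With `tie="min"` it reproduces the worksheet's two-way tie rule (both tied values take the better rank). With `tie="average"` it gives tied values the middle of the ranks they occupy, as in the three-way example.

```python
def rank_descending(values, tie="min"):
    """Rank values so that the largest value gets rank 1.

    tie="min":     tied values share the best (lowest-numbered) rank they occupy.
    tie="average": tied values share the mean of the ranks they occupy.
    """
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # best position held by this value
        count = ordered.count(v)       # how many entries share this value
        if tie == "min":
            ranks.append(first)
        elif tie == "average":
            ranks.append(first + (count - 1) / 2)
        else:
            raise ValueError("tie must be 'min' or 'average'")
    return ranks


# Data from the worksheet, in student order (Andrea ... Sharon).
marks = [63, 79, 53, 77, 89, 66, 64, 91, 71, 74]
hours = [3, 11, 8, 10, 13, 5, 4, 14, 5, 9]

mark_ranks = rank_descending(marks)
hour_ranks = rank_descending(hours)
print(mark_ranks)
print(hour_ranks)
```

With the default `tie="min"`, the two lists match the Rank columns in the Step 2 table.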

STEP 3: Find the difference between the rankings (Term Mark rank minus Homework rank).

Student    Term Mark    Rank    Hours of Homework/wk    Rank    Difference
Andrea     63           9       3                       10      -1
Ben        79           3       11                      3       0
Rebecca    53           10      8                       6       4
Jamie      77           4       10                      4       0
Emily      89           2       13                      2       0
Darryl     66           7       5                       7       0
Joseph     64           8       4                       9       -1
Eleanor    91           1       14                      1       0
Jacob      71           6       5                       7       -1
Sharon     74           5       9                       5       0

STEP 4: Square the difference and find the sum.

Student    Term Mark    Rank    Hours of Homework/wk    Rank    Difference (d)    Difference² (d x d)
Andrea     63           9       3                       10      -1                1
Ben        79           3       11                      3       0                 0
Rebecca    53           10      8                       6       4                 16
Jamie      77           4       10                      4       0                 0
Emily      89           2       13                      2       0                 0
Darryl     66           7       5                       7       0                 0
Joseph     64           8       4                       9       -1                1
Eleanor    91           1       14                      1       0                 0
Jacob      71           6       5                       7       -1                1
Sharon     74           5       9                       5       0                 0

* Sum (Σ) of d² = 19
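Steps 3 and 4 can be checked with a short Python sketch. The ranks are copied from the worksheet's table in student order. The variable names are chosen here for illustration only.

```python
# Ranks copied from the worksheet's table, in student order (Andrea ... Sharon).
mark_ranks = [9, 3, 10, 4, 2, 7, 8, 1, 6, 5]
hour_ranks = [10, 3, 6, 4, 2, 7, 9, 1, 7, 5]

# Step 3: difference d = term-mark rank minus homework rank, per student.
differences = [m - h for m, h in zip(mark_ranks, hour_ranks)]

# Step 4: square each difference, then add the squares together.
squared = [d * d for d in differences]
sum_d2 = sum(squared)

print(differences)
print(squared)
print(sum_d2)
```

Squaring removes the minus signs, so it does not matter which rank is subtracted from which, as long as the same order is used for every student.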

STEP 5: Apply the appropriate figures to the formula.

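In the equation:

$$R = 1 - \frac{6 \sum d^2}{n^3 - n}$$

R = the rank correlation coefficient
Σ = the sum of
d = the difference between the two ranks of each pair
n = the number of pairs being correlated (here, the 10 students)

$$R = 1 - \frac{6(19)}{10^3 - 10}$$

$$R = 1 - \frac{114}{1000 - 10}$$

$$R = 1 - \frac{114}{990}$$

$$R = 1 - 0.115$$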

R = +0.885 (*Your answer should ALWAYS be between -1 and +1.)

STEP 6: Explain what the rank correlation coefficient means.

Completing the correlation is only part of the task. Once you have the answer, you must explain what it is saying and accept or reject the hypothesis you proposed in STEP 1. In our example we would accept the hypothesis, because the coefficient of +0.885 shows a strong direct relationship. Then give reasons to support or reject your hypothesis.

*The ability to state your results is more important than the math you use to get your results.
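As a check on the arithmetic in Step 5, the formula can be written as a small Python function. The name `rank_correlation` is an assumption made for this sketch. The inputs are the worksheet's own figures (sum of d² = 19, n = 10).

```python
def rank_correlation(sum_d2, n):
    """Worksheet formula: R = 1 - 6 * (sum of d squared) / (n cubed - n)."""
    if n < 2:
        raise ValueError("at least two pairs are needed")
    return 1 - (6 * sum_d2) / (n ** 3 - n)


r = rank_correlation(19, 10)
print(round(r, 3))  # 0.885
```

Two quick sanity checks: if every pair of ranks agrees, the sum of d² is 0 and the function returns +1. If ten ranks run in exactly opposite orders, the sum of d² is 330 and it returns -1.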

The number indicates the extent to which the two variables relate to each other. Variables relate to one another in one of three ways:

1. A DIRECT RELATIONSHIP

This means that both variables move in the same direction: as one variable increases in value or frequency, so does the other, and as one decreases, so does the other. In the example above, we found a direct relationship: students with higher marks also tended to do more hours of homework. A direct relationship will always have a value between 0 and +1. The closer the coefficient is to +1, the stronger the relationship. The coefficient measures strength and is not a percentage of cases: +0.5 indicates a moderate direct relationship and +0.8 a very strong one, but neither means that the relationship holds for 50% or 80% of the students. A coefficient close to zero indicates that any relationship between the variables is very weak. A coefficient close to +1 indicates a very strong relationship. Thus, in our example, the correlation coefficient of +0.885 indicates a very strong relationship between students' marks and the number of hours of homework they do.

2. AN INVERSE RELATIONSHIP

This means that the two variables move in opposite directions. For example, we might find that as a person drinks more alcohol, the amount of time they are able to concentrate on a task decreases. Thus, as one variable increases (the number of drinks), the other variable decreases (the number of minutes they are able to concentrate on a task). An inverse relationship will always have a value between 0 and -1. If the coefficient is close to 0, a weak relationship is indicated. If it is near -1, a very strong relationship exists.

3. NO SIGNIFICANT RELATIONSHIP

If the coefficient is near 0 (either + or -), it means that the two variables show little or no relationship to each other. For example, we may wish to compare students' marks with their birth order (i.e. whether they were first born in their family, second born and so on). If we found that the coefficient was +0.002, it would indicate that there is no real relationship between students' marks and their birth order.

How can we use Rank Correlation?

Basically, we can use rank correlation to make predictions and generalizations. Using the example of students' marks and hours of homework, we found that in our sample of ten students there is a strong direct relationship between the number of hours of homework a student does each week and their marks. If we were now introduced to a student who does 20 hours of homework each week (more than any of the students in the sample), we could predict that they would earn very high marks. If our sample truly represents all students, that prediction is reasonable, but the coefficient is not a probability: +0.885 does not mean there is an 89% chance of being correct, and a prediction for a value outside the range of the sample is less certain still. The closer the coefficient is to 0, the less reliable any such prediction would be.

It is important to note that the Rank Correlation Coefficient does not necessarily indicate a cause-effect relationship. That is, while students' marks seem to be quite closely related to the number of hours of homework they do, doing lots of homework may not cause a student to get high marks, or it may be only one of many reasons why a student gets high marks.

-1.0 = A perfect inverse correlation. As one variable increases the other decreases, and vice versa.
-0.8 = A very strong inverse correlation. It is most likely that as one variable increases the other decreases, or vice versa.
-0.5 = A moderate inverse correlation. As one variable increases the other tends to decrease, but with frequent exceptions.
-0.3 = A weak inverse correlation. There is only a slight tendency for one variable to decrease as the other increases.
 0   = No correlation between the variables.
+0.3 = A weak direct correlation. There is only a slight tendency for one variable to increase as the other increases.
+0.5 = A moderate direct correlation. As one variable increases the other tends to increase, but with frequent exceptions.
+0.8 = A very strong direct correlation. It is most likely that as one variable increases so will the other, or as one decreases so will the other.
+1.0 = A perfect direct correlation. As one variable increases so will the other, or as one decreases so will the other.
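The scale above lists only a few reference points, so a coefficient such as +0.885 has to be placed between them. The Python sketch below shows one way to do that. The function name `describe` and the cutoffs it uses (0.3, 0.5 and 0.8, treated as lower bounds) are assumptions made for illustration and are not rules stated in this worksheet.

```python
def describe(r):
    """Turn a rank correlation coefficient into a rough verbal label.

    The cutoffs are illustrative assumptions based on the reference
    points in the scale (0.3, 0.5, 0.8); they are not fixed rules.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a rank correlation must lie between -1 and +1")
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.8:
        strength = "very strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.3:
        strength = "weak"
    else:
        return "little or no correlation"
    direction = "direct" if r > 0 else "inverse"
    return f"{strength} {direction} correlation"


print(describe(0.885))   # very strong direct correlation
print(describe(0.002))   # little or no correlation
print(describe(-0.5))    # moderate inverse correlation
```

A label like this is only a starting point for the written analysis in Step 6. The explanation of why the variables relate still has to come from you.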

PRACTICE:

Variables: 1) % of GNP spent on Education   2) Average correct answers on Science Testing

Hypothesis:______

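Country        % of GNP spent    Rank    Average correct answers    Rank    D (difference      D²
               on Education              on Science testing                 between ranks)
Canada         7.4               ____    69                         ____    ____               ____
Spain          3.2               ____    68                         ____    ____               ____
Scotland       5.2               ____    68                         ____    ____               ____
Korea          4.5               ____    78                         ____    ____               ____
Italy          4.0               ____    70                         ____    ____               ____
US             7.5               ____    67                         ____    ____               ____
Switzerland    4.8               ____    74                         ____    ____               ____
France         6.1               ____    69                         ____    ____               ____
Russia         7.0               ____    71                         ____    ____               ____
Jordan         7.1               ____    57                         ____    ____               ____
Hungary        5.7               ____    73                         ____    ____               ____
Israel         10.2              ____    70                         ____    ____               ____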

Rank Correlation = ______

Analysis: What does the number mean? Does it support your hypothesis? Why or why not? ______
