TEACHER GUIDE

CORRELATION vs CAUSATION: DOES ICE CREAM MAKE THE SUN SHINE?

Level: High School

Mathematics Area: Statistics and Probability (CCSS.MATH.CONTENT.HSS.ID.C.9)

Decision Education Area: Dispositions and Knowledge (Statistical Reasoning)

Topics covered: Scatter Plots, Correlation, Causation, Regression to the Mean, Assessing Claims of Causation

Delivery Time: 90 minutes (or 2 x 45 minutes). Can be shortened by discussing questions in pairs or as a group, in place of creating written answers

Equipment required:
Pupil Guide – 1 per pupil
Computer with projector - 'Pupil Guide' PDF also serves as lesson presentation slides
Dice – 1 per pupil
Counters x 10 (to represent 'speed cameras')

Helpful Links:
Scatter Tool - https://scatterplot.online/
The Simpsons Bear Patrol Clip - https://tinyurl.com/y9htbpoc

Lesson Objectives:
How to work out if two variables are related
How to work out if one variable causes the change in the other variable
How to use these skills to take better decisions in our lives

Part 1: Correlation
This section introduces the tools we use to explore whether two variables are related (scatter plots, correlation strength/type)

Part 2: Causation
This section introduces 'The Causation Checklist', highlighting helpful criteria we can use to investigate whether one variable actually drives the change in the other variable

Part 3: Regression to the Mean
This section explores why variables typically change anyway over time, with extremes returning to average, so we must be careful when suggesting a cause for such changes

Part 4: Decision Scope
This section provides students with the opportunity to use their knowledge of correlation and causation to assess claims in given case studies

OVERVIEW

WE'LL EXPLORE...

... whether ice cream makes the sun shine, whether speed cameras really reduce car accident numbers, and how to tell if your lucky streak is about to run out

LESSON OBJECTIVES...

How to work out if two variables are related
How to work out if one variable CAUSES the change in the other variable
How to use these skills to take better decisions in our lives

Check that students know what a VARIABLE is: a quantity that can vary over time

CONTENTS:

PART 1: CORRELATION - Looking for Links

PART 2: CAUSATION - Looking for CAUSES

PART 3: REGRESSION TO THE MEAN - THE HIDDEN DRIVER

PART 4: DECISION SCOPE - APPLYING WHAT WE KNOW

SUMMARY

WHAT DO WE THINK?

This activity is intended to get students thinking about why this topic is important. Students to discuss in pairs to produce written answers, then review answers as a class discussion

Baseball players who hit the most home runs in the first half of the season typically hit far fewer in the second half of the season. Why do you think that might be?
Students may think that a strong performance is based on skill alone, so may be surprised by this. Encourage them to think about other factors behind a strong performance (e.g. good luck)

Some superstitious sports fans (and players) have bizarre rituals that they always perform before a game. Why? How could we check to see if they really work?
People who perform such rituals usually believe that there is a relationship between their behavior and the team's performance. This could be checked by stopping or changing the ritual

Are people with more money happier than people with less? How could we find out?
Encourage students to share observations from their own experience, but also to think about how to gather relevant data, perhaps by asking a large number of people to provide information about personal income and self-reported happiness

Why might it be helpful to work out if different variables are related to each other, or if one variable is even driving the change in the other variable?
Encourage students to think more broadly about the importance of establishing which variables are driving outcomes that they may care about e.g. health, education etc.

The more ice-creams sold on any given day, the hotter the temperature. Is this proof that ice cream sellers control the weather?
Encourage students to see that relationships can work in different directions. Ask them to consider if it's more likely that the weather is determining the ice cream sales

If you could investigate the relationship between any two variables, which would you pick? (e.g. meditation vs stress, family size vs self confidence, screen time vs academic performance…?)
Encourage students to think of an outcome that they genuinely care about, so that they might reflect on the importance of establishing which variables may influence it

PART 1: CORRELATION - Looking for Links
Are these two variables related?

Check that the students can plot coordinates. Read through steps as a class, demonstrating how the scatter graph is constructed

SCATTER PLOTS
Visualising a Relationship

Scatter plots are used to investigate the relationship between two variables by turning data into a picture

How to Create a Scatter Plot:
(1) Plot each point on the graph, using the values as coordinate points
(2) Draw a line through the points, which shows the general direction of the points. The line should go as much through the middle of all the points as is possible. This is called the 'line of best fit'

Example Scatter Plot

This data shows the temperature and number of ice creams sold at one location on ten different days. Each of these points... is plotted on the grid

Temp (celsius)    Ice Creams (number)
22                52
15                30
18                42
13                19
19                44
23                53
17                35
21                48
16                35
18                38

Can you describe the relationship between the temperature and number of ice creams sold?
Students should note that, as the temperature increases, the number of ice creams sold also increases. Temperature is on the x-axis since it is the independent variable (despite the lesson title!)

What could we do if we wanted to be more confident about our description of this relationship?
Various answers, though the simplest would be to increase the total amount of data by making observations on a greater number of days

Review steps as a class, then support students in building their own scatter plots, in particular, with drawing the line of best fit

BUILD YOUR OWN SCATTER PLOT

By following the steps outlined on the previous page, use the data below to build your own scatter plot on the empty grid.

Data for Your Scatter Plot

This is data for 10 students, showing the number of times they were absent from a Math lesson during the last school year, and the score they received on the end-of-year test:

Absences (number)    Test Score (percentage)
4                    84
11                   70
13                   68
8                    75
15                   62
3                    87
17                   55
5                    73
14                   62
9                    76

What does the scatter plot tell us about the relationship between the number of absences and test scores?
Students should observe that as the number of absences increases, the test scores decrease

Can you give an explanation for why the student who was absent 5 times falls quite far below your line?
This student was rarely absent but still performed poorly. His weak performance is likely due to general difficulties with Math, rather than poor attendance

Review this page with the students as a class

THE LANGUAGE OF CORRELATION
How are these variables related?

Correlations can be described by STRENGTH and TYPE

STRENGTH of Correlation
This is a measure of how close the points are to falling on a straight line - the closer to the line, the stronger the correlation.

VERY STRONG: The dots are closely clustered along the line
MODERATE: The dots fall a little further from the line

TYPE of Correlation
This describes if the variables are moving in the same direction.

POSITIVE: The variables move in the same direction. As one increases, the other one increases too
NEGATIVE: The variables move in opposite directions. As one increases, the other decreases
NO CORRELATION: There is no relationship between how the variables change
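For teachers who want to check a hand-drawn line of best fit, the idea behind STRENGTH (how close the points fall to a straight line) can be sketched in a few lines of Python. This is only an illustration, not part of the Pupil Guide: it assumes the least-squares method for the line (a hand-drawn line will differ slightly), uses the absences data from the 'Build Your Own Scatter Plot' activity, and the function name `least_squares_line` is made up for this example.

```python
# Absences and test scores for the 10 students in the
# "Build Your Own Scatter Plot" activity.
absences = [4, 11, 13, 8, 15, 3, 17, 5, 14, 9]
scores = [84, 70, 68, 75, 62, 87, 55, 73, 62, 76]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Assumption: least squares stands in for the hand-drawn 'line of best fit'.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_line(absences, scores)

# Vertical distance of each point from the line (negative = below the line).
distances = [y - (slope * x + intercept) for x, y in zip(absences, scores)]

# The point furthest from the line, as (absences, score, distance).
furthest = max(zip(absences, scores, distances), key=lambda p: abs(p[2]))

print(f"line: score = {slope:.2f} * absences + {intercept:.2f}")
print(f"furthest point: {furthest[0]} absences, score {furthest[1]}")
```

The slope is negative, matching the NEGATIVE correlation students should describe, and the point furthest from the line is the student with 5 absences, who sits below it. Small distances overall correspond to a strong correlation.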

Students to discuss in pairs, or work individually, to produce written answers, then review answers as a class

DESCRIBING CORRELATIONS

Using the language on the previous page, describe the correlations shown in the following scatter plots. Remember to describe both the STRENGTH and TYPE of correlation

Students should note that, as the population size increases, the country medal score increases. Students should describe the correlation as MODERATE and POSITIVE

Students should note that, as the number of training hours increases, the average 100m time decreases. Students should describe the correlation as STRONG and NEGATIVE

CORRELATION COEFFICIENT

Briefly review as a class, so students are introduced to the idea

These kinds of relationships can be described using a value, r, which can vary from -1 to 1.
Values between 0 and 1 indicate a POSITIVE correlation.
Values between 0 and -1 indicate a NEGATIVE correlation.
Values further from 0 indicate a STRONGER correlation.
Values closer to 0 indicate a WEAKER correlation.
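If a class asks where the value r comes from, the following Python sketch may help. It assumes r is the standard Pearson correlation coefficient (the guide itself does not name the formula) and applies it to the two data sets from Part 1. The function name `pearson_r` is chosen for this example only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.

    Assumption: this is the 'r' the guide refers to.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Example scatter plot: temperature (celsius) vs ice creams sold.
temperature = [22, 15, 18, 13, 19, 23, 17, 21, 16, 18]
ice_creams = [52, 30, 42, 19, 44, 53, 35, 48, 35, 38]

# Build-your-own scatter plot: absences vs test score (percentage).
absences = [4, 11, 13, 8, 15, 3, 17, 5, 14, 9]
scores = [84, 70, 68, 75, 62, 87, 55, 73, 62, 76]

r_ice = pearson_r(temperature, ice_creams)
r_abs = pearson_r(absences, scores)

print(f"temperature vs ice creams: r = {r_ice:.2f}")
print(f"absences vs test scores:   r = {r_abs:.2f}")
```

The first value is positive and the second is negative, and both lie far from 0, which matches the strong correlations visible in the two scatter plots.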

PART 2: CAUSATION - Looking for CAUSES
Does one variable drive the change in the other?

Review this first section with the students as a class

CORRELATION, YES. BUT CAUSATION...?
Variables can be linked without one causing the other

CORRELATION: Variable A and Variable B are related.
CAUSATION: Change in Variable A causes change in Variable B.

Students discuss in pairs to produce written answers, then review as a class

PRESIDENTIAL FOOTBALL
The outcome of the Washington Redskins' final home game before the election (A) predicted if the challenger candidate won the presidential election (B), in every election from 1940 to 2000.

Do these football games determine who becomes president?
There is no plausible mechanism by which the football games could influence the election. This is most likely a coincidence.

Students discuss in pairs to produce written answers, then review as a class

THE MIGHTY WINDMILL
The faster windmill blades rotate (A), the greater the strength of the wind (B).

Does the rotation of windmill blades cause the wind?
Encourage students to articulate why someone might suggest this, but also why it's incorrect. The windmill blades don't drive the wind, the wind drives the windmill blades.

BUILDING A CHECKLIST

Can you think of any other examples of two variables that might be correlated but not be causally related?
Many possible answers. Encourage students to consider examples from their own experience.

What alternative explanations should we consider before suggesting a causal relationship between two variables?
Gather initial ideas from students here, but this topic is fully explored on the next two pages.

Review this page with the students as a class

THE CAUSATION CHECKLIST
There are a number of ways that two variables can be related, without A causing B

REVERSE CAUSATION
Rather than A causing B, would B causing A make more sense?
Example: Winter coat usage (A) correlates with cold weather (B), but cold weather actually causes winter coat usage

CONFOUNDERS
Might some other variable, C, actually be causing both A and B?
Example: Basketball performance (A) and shoe size (B) are correlated, but 'height' (C) drives both basketball performance and shoe size

COINCIDENCE
If there's no reasonable connection, could it just be a coincidence?
Example: From 2000-2009, the amount of cheese eaten per person (A) correlated with the number of people who died by becoming tangled in their own bedsheets (B)...!

A further consideration...

MULTIPLE CAUSES
Might A be only one of many causes of B?
Example: Having good friends (A) correlates with well-being (B), but many other factors (C, D... etc.) also contribute to well-being
This is still CAUSATION, but is included since it is an important idea

Students to discuss in pairs, or work individually, to produce written answers, then review answers as a class

CORRELATION OR CAUSATION?

Each of the following claims of causation is unjustified. Use THE CAUSATION CHECKLIST to write down a more likely explanation for each one:

More babies who sleep with a nightlight in their rooms develop vision problems ... so nightlights cause vision problems
Most likely explanation is CONFOUNDERS. The use of nightlights is likely to be more common for parents with poor vision, who are also more likely to have children who share their genetic vision problems

Beware the Confirmation Bias! Be careful to scrutinize conclusions that you like as rigorously as ones you don't! Research suggests we are less critical of evidence that supports our existing beliefs
Briefly mentioned so students become familiar with the idea

More students who use a tutor have poor academic grades ... so tutors damage academic performance
Most likely explanation is REVERSE CAUSATION. Students who have poor grades are more likely to seek out a tutor

In the 1990s, the stork population of Germany increased and the at-home birth rate also increased ... so storks really do deliver babies
Most likely explanation is COINCIDENCE. There is no reasonable mechanism by which the stork population could drive the birth rates.

More people die if they sleep in a hospital bed than in their own bed ... so hospital beds cause death
Most likely explanation is CONFOUNDERS. Illness increases the chance of being in a hospital bed, and also increases the chance of death

The more books in a US household, the stronger the academic performance of the children in the family ... so books lead to academic success
Most likely explanation is CONFOUNDERS. Parents who have many books in the home typically place great value on learning, so also actively support their children with their education

HOMER AND LISA on CORRELATION vs CAUSATION

Link to clip is given on the TEACHER INFORMATION page. If time allows, watch the clip and discuss Homer's errors in statistical reasoning with the class

HOMER: NOT A BEAR IN SIGHT. THE BEAR PATROL MUST BE WORKING LIKE A CHARM.
LISA: THAT'S SPECIOUS REASONING, DAD.
HOMER: THANK YOU, DEAR.
LISA: BY YOUR LOGIC I COULD CLAIM THAT THIS ROCK KEEPS TIGERS AWAY.
HOMER: OH, HOW DOES IT WORK?
LISA: IT DOESN'T WORK.
HOMER: UH-HUH.
LISA: IT'S JUST A STUPID ROCK.
HOMER: UH-HUH.
LISA: BUT I DON'T SEE ANY TIGERS AROUND, DO YOU?
[HOMER THINKS OF THIS, THEN PULLS OUT SOME MONEY]
HOMER: LISA, I WANT TO BUY YOUR ROCK.
[LISA REFUSES AT FIRST, THEN TAKES THE EXCHANGE]

PART 3: REGRESSION TO THE MEAN - THE HIDDEN DRIVER
Was that change going to happen anyway?

Review this first section with the students as a class
Discuss the following questions as a class

CHANGE WITHOUT A CAUSE
When we work to change something and it then improves, we might be taking credit for a change that would have happened anyway. The role of luck leads to variables we measure going up and down even without a cause, though we are often tempted to search for a story to explain why.

THE SPEED TRAP TEST
Do Speed Cameras Really Reduce Accident Numbers?
Use COUNTERS to represent SPEED CAMERAS

How to play:
(1) Using the ACCIDENT RECORD box, write down a street name you know, and then stand up.
(2) Your teacher will now give everyone a die. Roll your die twice and add the scores. This is the number of ACCIDENTS in Year 1 on your street.
(3) If you score 9 or less, you have a SAFE-STREET. Sit down. You don't need a speed camera.
(4) If you score 10, 11 or 12, you have a DANGER-STREET. Your teacher will now give you a speed camera, and ask you to repeat Step 2 to find the number of accidents on your street in the following year.

ACCIDENT RECORD
STREET NAME:
ACCIDENTS YEAR 1:
ACCIDENTS YEAR 2:

What happened to the number of accidents on the DANGER-STREETS after the speed cameras were introduced?
Students should observe that the number of accidents on danger-streets drops from Year 1 to Year 2

Were the speed cameras really responsible for the change in accident numbers?
No. The most likely result in Year 2 is a mid-range number, which will be viewed as a drop from the high number in the first year

What would happen if you gave speed cameras to the safest streets (scores 2, 3, or 4)? Play again to find out!
Again, since mid-range numbers are most likely, we would expect the number of accidents on the safest streets in Year 2 to INCREASE

What does this activity tell us about the effectiveness of speed cameras?
It tells us accident numbers would probably fall just by chance, so we actually don't know if speed cameras are effective or not
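The claim that danger-streets will usually show a drop in Year 2 can be checked exactly, since two dice have only 36 equally likely outcomes. The Python sketch below is an optional teacher aid rather than part of the activity. It assumes, as the game does, that the Year 2 roll is completely independent of Year 1 and that the camera has no effect at all. The function name `chance_of_change` is made up for this example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely totals from rolling a die twice and adding the scores.
TOTALS = [a + b for a, b in product(range(1, 7), repeat=2)]


def chance_of_change(year1_scores, direction):
    """Probability that Year 2 is lower ("drop") or higher ("rise") than Year 1,
    given that the Year 1 total was one of year1_scores.

    Assumption: Year 2 is an independent re-roll (the camera does nothing).
    """
    favourable = 0
    possible = 0
    for y1 in TOTALS:
        if y1 not in year1_scores:
            continue
        for y2 in TOTALS:
            possible += 1
            if direction == "drop" and y2 < y1:
                favourable += 1
            elif direction == "rise" and y2 > y1:
                favourable += 1
    return Fraction(favourable, possible)


danger_drop = chance_of_change({10, 11, 12}, "drop")
safest_rise = chance_of_change({2, 3, 4}, "rise")

print(f"danger-street shows a drop in Year 2: {danger_drop} = {float(danger_drop):.0%}")
print(f"safest street shows a rise in Year 2: {safest_rise} = {float(safest_rise):.0%}")
```

Both probabilities work out to 191/216, or about 88%. Under these assumptions nearly nine in ten danger-streets 'improve' with a camera that does nothing, and nearly nine in ten of the safest streets would appear to get worse.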

Review the first two sections with the students as a class, reading the story aloud. For the questions, work individually, or in pairs, to produce written answers, then discuss as a class

PERFORMANCE, SKILL & LUCK
Most outcomes are a result of two main factors - skill and luck

PERFORMANCE = SKILL + LUCK

SKILL is consistent but... LUCK is not

GREAT PERFORMANCE: GOOD SKILL + GOOD LUCK
Luck likely to change, so performance will probably dip to average

POOR PERFORMANCE: POOR SKILL + BAD LUCK
Luck likely to change, so performance will probably improve to average

THE MATHEMATICS OF HIGH PERFORMANCE
The speed trap test was based purely on random chance, but do we see the same effect with elite performance, where skill is involved?

In his book 'Thinking, Fast and Slow', psychologist Daniel Kahneman tells a story about performance. He was explaining to instructors who teach pilots that praise works better than punishment. However, one of the most experienced instructors told him that he was wrong. The instructor explained:

"On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver. The next time they try the same maneuver, they usually do worse. On the other hand, I have often screamed into a cadet's earphone for bad execution, and in general he does better on his next try. So please don't tell us that reward works and punishment does not, because the opposite is the case."

How had the flight instructor misunderstood the impact of his teaching methods?
Students should understand that he believed his words were responsible for the changes in performance that followed

Does this story suggest that one teaching tactic (punishment or praise) is more effective than the other?
No. We expect extreme performance to be followed by average performance, so we can't assess the effectiveness of either tactic

Read each scenario aloud. Students to discuss in pairs, or work individually, to produce written answers. Review answers as a class

THE SPORTS ILLUSTRATED CURSE
In 2002, Sports Illustrated magazine published an article, highlighting the excessive bad luck experienced by many sports stars after appearing on their front cover. The winning streaks of many top athletes have come to an abrupt end following this high profile accolade.

In terms of performance, what do you think athletes invited to appear on the magazine cover have in common?
Students should suggest that all sports stars invited to be on the cover will have been performing at the very highest level in their respective sport in the period before receiving their invitations

DID THAT REALLY HELP?
Imagine the following story. After a terrible set of exam results, a school principal introduces a new policy of school uniform for all students. The following year's exam results show a clear improvement.

Can the principal justifiably claim that the new uniform policy worked?
The policy was introduced when test scores were unusually low, so they would most likely have increased in the following year anyway, again due to regression to the mean
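The PERFORMANCE = SKILL + LUCK idea behind these scenarios can be illustrated with a short simulation. This Python sketch is a hedged teacher aid, not something taken from the Pupil Guide. It assumes 1000 imaginary performers, that skill and luck are both drawn from a standard normal distribution and count equally, and that nothing (praise, punishment, magazine covers, uniforms) affects the second attempt. All names and numbers are chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

N_PERFORMERS = 1000  # hypothetical group size

# Assumption: skill is fixed per performer, luck is redrawn on every attempt.
skill = [rng.gauss(0, 1) for _ in range(N_PERFORMERS)]


def attempt(skills):
    """One attempt per performer: performance = skill + fresh luck."""
    return [s + rng.gauss(0, 1) for s in skills]


def mean(values):
    return sum(values) / len(values)


first = attempt(skill)
second = attempt(skill)  # no intervention of any kind between attempts

# Rank performers by their first attempt and pick the extremes.
order = sorted(range(N_PERFORMERS), key=lambda i: first[i])
worst, best = order[:100], order[-100:]

best_change = mean([second[i] - first[i] for i in best])
worst_change = mean([second[i] - first[i] for i in worst])

print(f"average change for the 100 best first attempts:  {best_change:+.2f}")
print(f"average change for the 100 worst first attempts: {worst_change:+.2f}")
```

Even though nothing was done to anyone, the best first-attempt performers get worse on average and the worst get better. That is the same pattern the flight instructor put down to praise and punishment, and the same pattern behind a cover 'curse' or an apparently successful new policy.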

Can you think of another explanation for 'The Sports Illustrated Curse'?
Since good performance entails good skill and good luck, and luck is likely to change, their performance is likely to dip anyway. The curse is most probably an example of regression to the mean

What could the principal have done differently to really test the effectiveness of his uniform policy?
He could have introduced the policy after an average set of results. Alternatively, he could have made only one of two classes adopt the policy, thus 'controlling' for regression to the mean

PART 4: DECISION SCOPE - APPLYING WHAT WE KNOW
How can this help us make better decisions?

Read each of the three scenarios aloud. Students to discuss in pairs, or work individually, to produce written answers. Review answers as a class

APPLYING OUR UNDERSTANDING
For what kind of situations might these ideas be useful?

In each of the following Case Studies, certain claims are made. Use your new understanding of correlation and causation to indicate if the claims are justifiable, explaining your reasons carefully.

CASE STUDY 1: FAMILY MEALS
'I read that kids from families that eat meals together 3 or more times per week are more likely to perform better at school, and even have better relationships with their parents'

Students should note that CONFOUNDERS may well explain this correlation. Responsible and attentive parents are probably more likely to organize regular family meals, and are also probably more likely to take an active interest in their children's academic progress

CASE STUDY 2: ALTERNATIVE REMEDIES
'I was feeling really sick and I had tried all the medicines I usually use. A friend recommended drinking ground-up apricot seeds, which sounded pretty ridiculous, but I was ready to try anything. I drank the tea and 2 days later I was back to full health.'

Students should note that REGRESSION TO THE MEAN may be the hidden driver in this scenario. The protagonist sought a new remedy when his health was much worse than average. In most cases, he would soon return to average health anyway, but the timing of the intervention makes it appear that the apricot seeds were the cause of the improvement.

Read aloud and discuss as a class

CASE STUDY 3: PIZZA DEFEATS CANCER
'I read about a new study which found that eating pizza can prevent cancer. The study, which involved 12 people from 12 to 68 years of age, reported how many pizzas they ate each month. Cancer rates were lowest in the participants who ate the most pizza.'

Students should note that CONFOUNDERS may also explain this correlation. Study participants that eat the most pizza are probably also the youngest participants, and younger people are much less likely to develop cancer. So AGE may well be a confounder, independently driving both lower pizza consumption and higher cancer rates. Also, students should note the very small sample size (only 12 people), which should reduce our confidence considerably in the study's findings.

THE LIMITS OF DATA
The Causation Checklist can only help increase our confidence that two variables are causally related. In fact, we can never be 100% certain that A causes B. This is because Mathematics and Statistics work differently.

Mathematics reaches conclusions through DEDUCTION. This means that we accept certain basic ideas to be true (axioms), and we connect them using logic to show that other ideas must then also be true.

Statistics reaches conclusions through INDUCTION. This means observations (data) about the world are recorded and then analyzed to identify patterns in the data.

However, we can never be sure of what data we are missing. Europeans used to believe that all swans were white, until 1697 when Dutch explorers sighted Black Swans in Australia. Even if all the swan data you have suggests swans are white, the conclusion is wrong because of the missing black swan data. In the same way, even though we might be confident that A is causing B, we can never be certain that we are not missing data about C, which may be the real driver.

SUMMARY

Review aloud with the class, clarifying any remaining misconceptions

Two variables are CORRELATED if they are related to each other

Two variables are CAUSALLY related if a change in one variable drives the change in the other variable

The CAUSATION CHECKLIST can be used to eliminate alternative explanations for correlations

Outcomes are typically partly due to random chance

Extremely high numbers usually come down and extremely low numbers go up.

If an intervention is introduced when the numbers are at an extreme, the intervention may appear to have an effect even if it didn't.

THE CAUSATION CHECKLIST

REVERSE CAUSATION
Rather than A causing B, would B causing A make more sense?

CONFOUNDERS
Might some other variable, C, actually be causing both A and B?

COINCIDENCE
If there's no reasonable connection, could it just be a coincidence?

MULTIPLE CAUSES
Might A be only one of many causes of B?

REGRESSION TO THE MEAN (... added to checklist)
If B was at an extreme value, it may change without the introduction of A

REVIEW: WHAT DO WE THINK NOW?

Students should tackle these final questions independently, producing written answers, which can be reviewed later to assess understanding

Answer the following questions to check you have understood all the important ideas from this lesson:

Why is it helpful to know if two variables are related?
If we want to influence a certain outcome B, it is useful to know which other variables it is related to (A, C, D etc.). These correlations can be further investigated to establish causation, which may then suggest new ways to influence outcome B

What does it mean if two variables are 'correlated'?
It means that there is a relationship between the two variables and they behave in a consistent way relative to each other

What does it mean if two variables are 'causally' related?
It means that a change in one variable actually causes the change in the other variable.

List 3 alternative explanations to consider, before making a claim of causation:
1 - Reverse Causation
2 - Confounders
3 - Coincidence

When a claim is made that a certain intervention has made improvements, what trap should you check for?
Regression to the mean. It is also important to know if the intervention was introduced when the variable was at an extreme value

Why do many variables naturally return from extreme values towards the average?
Many variables are partly due to chance, which is not consistent. Good luck and bad luck will soon be followed by average luck, so extreme values typically soon return to average values

So, does ice cream make the sun shine?
Ice cream sales and hours of sunshine are correlated, but this claim is an example of reverse causation. In reality, an increase in sunshine causes an increase in ice cream sales