Chapter 2: Design and Validity

Cherry-picking: Did They Tell the Whole Truth?

• Cherry-picking: Choosing data points or data presentations that support one’s conclusion while ignoring other data points and presentations that don’t.
• When you pick cherries (or any other fruit), you pick the best ones. It’s not a representative sample of what all the cherries look like. The same idea applies in data science: What data is being presented, and what data is being left out of the picture? Is a particular perspective or visual being used to tell only one side of the story?
• Misleading Visualizations
  o To understand cherry-picking, let’s consider this example reporting job losses.
    Source: Gaslowitz. https://ed.ted.com/lessons/how-to-spot-a-misleading-graph-lea-gaslowitz
  o There are actually several issues here. First, the scale itself is not consistent. The space between March 2009 and June 2010 is much too small relative to the rest of the graph. This is a 15-month span, while the other (larger) spans cover only 6 or 9 months. Something’s fishy! Second, why are only these four values being reported? What about the other quarterly job loss values in between?
  o When we go back to the data source and try to create a more complete picture, the story changes. The original graph displayed a steady increase in quarterly job losses, suggesting that things are still getting worse. The complete picture shows job losses stabilizing.
    Source: Gaslowitz. https://ed.ted.com/lessons/how-to-spot-a-misleading-graph-lea-gaslowitz

Findley & Nguyen (2020), University of Illinois – Urbana-Champaign Page 38

• Making claims based only on anecdotal evidence
  o Cherry-picking can also take place when we draw only from a limited pool of knowledge.
  o We have to make about a thousand or more little decisions a day, and we don’t always have time to conduct a survey, crunch some numbers, and write a report about what we should do.
  o Oftentimes, we rely on anecdotal evidence (our own experience or the experiences of those in our social circles) to decide what to do. Unfortunately, anecdotal evidence is limited and is essentially just a form of cherry-picking, even when it is not used to intentionally deceive.
  o There’s nothing wrong with that…until we start to make big decisions or weigh in on complicated conversations using only anecdotal evidence. Consider this example: Let’s say I get a flu vaccine, but then contract the flu that season. The next season, I don’t get a flu vaccine and don’t get the flu. I decide that flu vaccines are probably not that effective and don’t bother any more. Am I making a data-driven decision? Yes, but my limited, anecdotal experiences are not particularly strong evidence here. The same argument applies when it’s someone I know sharing information!
  o Be cautious when treading into complicated discussions armed only with data from a sample of yourself or those you know, and recognize when that is the basis for other people’s claims.
• Searching for answers under the cloud of confirmation bias
  o Our desire to see what we want to see can lead us to be confirmers rather than investigators. We like to consider only evidence that already aligns with our views or that supports a conclusion we have decided we like. We give more credence to people whom we view as already being on “our side.”
  o This can happen for many reasons, but one common reason is that we don’t like cognitive dissonance: grappling with a difficult question and being unsure of the answer is mentally taxing!
  o If I don’t like to take vitamins, then two things might happen: I might give more credence to research I come across that downplays the value of taking vitamins. I might also be more inclined to look for research that downplays the value of taking vitamins, thus distorting the breadth of data I take in on this subject!
  o Always ask yourself whether your biases are getting in the way of looking at the available data as objectively as possible.

Final Thoughts on Using Data Appropriately

• Data can be a false authority
  o As noted earlier in this section, just because someone uses data in their argument does not mean that what follows must be correct or sensible.
  o Data can be misunderstood, manipulated, or even collected under faulty designs, and it is up to you as a critical consumer to decide if you are getting the whole truth.
  o At the end of the day, analysts are making judgments in their design, their presentation, and the storyline that accompanies them. Your job is to identify where they made judgments!
• Asking Good Questions
  o Consider the points we have covered in this unit and how we might turn them into questions to evaluate the studies undergirding statistical claims we encounter:

Study Evaluation Checklist
• Is this claim based on an observational study or an experiment? And which type?
• If an experiment, how strong is the internal validity of the study? (Any threats to a causal chain?)
• If an observational study, what confounders could exist, and were they explored well in the data?
• How strong is the external validity of the study? (Do the conclusions generalize to a wider population, setting, and time?)
• Have the authors cherry-picked data to support their claim? (Are graphs and tables clearly and honestly represented? Is all pertinent information available? In a drug study, are all side effects reported?)
• Does the argument stem from an objective data source, or simply anecdotal evidence?
• Might these authors have any confirmation bias? (Already biased toward a certain conclusion? Who sponsored or funded this study?)
• And as a final consideration: am I ready to make an important decision from this data? What further research remains to be done?
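One part of the cherry-picking question can be checked mechanically: whether the time points on a graph's axis are evenly spaced, which was the first issue in the job-loss example. The following is a minimal sketch, not a general tool. The function names are made up for illustration, and only March 2009 and June 2010 come from the text; the two earlier dates are hypothetical, chosen so that the gaps match the 9- and 6-month spans the text mentions.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def gaps_in_months(ticks: list[date]) -> list[int]:
    """Months separating each pair of neighboring axis labels."""
    return [months_between(a, b) for a, b in zip(ticks, ticks[1:])]


def has_uneven_spacing(ticks: list[date]) -> bool:
    """True when neighboring labels are not all the same number of months apart."""
    return len(set(gaps_in_months(ticks))) > 1


# Axis labels for the illustration. The first two dates are ASSUMED
# (hypothetical); the last two are the ones named in the text.
axis_labels = [
    date(2007, 12, 1),  # hypothetical
    date(2008, 9, 1),   # hypothetical
    date(2009, 3, 1),   # from the text
    date(2010, 6, 1),   # from the text
]

gaps = gaps_in_months(axis_labels)
print("Months between labels:", gaps)
print("Uneven time axis?", has_uneven_spacing(axis_labels))
```

Uneven gaps are not dishonest in themselves. The problem arises when the graph draws them as if they were equal (or draws the longest gap as the narrowest), because that changes the apparent slope of the trend.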
The heart of good statistical investigation and critical data consumption is good questions! Poke holes in the claims you read. Consider what agenda might be motivating this message. Ask what’s left out. Pursue the truth, the whole truth, and nothing but the truth.

Practice: A toothpaste company reaches out to 500 dentists to see if they would be willing to recommend their toothpaste. 54 respond, of whom 42 say they would recommend it. The toothpaste company thus publishes the claim that 4 out of 5 dentists recommend their toothpaste. What is problematic about this claim?
Notes: Low response rate! Only 54/500 responded.

Practice: Emergency rooms frequently report higher numbers of patients on major holidays like Christmas or the 4th of July. I decide to use this as evidence that people are often depressed over the holidays and are more likely to engage in risky behavior as a result. What is wrong with my logic?
Notes: Christmas/4th of July → more events and activities → accidents more likely to happen. Is it true for every holiday across many years?

Practice: I’ve always been concerned that there might be something unhealthy about protein bars. Then I came across a well-designed randomized controlled trial that said that consuming too much protein a day can actually cause some negative health effects for some people above age 45. I now tell college students in my courses to avoid protein bars. What is wrong with my logic?
Notes: Initial belief = protein bars are unhealthy → looking out for studies that confirm the belief. Age 45+ → can’t generalize to college students.

Consider these two fictional studies:

Study 1
Bayer Pharmaceuticals made the claim that patients taking Bayer Aspirin daily were 29% less likely to have a heart attack over a 5-year period as compared to patients taking Tylenol.
The study involved a randomized controlled trial where half were prescribed Bayer and half were prescribed Tylenol.
Patients aged 65+ at St. Luke’s Hospital in New York City who were prescribed aspirin for heart health were approached. 1187 agreed to participate.

Study 2
The American Heart Association made the claim that patients with heart disease who took Tylenol daily were living almost 0.9 years longer on average than those taking Bayer.
The study was based on a large report of medical records released across the last 30 years.
Participants included anyone who was prescribed to take Tylenol or Bayer daily within 10 years of death and included 11,563 people total who had passed away.

What limitations, questions, or concerns do you note with each study?
Notes: Study 1: patients aged 65+ only; specific to one location. Study 2: observational study; included anyone who was prescribed to take Tylenol or Bayer daily within 10 years of death.

Which study do you believe has a more trustworthy conclusion?
Notes: Study 1
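The arithmetic behind the toothpaste practice problem above can be made explicit. This is a minimal sketch using only the three counts given in the problem (500 contacted, 54 responded, 42 would recommend); the function name `summarize_survey` and the dictionary keys are hypothetical names chosen for illustration.

```python
def summarize_survey(contacted: int, responded: int, recommended: int) -> dict[str, float]:
    """Return the three proportions that matter when judging a survey claim."""
    if responded == 0 or not (0 <= recommended <= responded <= contacted):
        raise ValueError("expected 0 <= recommended <= responded <= contacted, with responded > 0")
    return {
        # How many of the people asked actually answered.
        "response_rate": responded / contacted,
        # The figure the company advertises: recommenders among respondents only.
        "share_of_respondents": recommended / responded,
        # Recommenders as a share of everyone who was asked.
        "share_of_contacted": recommended / contacted,
    }


# Counts taken from the practice problem.
summary = summarize_survey(contacted=500, responded=54, recommended=42)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

The "4 out of 5" claim corresponds to the share of respondents, which is close to 80%. It says nothing about the 446 dentists who did not reply, and dentists who chose to respond may differ systematically from those who did not.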