
Chapter 2: Design and Validity

Cherry-picking: Did They Tell the Whole Truth?

Cherry-picking: Choosing data points or data presentations that support one’s conclusion while ignoring other data points and presentations that don’t.

When you pick cherries or any other fruit, you pick the best ones. What you pick is not a representative sample of what all the cherries look like. The same idea applies in data science: What data is being presented, and what data is being left out of the picture? Is a particular perspective or visual being used to tell only one side of the story?

Misleading Visualizations
o To understand cherry-picking, let’s consider this example reporting job losses.

Gaslowitz. https://ed.ted.com/lessons/how-to-spot-a-misleading-graph-lea-gaslowitz

o There are actually several issues here:
   o First, the scale itself is not consistent. The space between March 2009 and June 2010 is much too small relative to the rest of the graph. This is a 15-month span, while the other, wider spans cover fewer months. Something’s fishy!
   o Second, why are only these four values being reported? What about the other quarterly job loss values in between?
o When we go back to the data source and try to create a more complete picture, the story changes. The original graph displayed a steady increase in quarterly job losses, suggesting that things were still getting worse. The complete picture shows job losses stabilizing.

Gaslowitz. https://ed.ted.com/lessons/how-to-spot-a-misleading-graph-lea-gaslowitz
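The scale problem described above can be checked with a little arithmetic. The following Python sketch counts the months between tick labels and computes where each tick would sit on an evenly scaled time axis. Only March 2009 and June 2010 are taken from the text; the two earlier tick dates and the helper name `months_between` are assumptions made for illustration.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (hypothetical helper)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# ASSUMPTION: the first two tick dates are placeholders for illustration.
# Only March 2009 and June 2010 are named in the text.
ticks = [date(2007, 12, 1), date(2008, 9, 1), date(2009, 3, 1), date(2010, 6, 1)]

# Months covered by each gap between neighboring ticks.
gaps = [months_between(a, b) for a, b in zip(ticks, ticks[1:])]

# On an honest time axis, each tick's position is proportional to elapsed time.
total_months = sum(gaps)
positions = [0.0]
for gap in gaps:
    positions.append(positions[-1] + gap / total_months)

for tick, pos in zip(ticks, positions):
    print(f"{tick:%b %Y}: {pos:.0%} of the way along the axis")
```

If the last gap covers the most months, it should take up the most horizontal space. A graph that draws it as the narrowest gap makes the final rise look steeper than it was.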

Findley & Nguyen (2020), University of Illinois Urbana-Champaign Page 38

Making claims based only on anecdotal evidence
o Cherry-picking can also take place when we are only drawing from a limited pool of knowledge.
o We have to make about a thousand or more little decisions a day, and we don’t always have time to conduct a survey, crunch some numbers, and write a report about what we should do.
o Oftentimes, we rely on anecdotal evidence (our own experience or the experiences of those in our social circles) to decide what to do. Unfortunately, anecdotal evidence is limited and is essentially just a form of cherry-picking, even if it is not used to intentionally deceive.
o There’s nothing wrong with that…until we start to make big decisions or weigh in on complicated conversations using only anecdotal evidence.

Consider this example: Let’s say I get a flu vaccine, but then contract the flu that season. The next season, I don’t get a flu vaccine and don’t get the flu. I decide that flu vaccines are probably not that effective and don’t bother any more. Am I making a data-driven decision? Yes, but my limited, anecdotal experiences are not a particularly strong basis here. The same applies when it’s someone I know sharing information!

o Be cautious when treading into complicated discussions armed only with data from a sample of yourself or those you know, and recognize when that is the basis for other claims.

Searching for answers under the cloud of a confirmation bias
o Our desire to see what we want to see can lead us to be confirmers rather than investigators. We like to consider only evidence that already aligns with our views or that provides a conclusion we have decided we like. We give more credence to people whom we view as already being on our side.
o This can happen for many reasons, but one common reason is that we dislike cognitive dissonance: grappling with a difficult question and being unsure of the answer is mentally taxing!
o If I don’t like to take vitamins, then two things might happen:
   o I might give more credence to research I come across that downplays the value of taking vitamins.
   o I might be more inclined to look for research that downplays the value of taking vitamins, thus distorting the breadth of data I take in on this subject!
o Always ask yourself whether your biases are getting in the way of looking at the available data as objectively as possible.

Final Thoughts on Using Data Appropriately

Data can be a false authority
o As noted earlier in this section, just because someone uses data in their argument does not mean that what follows must be correct or sensible.
o Data can be misunderstood, manipulated, or even collected under faulty designs, and it is up to you as a critical consumer to decide if you are getting the whole truth.
o At the end of the day, analysts are making judgments in their design, their presentation, and the storyline that accompanies them. Your job is to identify where they made judgments!


Asking Good Questions
o Consider the points we have covered in this unit and how we might turn them into questions to evaluate the studies undergirding statistical claims we encounter:

Study Evaluation Checklist
o Is this claim based on an observational study or an experiment? And which type?
o If an experiment, how strong is the internal validity of the study? (Any threats to a causal chain?)
o If an observational study, what confounders could exist, and were they explored well in the data?
o How strong is the external validity of the study? (Do the conclusions generalize to a wider population, setting, and time?)
o Have the authors cherry-picked data to support their claim? (Are graphs and tables clearly and honestly represented? Is all pertinent information available? If it is a drug study, are all side effects reported?)
o Does the argument stem from an objective data source, or simply anecdotal evidence?
o Might these authors have any confirmation bias? (Already biased toward a certain conclusion? Who sponsored or funded this study?)
o And as a final consideration… am I ready to make an important decision from this data? What further research remains to be done?

The heart of good statistical investigation and critical data consumption is good questions! Poke holes in the claims you read. Consider what agenda might be motivating this message. Ask what’s left out. Pursue the truth, the whole truth, and nothing but the truth.

Practice: A toothpaste company reaches out to 500 dentists to see if they would be willing to recommend their toothpaste. 54 respond, of which 42 said they would recommend it. The toothpaste company thus publishes the claim that 4 out of 5 dentists recommend their toothpaste. What is problematic about this claim?

Answer: Low response rate! Only 54/500 responded.
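The arithmetic behind the toothpaste claim can be laid out in a few lines. This Python sketch uses only the three counts given in the exercise; the variable names and the best-case/worst-case bounds are illustrative assumptions, not part of the original exercise.

```python
# Counts given in the exercise.
contacted = 500
responded = 54
would_recommend = 42

# Share of contacted dentists who answered at all.
response_rate = responded / contacted

# The published "4 out of 5" figure only describes those who answered.
recommend_rate_among_responders = would_recommend / responded

# ASSUMPTION for illustration: bound the rate among all 500 dentists by
# imagining every non-responder would have said "no" (lower) or "yes" (upper).
non_responders = contacted - responded
lower_bound = would_recommend / contacted
upper_bound = (would_recommend + non_responders) / contacted

print(f"Response rate: {response_rate:.1%}")
print(f"Recommend rate among responders: {recommend_rate_among_responders:.1%}")
print(f"Possible rate among all contacted: {lower_bound:.1%} to {upper_bound:.1%}")
```

The rate among responders is close to 4 out of 5, but with so few dentists answering, the rate among everyone contacted could fall almost anywhere. Dentists who like the toothpaste may also have been more willing to reply.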

Practice: Emergency rooms frequently report higher numbers of patients on major holidays like Christmas or the 4th of July. I decide to use this as evidence that people are often depressed over the holidays and are more likely to engage in risky behavior as a result. What is wrong with my logic?

Answer: Christmas/4th of July = more events and activities → accidents are more likely to happen. Is it true for every holiday across many years?

Practice: I’ve always been concerned that there might be something unhealthy about protein bars. Then I came across a well-designed randomized controlled trial that said that consuming too much protein per day can actually cause some negative health effects for some people above age 45. I now tell college students in my courses to avoid protein bars.

What is wrong with my logic?

Answer: Initial belief = protein bars are unhealthy → I look out for studies that confirm that belief. Age 45+ → can’t generalize to college students.

Consider these two fictional studies:

Study 1
Bayer Pharmaceuticals made the claim that patients taking Bayer Aspirin daily were 29% less likely to have a heart attack over a 5-year period as compared to patients taking Tylenol.

The study involved a randomized controlled trial where half were prescribed Bayer and half were prescribed Tylenol. Patients aged 65+ at St. Luke’s Hospital in New York City who were prescribed aspirin for heart health were approached. 1187 agreed to participate.

Study 2
The American Heart Association made the claim that patients with heart disease who took Tylenol daily were living almost 0.9 years longer on average than those taking Bayer.

The study was based on a large report of medical records released across the last 30 years. Participants included anyone who was prescribed to take Tylenol or Bayer daily within 10 years of death and included 11,563 people total who had passed away.

What limitations, questions, or concerns do you note with each study?

Study 1:
o Patients aged 65+ only.
o Specific to one location.

Study 2:
o Observational study.
o Included anyone who was prescribed to take Tylenol or Bayer daily within 10 years of death.

Which study do you believe has a more trustworthy conclusion?

Answer: Study 1
