
Research Skills Seminar Series 2019 CAHS Research Education Program

Statistical Tips for Interpreting Scientific Claims

Mark Jones , Telethon Kids Institute

18 October 2019

Research Skills Seminar Series | CAHS Research Education Program Department of Child Health Research | Child and Adolescent Health Service

ResearchEducationProgram.org © CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service, WA 2019

Copyright to this material produced by the CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service, Western Australia, under the provisions of the Copyright Act 1968 (C'wth Australia). Apart from any fair dealing for personal, academic, research or non-commercial use, no part may be reproduced without written permission. The Department of Child Health Research is under no obligation to grant this permission. Please acknowledge the CAHS Research Education Program, Department of Child Health Research, Child and Adolescent Health Service when reproducing or quoting material from this source.

CONTENTS:

1 PRESENTATION
2 ARTICLE: TWENTY TIPS FOR INTERPRETING SCIENTIFIC CLAIMS, SUTHERLAND, SPIEGELHALTER & BURGMAN, 2013
3 ADDITIONAL RESOURCES – STATISTICAL TIPS FOR INTERPRETING SCIENTIFIC CLAIMS
3.1 STATISTICAL BIAS
3.2 CORRELATIONS
3.3 COGNITIVE BIAS
3.4 REFERENCES

RESEARCH SKILLS SEMINAR SERIES 2019 | CAHS Research Education Program

Statistical Tips for Interpreting Scientific Claims
Mark Jones | Biostatistician, Telethon Kids Institute
[email protected]

On behalf of Dr Julie Marsh, Senior Research Fellow, Telethon Kids Institute

Sutherland, Spiegelhalter and Burgman (2013). Twenty tips for interpreting scientific claims. Nature 503, 335-337.

The 20 Tips

1. Differences and Chance Cause Variation
2. No Measurement is Exact
3. Bias is Rife
4. Bigger is Usually Better for Sample Size
5. Correlation Does Not Imply Causation
6. Regression to the Mean Can Mislead
7. Extrapolating Beyond the Data is Risky
8. Beware the Base-Rate Fallacy
9. Controls are Important
10. Randomisation Minimises Bias
11. Seek Replication
12. Scientists are Human
13. Significance is Significant
14. Separate No Effect from Non-Significance
15. Effect Size Matters
16. Study Relevance Limits Generalisation
17. Feelings Influence Risk Perception
18. Dependencies Change the Risks
19. Data Can Be Dredged or Cherry Picked
20. Extreme Measurements May Mislead

The seminar groups these tips under three themes: Study Design, Statistical Insights and Human Factors.


Part 1: Study Design

Randomisation (minimises bias): "Randomization avoids bias"

EXPERIMENTAL UNIT: the smallest 'thing' to which we allocate treatments.

[Diagram: Population, then Simple Random Sample, then Randomised Allocation into a Treatment Group (Treatment) and a Control Group (Placebo). Image source: Final Fantasy VII]
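A minimal sketch of the allocation step shown in the diagram, in plain Python. The participant IDs and group sizes are invented for illustration; they are not from the seminar.

# Illustrative sketch: simple random allocation of experimental units
# to treatment and control groups (hypothetical participant IDs).
import random

random.seed(42)                          # fixed seed so the example is reproducible
participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical units

shuffled = participants[:]
random.shuffle(shuffled)                 # the randomisation step: removes allocation bias
treatment_group = sorted(shuffled[:10])  # first half receives the active treatment
control_group = sorted(shuffled[10:])    # second half receives the placebo

print("Treatment:", treatment_group)
print("Control:  ", control_group)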

Controls are Important

Replication (Seek replication, not pseudoreplication)

“Results consistent across many studies, replicated on independent populations, are more likely to be solid.”

“The results of several such experiments may be combined in a meta‐analysis to provide an overarching view of the topic with potentially much greater statistical power than any of the individual studies.”
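The power gain from pooling can be illustrated with a standard fixed-effect, inverse-variance meta-analysis. The three study estimates below are invented for illustration, not taken from any real trial.

# Illustrative fixed-effect meta-analysis: inverse-variance pooling of
# three hypothetical study effect estimates (mean difference, SE).
import math

studies = [(0.30, 0.20), (0.25, 0.15), (0.40, 0.25)]  # (estimate, standard error)

weights = [1 / se**2 for _, se in studies]            # weight = precision = 1/variance
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))               # smaller than any single study's SE

print(f"Pooled estimate: {pooled:.3f} (SE {pooled_se:.3f})")
for est, se in studies:
    print(f"  single study: {est:.2f} (SE {se:.2f})")

The pooled standard error (about 0.11 here) is smaller than the best single study's (0.15), which is the "greater statistical power" the quote refers to.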


Generalisability (aka external validity): Study relevance limits generalisation

“At its best a trial shows what can be accomplished with a medicine under careful observation and certain restricted conditions. The same results will not invariably or necessarily be observed when the medicine passes into general use.”

Austin Bradford Hill, 1984


Part 2: Statistical Insights

Sample Size (adopt the Goldilocks principle): Bigger is usually better for sample size

Average efficacy can be more reliably and accurately estimated from a study with hundreds of participants than from a study with only a few participants.

Basic analyses often assume that participants are homogeneous.
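A small simulation, assuming a hypothetical outcome with true mean 0.5 and SD 1 (numbers invented for illustration), makes the sample-size point concrete: the spread of study-level estimates shrinks as n grows.

# Illustrative simulation: the sample mean of a hypothetical outcome
# (true mean 0.5, SD 1) becomes more stable as the sample size grows.
import random, statistics

random.seed(1)

def estimate_spread(n, reps=2000):
    """SD of the sample mean across many repeated studies of size n."""
    means = [statistics.fmean(random.gauss(0.5, 1.0) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

for n in (5, 50, 500):
    print(f"n={n:4d}: spread of study estimates is about {estimate_spread(n):.3f}")
# The spread falls roughly as 1/sqrt(n): bigger is usually better.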


Sample Size (adopt the Goldilocks principle): Bigger is usually better for sample size

[Figure: too small / too big / just right]

Differences and Chance Cause Variation

"The main challenge of research is teasing apart the importance of the process of interest from the innumerable other sources of variation."



Hypotheses, Effects and Significance
(Significance is significant; separate no effect from non-significance; effect size matters)

A hypothesis test is a statistical technique to assist with answering questions or claims about population parameters by sampling from the population of interest.

P-values give the probability that an observed association could have arisen solely as a result of chance in sampling, given the null hypothesis; the p-value depends on sample size and variability.

A p-value of 0.01 means there is a 1-in-100 probability that what looks like an effect of the treatment could have occurred randomly, and in truth there was no effect at all.
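One way to make this reading of the p-value concrete is a permutation test: repeatedly re-label the data at random (the null world where treatment does nothing) and count how often a difference at least as large as the observed one appears. The outcome values below are hypothetical.

# Illustrative permutation test: the p-value is the share of null-shuffles
# whose group difference is at least as extreme as the observed one.
import random, statistics

random.seed(7)
treated = [5.1, 6.0, 5.8, 6.4, 5.6, 6.2]   # hypothetical outcomes
control = [4.9, 5.2, 4.7, 5.5, 5.0, 5.3]

observed = statistics.fmean(treated) - statistics.fmean(control)
pooled = treated + control

extreme = 0
reps = 10_000
for _ in range(reps):
    random.shuffle(pooled)                  # random re-labelling: the null hypothesis world
    diff = statistics.fmean(pooled[:6]) - statistics.fmean(pooled[6:])
    if abs(diff) >= abs(observed):
        extreme += 1

print(f"observed difference: {observed:.2f}")
print(f"permutation p-value: {extreme / reps:.4f}")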

Hypotheses, Effects and Significance
(Significance is significant; separate no effect from non-significance; effect size matters)

1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

(These are the principles of the ASA statement on p-values; see Wasserstein & Lazar 2016 in the references.)

[Figure: 95% Confidence Interval]
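Principle 5 (significance is not effect size) can be demonstrated directly: with a large enough sample, a trivially small true effect still produces a very small p-value. The effect size, sample size and simulated data below are all invented for illustration.

# Illustrative sketch: a tiny true effect (0.03 SD) becomes highly
# "significant" at large n while staying practically negligible.
import math, random, statistics

random.seed(3)
n = 100_000
a = [random.gauss(0.00, 1.0) for _ in range(n)]   # control, true mean 0
b = [random.gauss(0.03, 1.0) for _ in range(n)]   # treated, true mean 0.03 SD

diff = statistics.fmean(b) - statistics.fmean(a)
se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
z = diff / se
p = math.erfc(abs(z) / math.sqrt(2))              # two-sided normal p-value

print(f"effect size (mean diff): {diff:.4f} SD units (trivial)")
print(f"z = {z:.2f}, p = {p:.2g} ('significant', yet unimportant)")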



Measurement Error (No measurement is exact)

Measurement errors represent the limits of precision with which we can observe phenomena and are dictated by our instruments and operators.

Impacts: Bias, Power and Masking
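The masking impact can be sketched with classical attenuation: random measurement error in a predictor weakens the correlation we observe, even when the true relationship is strong. All numbers below are hypothetical; the correlation helper requires Python 3.10+.

# Illustrative sketch: random measurement error in x masks the real
# association with y (classical attenuation).
import random, statistics

random.seed(11)
n = 5_000
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [xt + random.gauss(0, 1) for xt in x_true]          # the real relationship
x_noisy = [xt + random.gauss(0, 1.5) for xt in x_true]  # an imprecise instrument

print(f"correlation using exact x: {statistics.correlation(x_true, y):.2f}")
print(f"correlation using noisy x: {statistics.correlation(x_noisy, y):.2f}")
# The noisy instrument roughly halves the apparent association.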


Bias is Rife

Sources:
https://data36.com/statistical-bias-types-explained/
https://www.statisticshowto.datasciencecentral.com/what-is-bias/
https://newonlinecourses.science.psu.edu/stat509/node/28/
https://en.wikipedia.org/wiki/Bias_(statistics)


Regression to the Mean (Regression to the mean can mislead)

Think Conditionally (Dependencies change the risks)
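The first slide's point lends itself to a quick simulation: sites selected because their first measurement was extreme look better on remeasurement with no intervention at all. The accident-rate numbers are invented for illustration.

# Illustrative sketch: pick the 'worst' sites by a noisy first measurement;
# their second measurement improves by itself (regression to the mean).
import random, statistics

random.seed(5)
n_sites = 1_000
true_rate = [random.gauss(10, 2) for _ in range(n_sites)]      # underlying rates

year1 = [max(0.0, t + random.gauss(0, 3)) for t in true_rate]  # noisy observations
year2 = [max(0.0, t + random.gauss(0, 3)) for t in true_rate]

worst = sorted(range(n_sites), key=lambda i: year1[i])[-50:]   # 50 worst in year 1
print(f"worst sites, year 1 mean: {statistics.fmean(year1[i] for i in worst):.1f}")
print(f"same sites,  year 2 mean: {statistics.fmean(year2[i] for i in worst):.1f}")
# Year 2 looks 'better' even though nothing changed.

For the second slide, a back-of-the-envelope calculation shows why dependencies change the risks. Every probability here is invented for illustration.

# Illustrative arithmetic: two 'rare' events co-occur far more often when a
# common cause (a storm, say) links them than independence would suggest.
p_high_tide = 0.01
p_workers_absent = 0.01
p_both_if_independent = p_high_tide * p_workers_absent   # 1 in 10,000

# Suppose storms occur 1% of the time and make each event 90% likely.
p_storm = 0.01
p_both_dependent = p_storm * 0.9 * 0.9                   # roughly 1 in 123

print(f"assuming independence: {p_both_if_independent:.4%}")
print(f"with a common cause:   {p_both_dependent:.4%}")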


Part 3: Human Factors

Scientists are Human (as are you)

"Peer review is not infallible: journal editors might favour positive findings & newsworthiness."

Statistical tools and reporting guidelines: CONSORT, TREND, STROBE, PRISMA, REMARK, STREGA


Base-Rate Fallacy (Beware the base-rate fallacy)

[Worked example from the slides: 15% of cabs are green, 85% are blue; a witness is 80% sure the cab was green.]
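This is the classic cab problem (Kahneman & Tversky; see the references). Bayes' rule gives the answer that intuition misses, using only the numbers on the slide.

# Illustrative Bayes calculation for the slide's cab example:
# 15% of cabs are green, 85% blue; the witness is right 80% of the time.
p_green = 0.15
p_blue = 0.85
p_say_green_if_green = 0.80   # witness accuracy
p_say_green_if_blue = 0.20    # witness error rate

p_say_green = p_green * p_say_green_if_green + p_blue * p_say_green_if_blue
p_green_given_said = p_green * p_say_green_if_green / p_say_green

print(f"P(cab is green | witness says green) = {p_green_given_said:.2f}")
# About 0.41: despite the '80% sure' witness, a blue cab is still more likely,
# because the low base rate of green cabs dominates.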

Correlation Does Not Imply Causation

https://tylervigen.com/spurious-correlations

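Spurious correlations of the kind the site collects are easy to manufacture: give two causally unrelated series a shared trend (the lurking variable) and they correlate strongly. The series below are entirely invented; the correlation helper requires Python 3.10+.

# Illustrative sketch: a shared trend (the lurking variable) makes two
# causally unrelated series correlate strongly.
import random, statistics

random.seed(13)
years = range(30)
trend = [0.5 * t for t in years]                 # common driver, e.g. time

series_a = [10 + tr + random.gauss(0, 1) for tr in trend]        # hypothetical series A
series_b = [3 + 0.8 * tr + random.gauss(0, 1) for tr in trend]   # hypothetical series B

r = statistics.correlation(series_a, series_b)
print(f"correlation: {r:.2f}  (high, yet neither series causes the other)")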

Correlation Does Not Imply Causation: Bradford Hill criteria

https://spice-spotlight.scot/2019/07/08/searching-for-causes-in-the-blue-water-schools/

Cognitive Bias (Feelings influence risk perception)

"Broadly, risk can be thought of as the likelihood of an event occurring in some time frame, multiplied by the consequences should the event occur. People's risk perception is influenced disproportionately by many things, including the rarity of the event, how much control they believe they have, the adverseness of the outcomes, and whether the risk is voluntarily or not."

https://www.visualcapitalist.com/18-cognitive-bias-examples-mental-mistakes/

Extreme Measurements May Mislead

That extreme measurements exist is a logical conclusion from random variation, regression to the mean, bias and measurement error.
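The league-table caution from the article can be simulated: when schools differ in cohort size, the extremes of a ranking are dominated by the small, noisy samples rather than by true quality. Every school here has the same true pass rate; the setup is hypothetical.

# Illustrative sketch: 200 schools with identical true pass rates (70%) are
# ranked by observed pass rate; the extremes are mostly the small schools.
import random

random.seed(17)
schools = []
for _ in range(200):
    n = random.choice([20, 200])                 # small or large cohort
    passes = sum(random.random() < 0.70 for _ in range(n))
    schools.append((passes / n, n))

schools.sort()
print("bottom 5 (rate, cohort):", [(f"{r:.2f}", n) for r, n in schools[:5]])
print("top 5    (rate, cohort):", [(f"{r:.2f}", n) for r, n in schools[-5:]])
# The 'best' and 'worst' schools are overwhelmingly the n=20 ones.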

Data Can Be Dredged or Cherry Picked

Image sources: IMDB & Wikipedia

Risky Extrapolation (Extrapolating beyond the data is risky)
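A small sketch of the warning: a straight line fitted to data from a gently curved process predicts well inside the observed range and fails badly outside it. The process and range are invented; statistics.linear_regression requires Python 3.10+.

# Illustrative sketch: a line fitted on x in [0, 5] describes a curved
# process well in-range but badly at x = 20.
import statistics

xs = [x / 2 for x in range(11)]            # 0.0 .. 5.0, the observed range
true = lambda x: 2 + 0.5 * x + 0.1 * x**2  # the real (curved) process

ys = [true(x) for x in xs]
slope, intercept = statistics.linear_regression(xs, ys)

for x in (2.5, 5.0, 20.0):
    fit = intercept + slope * x
    print(f"x={x:5.1f}: truth={true(x):6.1f}, linear fit={fit:6.1f}")
# In-range errors are tiny; at x=20 the extrapolated fit is off by more than half.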


Summary

• Reviewed the 20 tips from Sutherland, Spiegelhalter and Burgman's paper from the standpoints of study design, statistical insights and human factors.

• Introduced some related material.

• Drew some links between the common themes that underlie the 20 tips.


Extra Slides Follow

Bias is Rife: Potential for Bias in RCT

[Confounder diagram: Age (confounder) influences both Activity level (dependent variable) and Weight gain (independent variable)]
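A sketch of the confounder triangle on the slide: age drives both variables, creating a crude association that shrinks once age is held roughly constant. The effect sizes and data are simulated for illustration only; the correlation helper requires Python 3.10+.

# Illustrative sketch: a confounder (age) manufactures a crude association
# between activity level and weight gain that stratification removes.
import random, statistics

random.seed(19)
age = [random.uniform(20, 70) for _ in range(5_000)]
activity = [100 - a + random.gauss(0, 10) for a in age]    # older: less active
weight_gain = [0.1 * a + random.gauss(0, 1) for a in age]  # older: more gain

print(f"crude correlation: {statistics.correlation(activity, weight_gain):.2f}")

# Stratify: hold the confounder roughly constant within a narrow age band.
band = [(x, y) for x, y, a in zip(activity, weight_gain, age) if 40 <= a < 45]
xs, ys = zip(*band)
print(f"within ages 40-45: {statistics.correlation(xs, ys):.2f}")
# The crude association is mostly the confounder's doing.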


Questions?

Upcoming Research Skills Seminars:
1 Nov: Grant Applications and Finding Funding (A/Prof Sue Skull)
15 Nov: Qualitative Research Methods (Dr Shirley McGough)
*Full 2019 seminar schedule in back of handouts

Please give us feedback! A survey is included in the back of your handout, or complete it online via: www.surveymonkey.com/r/stattips2019

ResearchEducationProgram.org | [email protected]

COMMENT

Twenty tips for interpreting scientific claims

This list will help non-scientists to interrogate advisers and to grasp the limitations of evidence, say William J. Sutherland, David Spiegelhalter and Mark A. Burgman.

Nature 503, 335-337 (21 November 2013). © 2013 Macmillan Publishers Limited.

Calls for the closer integration of science in political decision-making have been commonplace for decades. However, there are serious problems in the application of science to policy — from energy to health and environment to education.

One suggestion to improve matters is to encourage more scientists to get involved in politics. Although laudable, it is unrealistic to expect substantially increased political involvement from scientists. Another proposal is to expand the role of chief scientific advisers[1], increasing their number, availability and participation in political processes. Neither approach deals with the core problem of scientific ignorance among many who vote in parliaments.

Perhaps we could teach science to politicians? It is an attractive idea, but which busy politician has sufficient time? In practice, policy-makers almost never read scientific papers or books. The research relevant to the topic of the day — for example, mitochondrial replacement, bovine tuberculosis or nuclear-waste disposal — is interpreted for them by advisers or external advocates. And there is rarely, if ever, a beautifully designed double-blind, randomized, replicated, controlled experiment with a large sample size and unambiguous conclusion that tackles the exact policy issue.

In this context, we suggest that the immediate priority is to improve policy-makers' understanding of the imperfect nature of science. The essential skills are to be able to intelligently interrogate experts and advisers, and to understand the quality, limitations and biases of evidence. We term these interpretive scientific skills. These skills are more accessible than those required to understand the fundamental science itself, and can form part of the broad skill set of most politicians.

To this end, we suggest 20 concepts that should be part of the education of civil servants, politicians, policy advisers and journalists — and anyone else who may have to interact with science or scientists. Politicians with a healthy scepticism of scientific advocates might simply prefer to arm themselves with this critical set of knowledge.

We are not so naive as to believe that improved policy decisions will automatically follow. We are fully aware that scientific judgement itself is value-laden, and that bias and context are integral to how data are collected and interpreted. What we offer is a simple list of ideas that could help decision-makers to parse how evidence can contribute to a decision, and potentially to avoid undue influence by those with vested interests. The harder part — the social acceptability of different policies — remains in the hands of politicians and the broader political process.

Of course, others will have slightly different lists. Our point is that a wider understanding of these 20 concepts by society would be a marked step forward.

[Image caption: Science and policy have collided on contentious issues such as bee declines, nuclear power and the role of badgers in bovine tuberculosis.]

Differences and chance cause variation. The real world varies unpredictably. Science is mostly about discovering what causes the patterns we see. Why is it hotter this decade than last? Why are there more birds in some areas than others? There are many explanations for such trends, so the main challenge of research is teasing apart the importance of the process of interest (for example, the effect of climate change on bird populations) from the innumerable other sources of variation (from widespread changes, such as agricultural intensification and spread of invasive species, to local-scale processes, such as the chance events that determine births and deaths).

No measurement is exact. Practically all measurements have some error. If the measurement process were repeated, one might record a different result. In some cases, the measurement error might be large compared with real differences. Thus, if you are told that the economy grew by 0.13% last month, there is a moderate chance that it may actually have shrunk. Results should be presented with a precision that is appropriate for the associated error, to avoid implying an unjustified degree of accuracy.

Bias is rife. Experimental design or measuring devices may produce atypical results in a given direction. For example, determining voting behaviour by asking people on the street, at home or through the Internet will sample different proportions of the population, and all may give different results. Because studies that report 'statistically significant' results are more likely to be written up and published, the scientific literature tends to give an exaggerated picture of the magnitude of problems or the effectiveness of solutions. An experiment might be biased by expectations: participants provided with a treatment might assume that they will experience a difference and so might behave differently or report an effect. Researchers collecting the results can be influenced by knowing who received treatment. The ideal experiment is double-blind: neither the participants nor those collecting the data know who received what. This might be straightforward in drug trials, but it is impossible for many social studies. Confirmation bias arises when scientists find evidence for a favoured theory and then become insufficiently critical of their own results, or cease searching for contrary evidence.

Bigger is usually better for sample size. The average taken from a large number of observations will usually be more informative than the average taken from a smaller number of observations. That is, as we accumulate evidence, our knowledge improves. This is especially important when studies are clouded by substantial amounts of natural variation and measurement error. Thus, the effectiveness of a drug treatment will vary naturally between subjects. Its average efficacy can be more reliably and accurately estimated from a trial with tens of thousands of participants than from one with hundreds.

Correlation does not imply causation. It is tempting to assume that one pattern causes another. However, the correlation might be coincidental, or it might be a result of both patterns being caused by a third factor — a 'confounding' or 'lurking' variable. For example, ecologists at one time believed that poisonous algae were killing fish in estuaries; it turned out that the algae grew where the fish died. The algae did not cause the deaths[2].

Regression to the mean can mislead. Extreme patterns in data are likely to be, at least in part, anomalies attributable to chance or error. The next count is likely to be less extreme. For example, if speed cameras are placed where there has been a spate of accidents, any reduction in the accident rate cannot be attributed to the camera; a reduction would probably have happened anyway.

Extrapolating beyond the data is risky. Patterns found within a given range do not necessarily apply outside that range. Thus, it is very difficult to predict the response of ecological systems to climate change, when the rate of change is faster than has been experienced in the evolutionary history of existing species, and when the weather extremes may be entirely new.

Beware the base-rate fallacy. The ability of an imperfect test to identify a condition depends upon the likelihood of that condition occurring (the base rate). For example, a person might have a blood test that is '99% accurate' for a rare disease and test positive, yet they might be unlikely to have the disease. If 10,001 people have the test, of whom just one has the disease, that person will almost certainly have a positive test, but so too will a further 100 people (1%) even though they do not have the disease. This type of calculation is valuable when considering any screening procedure, say for terrorists at airports.

Controls are important. A control group is dealt with in exactly the same way as the experimental group, except that the treatment is not applied. Without a control, it is difficult to determine whether a given treatment really had an effect. The control helps researchers to be reasonably sure that there are no confounding variables affecting the results. Sometimes people in trials report positive outcomes because of the context or the person providing the treatment, or even the colour of a tablet[3]. This underlies the importance of comparing outcomes with a control, such as a tablet without the active ingredient (a placebo).

Randomization avoids bias. Experiments should, wherever possible, allocate individuals or groups to interventions randomly. Comparing the educational achievement of children whose parents adopt a health programme with that of children of parents who do not is likely to suffer from bias (for example, better-educated families might be more likely to join the programme). A well-designed experiment would randomly select some parents to receive the programme while others do not.

Seek replication, not pseudoreplication. Results consistent across many studies, replicated on independent populations, are more likely to be solid. The results of several such experiments may be combined in a systematic review or a meta-analysis to provide an overarching view of the topic with potentially much greater statistical power than any of the individual studies. Applying an intervention to several individuals in a group, say to a class of children, might be misleading because the children will have many features in common other than the intervention. The researchers might make the mistake of 'pseudoreplication' if they generalize from these children to a wider population that does not share the same commonalities. Pseudoreplication leads to unwarranted faith in the results. Pseudoreplication of studies on the abundance of cod in the Grand Banks in Newfoundland, Canada, for example, contributed to the collapse of what was once the largest cod fishery in the world[4].

Scientists are human. Scientists have a vested interest in promoting their work, often for status and further research funding, although sometimes for direct financial gain. This can lead to selective reporting of results and occasionally, exaggeration. Peer review is not infallible: journal editors might favour positive findings and newsworthiness. Multiple, independent sources of evidence and replication are much more convincing.

Significance is significant. Expressed as P, statistical significance is a measure of how likely a result is to occur by chance. Thus P = 0.01 means there is a 1-in-100 probability that what looks like an effect of the treatment could have occurred randomly, and in truth there was no effect at all. Typically, scientists report results as significant when the P-value of the test is less than 0.05 (1 in 20).

Separate no effect from non-significance. The lack of a statistically significant result (say a P-value > 0.05) does not mean that there was no underlying effect: it means that no effect was detected. A small study may not have the power to detect a real difference. For example, tests of cotton and potato crops that were genetically modified to produce a toxin to protect them from damaging insects suggested that there were no adverse effects on beneficial insects such as pollinators. Yet none of the experiments had large enough sample sizes to detect impacts on beneficial species had there been any[5].

Effect size matters. Small responses are less likely to be detected. A study with many replicates might result in a statistically significant result but have a small effect size (and so, perhaps, be unimportant). The importance of an effect size is a biological, physical or social question, and not a statistical one. In the 1990s, the editor of the US journal Epidemiology asked authors to stop using statistical significance in submitted manuscripts because authors were routinely misinterpreting the meaning of significance tests, resulting in ineffective or misguided recommendations for public-health policy[6].

Study relevance limits generalizations. The relevance of a study depends on how much the conditions under which it is done resemble the conditions of the issue under consideration. For example, there are limits to the generalizations that one can make from animal or laboratory experiments to humans.

Feelings influence risk perception. Broadly, risk can be thought of as the likelihood of an event occurring in some time frame, multiplied by the consequences should the event occur. People's risk perception is influenced disproportionately by many things, including the rarity of the event, how much control they believe they have, the adverseness of the outcomes, and whether the risk is voluntarily or not. For example, people in the United States underestimate the risks associated with having a handgun at home by 100-fold, and overestimate the risks of living close to a nuclear reactor by 10-fold[7].

Dependencies change the risks. It is possible to calculate the consequences of individual events, such as an extreme tide, heavy rainfall and key workers being absent. However, if the events are interrelated (for example, a storm causes a high tide, or heavy rain prevents workers from accessing the site) then the probability of their co-occurrence is much higher than might be expected[8]. The assurance by credit-rating agencies that groups of subprime mortgages had an exceedingly low risk of defaulting together was a major element in the 2008 collapse of the credit markets.

Data can be dredged or cherry picked. Evidence can be arranged to support one point of view. To interpret an apparent association between consumption of yoghurt during pregnancy and subsequent asthma in offspring[9], one would need to know whether the authors set out to test this sole hypothesis, or happened across this finding in a huge data set. By contrast, the evidence for the Higgs boson specifically accounted for how hard researchers had to look for it — the 'look-elsewhere effect'. The question to ask is: 'What am I not being told?'

Extreme measurements may mislead. Any collation of measures (the effectiveness of a given school, say) will show variability owing to differences in innate ability (teacher competence), plus sampling (children might by chance be an atypical sample with complications), plus bias (the school might be in an area where people are unusually unhealthy), plus measurement error (outcomes might be measured in different ways for different schools). However, the resulting variation is typically interpreted only as differences in innate ability, ignoring the other sources. This becomes problematic with statements describing an extreme outcome ('the pass rate doubled') or comparing the magnitude of the extreme with the mean ('the pass rate in school x is three times the national average') or the range ('there is an x-fold difference between the highest- and lowest-performing schools'). League tables, in particular, are rarely reliable summaries of performance.

William J. Sutherland is professor of conservation biology in the Department of Zoology, University of Cambridge, UK. David Spiegelhalter is at the Centre for Mathematical Sciences, University of Cambridge. Mark Burgman is at the Centre of Excellence for Biosecurity Risk Analysis, School of Botany, University of Melbourne, Parkville, Australia. e-mail: [email protected]

References:
1. Doubleday, R. & Wilsdon, J. Nature 485, 301-302 (2012).
2. Borsuk, M. E., Stow, C. A. & Reckhow, K. H. J. Water Res. Plan. Manage. 129, 271-282 (2003).
3. Huskisson, E. C. Br. Med. J. 4, 196-200 (1974).
4. Millar, R. B. & Anderson, M. J. Fish. Res. 70, 397-407 (2004).
5. Marvier, M. Ecol. Appl. 12, 1119-1124 (2002).
6. Fidler, F., Cumming, G., Burgman, M. & Thomason, N. J. Socio-Economics 33, 615-630 (2004).
7. Fischhoff, B., Slovic, P. & Lichtenstein, S. Am. Stat. 36, 240-255 (1982).
8. Billinton, R. & Allan, R. N. Reliability Evaluation of Power Systems (Plenum, 1984).
9. Maslova, E., Halldorsson, T. I., Strøm, M. & Olsen, S. F. J. Nutr. Sci. 1, e5 (2012).

3 ADDITIONAL RESOURCES – STATISTICAL TIPS FOR INTERPRETING SCIENTIFIC CLAIMS

3.1 Statistical Bias

Statistical Bias Types Explained
https://data36.com/statistical-bias-types-explained/

What is Bias in Statistics?
https://www.statisticshowto.datasciencecentral.com/what-is-bias/

PennState Eberly College of Science, STAT 509: Clinical Biases
https://newonlinecourses.science.psu.edu/stat509/node/28/

Wikipedia: Bias (statistics)
https://en.wikipedia.org/wiki/Bias_(statistics)

The Cochrane Collaboration's tool for assessing risk of bias in randomised trials
https://www.bmj.com/content/343/bmj.d5928

Revised Cochrane risk-of-bias tool for randomised trials (RoB 2)
https://methods.cochrane.org/bias/resources/rob-2-revised-cochrane-risk-bias-tool-randomized-trials

3.2 Correlations

Spurious Correlations
https://tylervigen.com/spurious-correlations

Causal Diagrams: Draw Your Assumptions Before Your Conclusions
https://online-learning.harvard.edu/course/causal-diagrams-draw-your-assumptions-your-conclusions

https://spice-spotlight.scot/2019/07/08/searching-for-causes-in-the-blue-water-schools/

3.3 Cognitive Bias

18 Cognitive Bias examples show why mental mistakes get made https://www.visualcapitalist.com/18-cognitive-bias-examples-mental-mistakes/

3.4 References

• Sutherland, W. J., Spiegelhalter, D. & Burgman, M. (2013). Twenty tips for interpreting scientific claims. Nature 503, 335-337.
• Broglio, K. R., Connor, J. T. & Berry, S. M. (2014). Not too big, not too small: a Goldilocks approach to sample size selection. Journal of Biopharmaceutical Statistics 24(3), 685-705. DOI: 10.1080/10543406.2014.888569
• van der Bles, A. M., van der Linden, S., Freeman, A. L. J., Mitchell, J., Galvao, A. B., Zaval, L. & Spiegelhalter, D. J. (2019). Communicating uncertainty about facts, numbers and science. Royal Society Open Science 6, 181870. http://doi.org/10.1098/rsos.181870
• Gigerenzer, G. (1998). We need statistical thinking, not statistical rituals. Behavioral and Brain Sciences 21(2), 199-200.
• Codling, E., Plank, M. & Benhamou, S. (2008). Random walk models in biology. Journal of the Royal Society Interface 5(25), 813-834. https://doi.org/10.1098/rsif.2008.0014
• Dixon, P. (2003). The p-value fallacy and how to avoid it. Canadian Journal of Experimental Psychology 57(3), 189-202.
• Wasserstein, R. L. & Lazar, N. A. (2016). The ASA statement on p-values: context, process, and purpose. The American Statistician 70(2), 129-133. DOI: 10.1080/00031305.2016.1154108
• Nuzzo, R. (2014). Scientific method: statistical errors. Nature 506, 150-152. http://www.nature.com/news/scientific-method-statistical-errors-1.14700
• Sackett, D. L. (1979). Bias in analytic research. Journal of Chronic Diseases 32(1-2), 51-63.
• Pannucci, C. J. & Wilkins, E. G. (2010). Identifying and avoiding bias in research. Plastic and Reconstructive Surgery 126(2), 619-625. DOI: 10.1097/PRS.0b013e3181de24bc
• https://www.understandinghealthresearch.org/
• https://training.cochrane.org/handbook/current
• https://sites.google.com/site/riskofbiastool/welcome/rob-2-0-tool/current-version-of-rob-2
• Barnett, A. G., van der Pols, J. C. & Dobson, A. J. (2005). Regression to the mean: what it is and how to deal with it. International Journal of Epidemiology 34(1), 215-220. https://doi.org/10.1093/ije/dyh299
• Samuels, M. (1991). Statistical reversion toward the mean: more universal than regression toward the mean. The American Statistician 45(4), 344-346. DOI: 10.2307/2684474
• Kahneman, D. & Tversky, A. (1972). Subjective probability: a judgment of representativeness. Cognitive Psychology 3(3), 430-454. https://doi.org/10.1016/0010-0285(72)90016-3
• Bar-Hillel, M. (1980). The base-rate fallacy in probability judgments. Acta Psychologica 44(3), 211-233. https://doi.org/10.1016/0001-6918(80)90046-3
• Rothwell, P. (2005). External validity of randomised controlled trials: "To whom do the results of this trial apply?" The Lancet 365(9453), 82-93. https://doi.org/10.1016/S0140-6736(04)17670-8
• Kidholm, K., Gerke, O., Vondeling, H. & Dyrvig, A. (2014). Checklists for external validity: a systematic review. Journal of Evaluation in Clinical Practice 20(6), 857-864. https://doi.org/10.1111/jep.12166

Research Skills Seminar Series 2019 CAHS Research Education Program

ResearchEducationProgram.org [email protected]

Research Skills Seminar Series | CAHS Research Education Program Department of Child Health Research | Child and Adolescent Health Service