Use Caution When Applying Behavioural Science to Policy Social and Behavioural Scientists Have Attempted to Speak to the COVID-19 Crisis

Home , Behavioural sciences

comment Use caution when applying behavioural science to policy Social and behavioural scientists have attempted to speak to the COVID-19 crisis. But is behavioural research on COVID-19 suitable for making policy decisions? We ofer a taxonomy that lets our science advance in ‘evidence readiness levels’ to be suitable for policy. We caution practitioners to take extreme care translating our fndings to applications. Hans IJzerman, Neil A. Lewis Jr., Andrew K. Przybylski, Netta Weinstein, Lisa DeBruine, Stuart J. Ritchie, Simine Vazire, Patrick S. Forscher, Richard D. Morey, James D. Ivory and Farid Anvari

esearchers in the social and estimated with precision, sometimes barely behavioural sciences periodically ruling out trivially small effects under Rdebate whether their research should ostensibly controlled conditions. Third, be used to address pressing issues in society. many studies use a narrow range of stimuli To provide a few examples, in the 1940s and do not test for stimulus generalisability4. psychologists discussed using research to Fourth, many studies examine effects on address problems related to intergroup measures, such as self-report scales, that relations, problems brought to the fore by are infrequently validated or linked to the Holocaust and other acts of rampant behaviour, much less to policy-relevant prejudice. In the 1990s, psychologists outcomes5. Fifth, independently replicated debated whether their research should findings, even under ideal circumstances, inform legal decision-making. In the 2010s, are rare. Finally, our studies often fail to psychologists argued for advising branches account for deeper cultural, historical, of government as economists often do. And political and structural factors that play now, in 2020, psychologists and other social important moderating roles during the and behavioural scientists are arguing that process of translation from basic findings our research should inform the response to application. Together, these issues to the new coronavirus disease (henceforth produce empirical insights that are more COVID-19)1,2. heterogeneous than might be apparent from We are a team mostly consisting a scan of the published literature. of empirical psychologists who Confident applications of social and Fig. 1 | NASA technology readiness levels. conduct research on basic, applied and behavioural science findings, then, require Original image source: https://www.nasa.gov/ meta-scientific processes. We believe that first and foremost an assessment of the directorates/heo/scan/engineering/technology/ scientists should apply their creativity, evidence quality and weighing heterogeneity txt_accordion1.html. efforts and talents to serve our society, and the trade-offs and opportunity costs that especially during crises. However, the way follow. We must identify reliable findings that social and behavioural science research that can be applied, have been investigated by the European Commission to judge how is often conducted makes it difficult to know in the nations for which the application is ready scientific applications beyond space whether our efforts will do more good than intended and are derived from investigations flight are for operational environments. harm. We will provide some examples from using diverse stimuli. But the assessment TRLs rank a technology’s readiness for the field of social-personality psychology, of how ‘ready’ the intervention is must be application from 1 to 9 (see Fig. 1). At TRL1, where most of us were trained, to illustrate included when persuading decision-makers basic principles have been reliably observed, our concerns. This focus is not meant to to apply social and behavioural science reported and translated to a formal model. imply that our field alone suffers from the evidence, particularly in crisis situations In TRL2, basic principles have been issues we will discuss. Instead, a growing when lives are at stake and resources are developed and tested in an application meta-science literature suggests that many limited. Not doing so can have disastrous area. It is not until TRL4, when a prototype other social and behavioural disciplines have consequences. is developed, that tests are run in various encountered dynamics similar to those faced Here we propose one approach for environments that are as representative of by our field. assessing the quality of evidence before the eventual application area(s) as possible. What are those dynamics? First, study application and dissemination. Specifically, Later, at TRL6, the system is tested in a ‘real’ participants, mainly students, are drawn we draw inspiration from the US National environment (like ground-to-space). At from populations that are in Western Aeronautics and Space Administration the very highest level (TRL9), the system (mostly US), educated, industrialized, rich (NASA)’s ‘technology readiness levels’ has been ‘flight-proven’ through successful and democratic (WEIRD) societies3. Second, (TRL6), a benchmarking system for mission operations. These TRLs provide a even with this narrow slice of population, systematically evaluating the quality of useful framework to jumpstart conversations the effects in published papers are not scientific evidence and which has been used about how to assess the readiness of social

1092 Nature Human Behaviour | VOL 4 | November 2020 | 1092–1094 | www.nature.com/nathumbehav comment

to select evidence that could potentially in nature across context and cultures. The be applied (ERL3). These systematic multidisciplinary and multi-stakeholder reviews require a number of bias-detection nature of ERLs requires us to fundamentally techniques. It is well-known that the rethink how we produce, and communicate behavioural sciences suffer from publication confidence in, application-ready findings. bias and other practices that compromise The current crisis provides a chance for the integrity of research evidence. Some social and behavioural scientists to question findings may be reliable, but the onus is on how we understand and communicate the us to identify which are and which are not value of our scientific models in terms of and which generalize or don’t. Yet, these ERLs. It also requires us to communicate systematic reviews must still be done with those ERLs to policy-makers so that they an awareness that the currently available know whether we are making educated statistical techniques do not completely guesses (ERL3 or below) or can be confident correct for bias and that the resultant about the application of our findings findings are at most at ERL3. because we have tested and replicated them Following this, one can gather in representative environments (ERL7). information about stimulus and When providing policy advice on the basis measurement validity and equivalence for of scientific evidence, it is important to application in the target setting (ERL4). understand and be able to explain whether Next, researchers—in consultation with and how recommendations would impact Fig. 2 | Proposed social and behavioural sciences local experts—should consider the potential affected individuals under a range of evidence readiness levels. benefits and harms associated with applying circumstances that are highly relevant to the potential solutions (ERL5) and generate crisis in question (ERL7). estimates of effects in a pilot sample (ERL6). Even if findings are at ERL3 after and behavioural science evidence for With preliminary effects in hand, the team assessing evidence quality of primary application and dissemination. can then begin to test for heterogeneity in studies, we have little way of knowing how low-stakes (ERL7) and higher-stakes (ERL8) much positive, or unintended negative, Introducing evidence readiness levels samples and settings, which would build the consequences an intervention might have The desire to “directly inform policy confidence necessary to apply the findings when applied to a new situation. We are and individual and collective behaviour in the real target setting or crisis situation concerned to see social and behavioural in response to the pandemic” (p. 461)1 (ERL9). scientists making confident claims about overlooks existing evidence frameworks and Even at ERL9, evidence evaluation the utility of scientific findings for solving the challenges we identify, illustrating that continues; applications of social and COVID-19 problems without regard for a simple taxonomy is necessary to have at behavioural work, particularly in a crisis, whether those findings are based on the hand during crises. As a very preliminary should be iterative, so high-quality evidence kind of scientific methods that would step to this end we propose a social and is fed back to evaluate the effectiveness of move them up the ERL ladder1. The behavioural science variant of TRLs, the intervention and to develop critical and absence of recognised benchmarking evidence readiness levels (ERLs; Fig. 2). flexible improvements. Feedback should be systems makes this challenging. While it There are several frameworks for grounded in collaboration between basic is tempting to instead qualify uncertainty assessing evidence quality across different and applied researchers, as well as with by using non-committal language about scientific fields. The one that comes stakeholders, to ensure that the resulting the possible utility of existing findings closest to what we envision is the Society evidence is relevant and actionable. Failure (for example, ‘may’, ‘could’), this approach for Prevention’s standards for prevention to continually re-evaluate interventions in is fundamentally flawed because public interventions7, as they incorporate standards light of new data could lead to unnecessary conversations generally ignore these for efficacy dissemination and feedback harm, where even the best evidence was rhetorical caveats8. Scientists should loops from crisis to theory. However, none inadequate to predict the intervention’s actively communicate uncertainty, of the existing frameworks capture the real-world effects. particularly when speaking to crises. meta-scientific insights generated in our A benchmarking system such as the Communicating that their ERL is only at field in the last decade. ERL requires us to think carefully about 3 or 4 would empower policy-makers by Our ERLs do not map perfectly onto the nature of our research that can be providing clear understanding of how to NASA’s TRLs, and we should not expect applied credibly and guides where research weight our advice in terms of their options. them to; there are many differences between investments should be made. For example, Reaching a higher ERL is extremely behavioural and rocket science. In the we can better recognise that our goal of complicated and will require radical social and behavioural sciences we think gathering reliable insights (ERL3) provides changes in the way we conduct research, this process should start with defining a necessary foundation for further collective not only in response to crises. problem(s) in collaboration with the efforts that scaffold towards scalable stakeholders most likely to implement the and generalizable interventions (ERL7). How social and behavioural scientists interventions (ERL1). These concepts can Engaging community experts, identifying can advance their ERLs then be further developed in consultation relevant theories, and collecting extensive The field of genetics started in a position with people in the target settings to gather observations are key to framing challenges similar to the position that many preliminary information about how settings and working with interdisciplinary teams behavioural sciences find themselves in or context might alter processes (ERL2). to address them (ERL1). Behavioural now, with small, independently collected From there, researchers can conduct scientists from different cultures then samples that produced unreliable findings. systematic reviews and other meta-syntheses discuss how interventions may need to differ Attempts to identify candidate genes for

Nature Human Behaviour | VOL 4 | November 2020 | 1092–1094 | www.nature.com/nathumbehav 1093 comment

many constructs of interest kept stalling at behavioural sciences in line with other Institute and Department of Experimental TRL1/ERL4. In one prominent example, mature sciences. Diverse consortia of Psychology, University of Oxford, Oxford, UK. 52 patients provided genetic material for researchers with expertise in philosophy, 5School of Psychology and Clincal Language an analysis of the relationship between ethics, statistics and data and code Sciences, University of Reading, Reading, UK. the 5-HTT gene and major depression9, management are needed to produce the kind 6Institute of Neuroscience and Psychology, University a finding that spurred enormous interest of research required to better understand of Glasgow, Glasgow, UK. 7Social, Genetic and in the biological mechanisms underlying people the world over. Realising this mature, Developmental Psychiatry Centre, King’s College depression. Unfortunately, as with the inclusive and efficient model necessitates London, London, UK. 8Department of Psychology, current situation in psychology, these a shift in the knowledge production and University of California, Davis, Davis, CA, USA. early results were contradicted by failed evaluation models that guide the social and 9Melbourne School of Psychological Sciences, replication studies10. behavioural sciences. University of Melbourne, Melbourne, Victoria, Technological advances in genotyping Australia. 10School of Psychology, Cardif University, unlocked different approaches for Be cautious when applying social and Cardif, UK. 11Department of Communication, geneticists. Instead of working in isolated behavioural science to policy Virginia Polytechnic Institute and State University, teams, geneticists pooled resources via On balance, we hold the view that the Blacksburg, VA, USA. 12Department of Marketing consortium studies and thereby accelerated social and behavioural sciences have the and Management, University of Southern Denmark, scientific progress and quality. Their recent potential to help us better understand Odense, Denmark. 13Tese authors contributed studies (with samples that sometimes our world. However, we are less sanguine equally: Hans IJzerman, Neil A. Lewis Jr., exceed 1,000,000) dwarf previous about whether many areas of social and Andrew K. Przybylski. candidate gene studies in terms of sample behavioural sciences are mature enough to ✉e-mail: [email protected] size11. To accomplish this, geneticists provide such understanding, particularly devoted considerable time to developing when considering life-and-death issues like Published online: 9 October 2020 research workflows, data harmonization a pandemic. We believe that, rather than https://doi.org/10.1038/s41562-020-00990-w systems and processes that increased the appealing to policy-makers to recognise accuracy of their measurements. The new our value, we should focus on earning References 1. Van Bavel, J. J. et al. Nat. Hum. Behav. 4, 460–471 (2020). methodologies are not without flaws: for the credibility that legitimates a seat at 2. Syed, M. Psychology of COVID-19 preprint tracker. https://docs. example, there is substantial scope for the policy table. The ERL taxonomy is a google.com/spreadsheets/d/1owb_jkXp6FMubhvh-aTKwzzxF87l expanding the representativeness of study sample roadmap for achieving this level of NiRjia6kKMw-ZGA/edit?usp=sharing (2020). 3. Henrich, J., Heine, S. J. & Norenzayan, A. Behav. Brain Sci. 33, cohorts. But the progress that consortium maturity as a science and for accurately and 61–83 (2010). research in genetics has made in a short time honestly communicating our current state 4. Judd, C. M., Westfall, J. & Kenny, D. A. J. Pers. Soc. Psychol. 103, is impressive. of evidence. Collaborations among large 54–69 (2012). 5. Webb, T. L. & Sheeran, P. Psychol. Bull. 132, 249 (2006). In recent years we have observed similar and diverse teams with local knowledge and 6. Mai, T., ed. Technology Readiness Level. NASA (2012). progress in the psychological sciences multidisciplinary expertise can help us move https://go.nasa.gov/2XKbFsq. going from single, small-sample studies up the evidence ladder. Equally important, 7. Gottfredson, D. C. et al. Prev. Sci. 16, 893–926 (2015). 12,13 8. Adams, R. C. et al. J. Exp. Psychol. Appl. 23, 1–14 (2017). to large-scale replications and novel studies in the behavioural sciences must 9. Heils, A. et al. J. Neurochem. 66, 2621–2624 (1996). 14 studies to the building of the prerequisite be designed to move up this ladder 10. Gillespie, N. A., Whitfeld, J. B., Williams, B. E. N., Heath, A. C. & infrastructure to facilitate team science. incrementally. Designing an ERL6 study Martin, N. G. Psychol. Med. 35, 101–111 (2005). 11. Jansen, P. R. et al. Nat. Genet. 51, 394–403 (2019). One example is the Psychological Science that is built on a shaky ERL1 foundation 12. Klein, R. A. et al. Soc. Psychol. 45, 142–152 (2014). Accelerator (PSA), a large standing network will be of little use. Moving up requires 13. Grahe, J. E., et al. Collaborative Replications and Education with experts facilitating study selection, data investment, thought and, most important of Project (CREP). https://osf.io/wfc6u (2016). management, ethics and translation15. While all, epistemic humility. Without a systematic 14. IJzerman, H. et al. Collabra: Psychology 4, 37 (2018). 15. Moshontz, H. et al. Adv. Methods Practices Psychol. Sci. 1, the PSA is making important progress, and iterative research framework, we believe 501–515 (2018). problems surrounding measurement that behavioural scientists should carefully validity, sample generalizability and consider whether well-intentioned advice Acknowledgements organizational diversity (40% of its may do more harm than good. ❐ The preparation of this work was partly funded by a French leadership is from North America), which National Research Agency “Investissements d’avenir” affect the network’s ability to accurately Hans IJzerman 1,2,13 ✉ , Neil A. Lewis Jr. 3,13, program grant (ANR-15-IDEX-02) awarded to H.I., a 4,13 5 Huo Family Foundation grant to A.K.P., an ERC 647910 interpret findings, still present material Andrew K. Przybylski , Netta Weinstein , (KINSHIP) grant awarded to L.D., and an ERC 851890 6 7 challenges to the applicability of their Lisa DeBruine , Stuart J. Ritchie , (SOAR) grant awarded to N.W. The funders had no role projects. Therefore, the PSA will require Simine Vazire 8,9, Patrick S. Forscher1, in study design, data collection and analysis, decision to substantial improvement and investment Richard D. Morey 10, James D. Ivory 11 publish or preparation of the manuscript. before it can generate practical ERL7-level and Farid Anvari 12 evidence and further develop our proposed 1LIP/PC2S, Université Grenoble Alpes, Grenoble, Competing interests 2 P.S.F., H.I. and N.A.L. are on the board of directors of the framework. France. Institut Universitaire de France, Paris, Psychological Science Accelerator network referenced 3 The COVID-19 crisis underscores France. Department of Communication, Cornell in the manuscript. The remaining authors declare no the critical need to bring the social and University, Ithaca, NY, USA. 4Oxford Internet competing interests.

1094 Nature Human Behaviour | VOL 4 | November 2020 | 1092–1094 | www.nature.com/nathumbehav