Truth, Lies, and Automation
How Language Models Could Change Disinformation

AUTHORS
Ben Buchanan
Micah Musser
Andrew Lohn
Katerina Sedova

MAY 2021

Established in January 2019, the Center for Security and Emerging Technology (CSET) at Georgetown’s Walsh School of Foreign Service is a research organization focused on studying the security impacts of emerging technologies, supporting academic work in security and technology studies, and delivering nonpartisan analysis to the policy community. CSET aims to prepare a generation of policymakers, analysts, and diplomats to address the challenges and opportunities of emerging technologies. CSET focuses on the effects of progress in artificial intelligence, advanced computing, and biotechnology.

CSET.GEORGETOWN.EDU | [email protected]

ACKNOWLEDGMENTS
We would like to thank John Bansemer, Jack Clark, Teddy Collins, Aditi Joshi, Aurora Johnson, Miriam Matthews, Igor Mikolic-Torreira, Lisa Oguike, Girish Sastry, Thomas Rid, Chris Rohlf, Lynne Weil, and all the members of the CyberAI team at CSET for their comments on previous drafts and their help in preparing this study. In addition, we would in particular like to thank Catherine Aiken and James Dunham for their help in survey design and data collection, as well as Jennifer Melot for her assistance in data collection and code review. Lastly, we would also like to thank OpenAI for access to their model. All errors remain ours alone.

DATA AND CODE
Data and code supporting this project are available at https://github.com/georgetown-cset/GPT3-Disinformation

AUTHORS
Ben Buchanan is the director of the CyberAI Project and a senior faculty fellow with CSET, as well as an assistant teaching professor at Georgetown University’s Walsh School of Foreign Service. Drew Lohn is a senior fellow with the CyberAI Project, where Micah Musser is a research analyst and Katerina Sedova is a fellow.

PRINT AND ELECTRONIC DISTRIBUTION RIGHTS
© 2021 by the Center for Security and Emerging Technology. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. To view a copy of this license, visit: https://creativecommons.org/licenses/by-nc/4.0/.

DOCUMENT IDENTIFIER
doi: 10.51593/2021CA003

Contents
EXECUTIVE SUMMARY
INTRODUCTION
1 | HUMAN-MACHINE TEAMS FOR DISINFORMATION
2 | TESTING GPT-3 FOR DISINFORMATION
    Narrative Reiteration
    Narrative Elaboration
    Narrative Manipulation
    Narrative Seeding
    Narrative Wedging
    Narrative Persuasion
3 | OVERARCHING LESSONS
    Working with GPT-3
    Examining GPT-3’s Writing
4 | THE THREAT OF AUTOMATED DISINFORMATION
    Threat Model
    Mitigations
CONCLUSION
ENDNOTES

Executive Summary

For millennia, disinformation campaigns have been fundamentally human endeavors. Their perpetrators mix truth and lies in potent combinations that aim to sow discord, create doubt, and provoke destructive action. The most famous disinformation campaign of the twenty-first century—the Russian effort to interfere in the U.S. presidential election—relied on hundreds of people working together to widen preexisting fissures in American society. Since its inception, writing has also been a fundamentally human endeavor. No more.
In 2020, the company OpenAI unveiled GPT-3, a powerful artificial intelligence system that generates text based on a prompt from human operators. The system, which uses a vast neural network, a powerful machine learning algorithm, and upwards of a trillion words of human writing for guidance, is remarkable. Among other achievements, it has drafted an op-ed that was commissioned by The Guardian, written news stories that a majority of readers thought were written by humans, and devised new internet memes.

In light of this breakthrough, we consider a simple but important question: can automation generate content for disinformation campaigns? If GPT-3 can write seemingly credible news stories, perhaps it can write compelling fake news stories; if it can draft op-eds, perhaps it can draft misleading tweets.

To address this question, we first introduce the notion of a human-machine team, showing how GPT-3’s power derives in part from the human-crafted prompt to which it responds. We were granted free access to GPT-3—a system that is not publicly available for use—to study GPT-3’s capacity to produce disinformation as part of a human-machine team. We show that, while GPT-3 is often quite capable on its own, it reaches new heights of capability when paired with an adept operator and editor. As a result, we conclude that although GPT-3 will not replace all humans in disinformation operations, it is a tool that can help them to create moderate- to high-quality messages at a scale much greater than what has come before.

In reaching this conclusion, we evaluated GPT-3’s performance on six tasks that are common in many modern disinformation campaigns. Table 1 describes those tasks and GPT-3’s performance on each.

TABLE 1
Summary evaluations of GPT-3 performance on six disinformation-related tasks.

TASK: Narrative Reiteration
DESCRIPTION: Generating varied short messages that advance a particular theme, such as climate change denial.
PERFORMANCE: GPT-3 excels with little human involvement.

TASK: Narrative Elaboration
DESCRIPTION: Developing a medium-length story that fits within a desired worldview when given only a short prompt, such as a headline.
PERFORMANCE: GPT-3 performs well, and technical fine-tuning leads to consistent performance.

TASK: Narrative Manipulation
DESCRIPTION: Rewriting news articles from a new perspective, shifting the tone, worldview, and conclusion to match an intended theme.
PERFORMANCE: GPT-3 performs reasonably well with little human intervention or oversight, though our study was small.

TASK: Narrative Seeding
DESCRIPTION: Devising new narratives that could form the basis of conspiracy theories, such as QAnon.
PERFORMANCE: GPT-3 easily mimics the writing style of QAnon and could likely do the same for other conspiracy theories; it is unclear how potential followers would respond.

TASK: Narrative Wedging
DESCRIPTION: Targeting members of particular groups, often based on demographic characteristics such as race and religion, with messages designed to prompt certain actions or to amplify divisions.
PERFORMANCE: A human-machine team is able to craft credible targeted messages in just minutes. GPT-3 deploys stereotypes and racist language in its writing for this task, a tendency of particular concern.

TASK: Narrative Persuasion
DESCRIPTION: Changing the views of targets, in some cases by crafting messages tailored to their political ideology or affiliation.
PERFORMANCE: A human-machine team is able to devise messages on two international issues—withdrawal from Afghanistan and sanctions on China—that prompt survey respondents to change their positions; for example, after seeing five short messages written by GPT-3 and selected by humans, the percentage of survey respondents opposed to sanctions on China doubled.

Across these and other assessments, GPT-3 proved itself to be both powerful and limited. When properly prompted, the machine is a versatile and effective writer that nonetheless is constrained by the data on which it was trained. Its writing is imperfect, but its drawbacks—such as a lack of focus in narrative and a tendency to adopt extreme views—are less significant when creating content for disinformation campaigns.

Should adversaries choose to pursue automation in their disinformation campaigns, we believe that deploying an algorithm like the one in GPT-3 is well within the capacity of foreign governments, especially tech-savvy ones such as China and Russia. It will be harder, but almost certainly possible, for these governments to harness the required computational power to train and run such a system, should they desire to do so.

Mitigating the dangers of automation in disinformation is challenging. Since GPT-3’s writing blends in so well with human writing, the best way to thwart adversary use of systems like GPT-3 in disinformation campaigns is to focus on the infrastructure used to propagate the campaign’s messages, such as fake accounts on social media, rather than on determining the authorship of the text itself.

Such mitigations are worth considering because our study shows there is a real prospect of automated tools generating content for disinformation campaigns. In particular, our results are best viewed as a low-end estimate of what systems like GPT-3 can offer. Adversaries who are unconstrained by ethical concerns and buoyed with greater resources and technical capabilities will likely be able to use systems like GPT-3 more fully than we have, though it is hard to know whether they will choose to do so. In particular, with the right infrastructure, they will likely be able to harness the scalability that such automated systems offer, generating many messages and flooding the information landscape with the machine’s most dangerous creations.

Our study shows the plausibility—but not inevitability—of such a future, in which automated messages of division and deception cascade across the internet. While more developments are yet to come, one fact is already apparent: humans now have able help in mixing truth and lies in the service of disinformation.

Introduction

“Internet operators needed!” read a 2013 post on Russian social media.