The Radicalization Risks of GPT-3 and Advanced Neural Language Models

Kris McGuffie
Center on Terrorism, Extremism, and Counterterrorism
Middlebury Institute of International Studies
[email protected]

Alex Newhouse
Center on Terrorism, Extremism, and Counterterrorism
Middlebury Institute of International Studies
[email protected]

September 9, 2020

1 Executive Summary

In 2020, OpenAI developed GPT-3, a neural language model that is capable of sophisticated natural language generation and completion of tasks like classification, question-answering, and summarization. While OpenAI has not open-sourced the model's code or pre-trained weights at the time of writing, it has built an API to experiment with the model's capacity. The Center on Terrorism, Extremism, and Counterterrorism (CTEC) evaluated the GPT-3 API for the risk of weaponization by extremists who may attempt to use GPT-3 or hypothetical unregulated models to amplify their ideologies and recruit to their communities. Our methods included:

1. using prompts adapted from right-wing extremist narratives and topics to evaluate ideological consistency, accuracy, and credibility, and
2. evaluating the efficacy of the model's output in contributing to online radicalization into violent extremism.

Experimenting with prompts representative of different types of extremist narrative, structures of social interaction, and radical ideologies, CTEC finds:

• GPT-3 demonstrates significant improvement over its predecessor, GPT-2, in generating extremist texts.
• GPT-3 shows strength in generating text that accurately emulates interactive, informational, and influential content that could be utilized for radicalizing individuals into violent far-right extremist ideologies and behaviors.
• While OpenAI's preventative measures are strong, the possibility of unregulated copycat technology represents significant risk for large-scale online radicalization and recruitment. In the absence of safeguards, successful and efficient weaponization that requires little experimentation is likely.
• AI stakeholders, the policymaking community, and governments should begin investing as soon as possible in building social norms, public policy, and educational initiatives to preempt an influx of machine-generated disinformation and propaganda. Mitigation will require effective policy and partnerships across industry, government, and civil society.

This project was made possible by the OpenAI API Academic Access Program.

2 Background

In 2019, a CTEC investigation discovered that OpenAI's GPT-2 language model—arguably the most sophisticated generative AI at the time—could be trained to produce convincing extremist manifestos. [1] Testing across white supremacy, Islamism, anarchism, and Marxism-Leninism, CTEC's analysis demonstrated that GPT-2 represented the frontier of a novel threat to the information landscape: text generation at a speed, scale, and ease of use that exceeds that of troll farms and human-run disinformation campaigns. GPT-2 has major limitations, however. Unless one has access to state-of-the-art computational resources, it requires hours of fine-tuning to intentionally bias a model to produce ideologically slanted propaganda. Further, each fine-tuned model is brittle; for instance, a white supremacist model cannot easily be modified to produce Islamist outputs.
OpenAI's latest project, GPT-3, escapes many of these disadvantages altogether through revolutionary advances in pre-training that allow for sophisticated zero-shot, one-shot, and few-shot text generation capabilities (in other words, prompting the model with zero examples, one example, or a few examples of an intended goal). [2] Producing ideologically consistent fake text no longer requires a large corpus of source materials and hours of fine-tuning. It is as simple as prompting GPT-3 with a few Tweets, paragraphs, forum threads, or emails, and the model will pick up on the patterns and intent without any other training. While hosting the model still requires enterprise-scale resources, manipulating outputs no longer necessitates technical knowledge if an actor has access to a hosted model, such as OpenAI's API.
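To make the mechanism concrete, the sketch below shows what few-shot prompting against a hosted completion endpoint looked like at the time of writing. It is a minimal illustration, not a prompt used in this study: the engine name, sampling parameters, and the deliberately benign example text are assumptions added for illustration only.

# Minimal sketch of few-shot prompting via the 2020-era OpenAI completion API.
# All example text is deliberately benign; engine and parameters are illustrative.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credential

# A handful of in-context examples establishes the pattern; no fine-tuning occurs.
prompt = (
    "Review: The battery lasts all day and the screen is bright.\n"
    "Summary: Positive review praising battery life and display.\n"
    "###\n"
    "Review: Shipping took three weeks and the box arrived dented.\n"
    "Summary: Negative review about slow shipping and damaged packaging.\n"
    "###\n"
    "Review: The keyboard feels great but the speakers are tinny.\n"
    "Summary:"
)

response = openai.Completion.create(
    engine="davinci",   # base GPT-3 engine exposed through the hosted API
    prompt=prompt,
    max_tokens=40,
    temperature=0.7,
    stop=["###"],       # cut generation off after the completed summary
)

print(response["choices"][0]["text"].strip())

The point of the sketch is structural: the same call, given a few examples of a different style of text, will continue in that style with no training step and no technical expertise beyond assembling the prompt.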
This revolutionary capacity for few-shot, subject-specific generation poses significant threats to information integrity and the social fabric of interactions on the Internet. This is further exacerbated by GPT-3's impressively deep knowledge of extremist communities, from QAnon to the Atomwaffen Division to the Wagner Group, and those communities' particular nuances and quirks. As a result, CTEC's experiments and analysis show that GPT-3 can be prompted to perform tasks as varied as:

• producing polemics reminiscent of Christchurch shooter Brenton Tarrant,
• reproducing fake forum threads casually discussing genocide and promoting Nazism in the style of the defunct Iron March community,
• answering questions as if it were a heavily radicalized QAnon believer, and
• producing multilingual extremist texts, such as Russian-language anti-Semitic content, even when given English prompts.

The main risk of weaponization involves GPT-3's potential capacity to dramatically increase influential output that is believably human. That influence could include radicalizing content that significantly reduces the workload of extremists in producing informative and interactive fora. Any copycat models created from GPT-3 and wielded without toxicity filters or other safeguards would benefit from great efficiency and a low cost of entry. No specialized technical knowledge is required to enable the model to produce text that aligns with and expands upon right-wing extremist prompts. With very little experimentation, short prompts produce compelling and consistent text that would believably appear in far-right extremist communities online.

3 Methodology and Discussion of Model Performance

In order to generate subject-specific text, GPT-3 requires nothing more than a few short prompts that are representative of the text an analyst (or propaganda apparatus, nonstate actor, etc.) wants to emulate. As such, CTEC's methodology for testing the abuse potential of GPT-3 is significantly different from what was used to test its older, more inflexible predecessor, GPT-2. To test GPT-3, CTEC gathered examples from a wide variety of text styles and structures. This was done with the explicit goal of understanding the model's strengths and weaknesses in producing texts across the ideological and functional spectrums of extremism, from white supremacy to QAnon conspiracism, and from mainstream recruitment to ideological radicalization and organization. In order to avoid cherry-picking, examples in this paper were gathered from either the first or second attempt at running a particular prompt.

GPT-2, on the other hand, required significant investments of time and resources. In order to produce fake text that effectively emulated the manifestos of white supremacist mass shooters, CTEC trained GPT-2 on a large dataset of many full-length manifesto texts. Using a modestly powered Azure virtual machine to emulate the capacity of nonstate, unsponsored actors, CTEC's training took 10-12 hours to complete. The resulting model would respond to most prompts with convincingly radical completions, but the fake text remained bound by the data it was trained on: texts always retained manifesto-esque tone and grammar. GPT-3 represents a vastly reduced barrier to entry and a much more flexible product than GPT-2.

Note: In example outputs shown below, bold text indicates our prompts, and non-bold text indicates GPT-3's synthetic generations. All bold text, and only bold text, in the example outputs below is human-generated. Examples involve toxic language, including racist, sexist, and anti-Semitic phrases.

3.1 Sophisticated foundation of extremist knowledge

To explore the boundaries of what GPT-3 knows, CTEC prompted it with a variety of questions about specific aspects of extremist groups and ideologies. For example, when prompted with "What is the Wagner Group?" GPT-3 produced a long back-and-forth dialogue sharing generally correct information about the Russian mercenary cell that has been active in Syria.

Sample 1: Zero-Shot Prompt Asking About the Wagner Group

Question: What is the name of the Russian mercenary group?
Answer: Wagner group.

Question: Where is the Wagner group?
Answer: In Syria.

Question: Who is the leader of the Wagner group?
Answer: A Russian citizen, Dmitry Utkin. He was born in 1971 and lives in Moscow. He has been a member of the Russian parliament since 2016. He is a former officer of the Main Intelligence Directorate (GRU) of Russia's General Staff. He was also a commander of the special forces unit "Vostok" (East). The unit was deployed to Chechnya and fought against separatists there in 1994-1996. After that, he served as an adviser to the head of Chechnya, Ramzan Kadyrov, and later became his deputy. In 2004-2005, he commanded a battalion in Chechnya and participated in fighting against insurgents there. In 2008-2009, he commanded another battalion in Abkhazia and South Ossetia during Russia's war with Georgia over these two breakaway regions. Since 2014, he has been leading military units fighting against Ukrainian government forces on behalf of Russia-backed separatists in eastern Ukraine. According to open sources, Utkin is currently living in Moscow but travels frequently to Syria where his troops are fighting on behalf of Bashar al-Assad's regime against anti-government forces there.

Question: What does this group do? Where are they based? How many people
