arXiv:2101.00379v3 [cs.CL] 8 Jun 2021

Investigating Memorization of Conspiracy Theories in Text Generation

Note: This paper contains examples of potentially offensive conspiracy theory text.

Sharon Levy, Michael Saxon, William Yang Wang
University of California, Santa Barbara

Abstract

The adoption of natural language generation (NLG) models can leave individuals vulnerable to the generation of harmful information memorized by the models, such as conspiracy theories. While previous studies examine conspiracy theories in the context of social media, they have not evaluated their presence in the new space of generative language models. In this work, we investigate the capability of language models to generate conspiracy theory text. Specifically, we aim to answer: can we test pretrained generative language models for the memorization and elicitation of conspiracy theories without access to the model’s training data? We highlight the difficulties of this task and discuss it in the context of memorization, generalization, and hallucination. Utilizing a new dataset consisting of conspiracy theory topics and machine-generated conspiracy theories helps us discover that many conspiracy theories are deeply rooted in the pretrained language models. Our experiments demonstrate a relationship between model parameters such as size and temperature and their propensity to generate conspiracy theory text. These results indicate the need for a more thorough review of NLG applications before release and an in-depth discussion of the drawbacks of memorization in generative language models.

1 Introduction

Recent advances in natural language processing technologies have opened a new space for individuals to digest information. One of these rapidly developing technologies is neural natural language generation. These models, made up of millions, or even billions (Brown et al., 2020), of parameters, train on large-scale datasets. While attempts are made to ensure that only “safe” data is utilized for training these models, several studies have shown the prevalence of biases produced by these pretrained generation models (Sheng et al., 2019; Groenwold et al., 2020; Solaiman et al., 2019). Of equally alarming concern are the memorization and subsequent generation of factually incorrect data. Conspiracy theories are one particular type of this data that can be especially damaging.

While it is not new for researchers to learn that a model may memorize data (Radhakrishnan et al., 2019), we argue that the growing usage of machine learning models in society warrants targeted investigation to deter potential harms from problematic data. In this paper, we address the upsides and pitfalls of memorization in generative language models and its relationship with conspiracy theories. We further describe the difficulty of detecting this memorization for the categories of memorization, generalization, and hallucination. Previous studies investigating memorization of text generation models have done so with access to the model’s training data (Carlini et al., 2019, 2020). As models are not always published with their training datasets, we set out to examine the difficult task of eliciting memorized conspiracy theories from a pretrained NLG model through various model settings without access to the model’s training data.

We focus our study on the pre-trained GPT-2 language model (Radford et al., 2019). We investigate this model’s propensity to generate conspiratorial text, analyze relationships between model settings and conspiracy theory generation, and determine how these settings affect the linguistic aspect of generations. To do so, we create a new conspiracy theory dataset consisting of conspiracy theory topics and machine-generated conspiracy theories. Our contributions include:

• We propose the topic of conspiracy theory memorization in pretrained generative language models and outline the harms and benefits of different types of generations in these models.

• We analyze pretrained language models for the inclusion of conspiracy theories without access to the model’s training data.

• We evaluate the linguistic differences for generated conspiracy theories across different model settings.

• We create a new dataset consisting of conspiracy theory topics from Wikipedia and machine-generated conspiracy theory statements from GPT-2.

2 Spread of Conspiracy Theories

2.1 Dangers of conspiracy theories

A conspiracy theory is the belief, contrary to a more probable explanation, that the true account for an event or situation is concealed from the public (Goertzel, 1994). A variety of conspiracy theories ranging from the science-related moon landing hoax (Bizony, 2009) to the racist and pernicious Holocaust denialism¹ are widely known throughout the world. However, even as existing conspiracy theories continue circulating, new conspiracy theories are consistently spreading. This is especially concerning given that half of Americans believe at least one conspiracy theory (Oliver and Wood, 2014).

Widespread belief in conspiracy theories can be highly detrimental to society, driving prejudice (Douglas et al., 2019), inciting violence², and reducing science acceptance (van der Linden, 2015; Lewandowsky et al., 2013). Science denial has real-world consequences, such as resistance to measures for the reduction of carbon footprints (Douglas and Sutton, 2015) and outbreaks of preventable illnesses due to reduced vaccination rates (Goertzel, 2010). Further effects of conspiracy theory exposure can reach the political space and reduce citizens’ likelihood of voting in elections due to feelings of powerlessness towards the government (Jolley and Douglas, 2014b).

At the time of writing, the COVID-19 pandemic is at its worst. Though COVID-19 vaccines have received approval and started distribution, new conspiracy theories surrounding the COVID-19 vaccine may hinder society in its road to recovery. Discussions of a link between vaccinations and autism have been circulating for years (Jolley and Douglas, 2014a; Kata, 2010). However, with the extreme interest throughout the world surrounding the COVID-19 pandemic, new vaccination rumors are arising, such as the vaccine causing DNA alteration and claims of the pandemic acting as a cover plan to implant trackable microchips³. The belief in these theories can prevent herd immunity through the lack of vaccinations⁴ ⁵.

2.2 NLG spreading conspiracy theories

As NLG models are being utilized for various tasks such as chatbots and recommendations systems (Gatt and Krahmer, 2018), cases arise in which these conspiracy theories and other biases can propagate unintentionally (Bender et al., 2021). We present one such scenario in which an NLG model has memorized some conspiracy theories and is being used for story generation (Fan et al., 2018). An unaware individual may utilize this application and, given a prompt about the Holocaust, may receive a generated story discussing Holocaust denial. The user, now having been exposed to a new conspiracy theory, may choose to ignore this generated text at this stage. However, a potential negative outcome is that the user may become interested in this story and search the statements online out of curiosity. This can lead the user down the “rabbit hole” of conspiracy theories online (O’Callaghan et al., 2015) and alter their original assumptions towards believing this conspiracy theory.

2.3 Why are conspiracy theories difficult to detect?

Recent years have seen the emergence of several new tasks addressing fairness and safety within natural language processing in topics such as gender bias and hate speech detection. Although detection and mitigation of other biases and harmful content have been thoroughly studied, that pertaining to conspiracy theories is increasingly difficult due to its inconsistent linguistic nature.

Many existing tasks can utilize specific keyword lists such as Hatebase⁶ for detection in addition to current techniques (Sun et al., 2019). However, conspiracy theory detection is an increasingly complex problem and cannot be approached in the same way as the previous topics. Conspiracy theories have no unified vocabulary or keyword list that can differentiate them from standard text. Previous studies of conspiracy theories have exhibited their tendency to lean towards issues of hierarchy and abuses of power (Klein et al., 2019). We argue this is not specific enough to define features for their detection. Often, specific keywords and tropes become typical of conspiracy theories regarding a specific

¹ http://auschwitz.org/en/history/holocaust-denial/
² https://www.theguardian.com/us-news/2019/aug/01/conspiracy-theories-fbi-qanon-extremism
³ https://www.bbc.com/news/54893437
⁴ https://www.economist.com/graphic-detail/2020/08/29/conspiracy-theories-about-covid-19-vaccines-may-prevent-herd-immunity
⁵ https://www.who.int/news-room/q-a-detail/herd-immunity-lockdowns-and-covid-19
⁶ https://hatebase.org/

for many years now. Related work has researched the types of information models memorize (Feldman and Zhang, 2020), how to increase generalization (Chatterjee, 2018), and the ability to extract information from these models (Carlini et al., 2020). While memorization is typically discussed in the space of memorization vs. generalization, we believe this can be broken down even further. In the context of conspiracy theories, we establish three types of generations:
