“And We Will Fight For Our Race!” A Measurement Study of Genetic Testing Conversations on Reddit and 4chan∗

Alexandros Mittos¹, Savvas Zannettou², Jeremy Blackburn³, and Emiliano De Cristofaro¹
¹University College London ²Cyprus University of Technology ³Binghamton University

arXiv:1901.09735v5 [cs.CY] 4 Oct 2019

∗This is the full version of the paper appearing in the 14th AAAI Conference on Web and Social Media (ICWSM 2020). Please cite the ICWSM version.

Abstract

Progress in genomics has enabled the emergence of a booming market for “direct-to-consumer” genetic testing. Nowadays, companies like 23andMe and AncestryDNA provide affordable health, genealogy, and ancestry reports, and have already tested tens of millions of customers. At the same time, alt- and far-right groups have also taken an interest in genetic testing, using it to attack minorities and prove their genetic “purity.” In this paper, we present a measurement study shedding light on how genetic testing is being discussed on Web communities in Reddit and 4chan. We collect 1.3M comments posted over 27 months on the two platforms, using a set of 280 keywords related to genetic testing. We then use NLP and computer vision tools to identify trends, themes, and topics of discussion.

Our analysis shows that genetic testing attracts a lot of attention on Reddit and 4chan, with discussions often including highly toxic language expressed through hateful, racist, and misogynistic comments. In particular, on 4chan’s politically incorrect board (/pol/), content from genetic testing conversations involves several alt-right personalities and openly antisemitic rhetoric, often conveyed through memes. Finally, we find that discussions build around user groups, from technology enthusiasts to communities promoting fringe political views.

1 Introduction

Over the past decade, researchers have made tremendous progress toward understanding the human genome. With increasingly low costs, millions of people can afford to learn about their genetic make-up, not only in diagnostic settings, but also to satisfy their curiosity about traits and wellness, or to discover their ancestry and genealogy.

In this context, a few companies have successfully launched direct-to-consumer (DTC) genetic testing: individuals can purchase a kit (typically around $100), mail it back with a saliva sample, and receive online reports after a few days. DTC companies offer a wide range of services, from romantic matchmaking [31] or identification of athletic skills [76] to reports of health risks (e.g., likelihood of developing Parkinson’s), wellness (e.g., lactose intolerance), carrier status (e.g., hereditary hearing loss), traits (e.g., eye color), etc. Popular products also include genetic ancestry tests, which promise a way to discover one’s ancestral roots, building on patterns of genetic variations common in people from similar backgrounds [57]. However, these are subject to limitations, e.g., results differ from provider to provider due to different control groups [72]. AncestryDNA alone has tested more than 10M customers as of Jan 2019 [6].

Alas, the increased popularity of self-administered genetic tests, and of ancestry tests in particular, has also been accompanied by media reports of far-right groups using them to attack minorities or prove their genetic “purity” [15, 66], mirroring concerns of a new wave of scientific racism [68]. For example, white nationalists were recently taped chugging milk at gatherings to demonstrate the ability of white people to better digest lactose [43]. Also, statements from Donald Trump led Senator Warren to publicly confirm her Native American ancestry via genetic testing, prompting heated debates on the matter [47].

Interest in DTC genetic testing by right-wing communities comes at a time when racism, hate, and antisemitism on platforms like 4chan, Gab, and certain communities on Reddit are on the rise [44, 85]. Thus, these trends are particularly worrying, also considering how technology has been disrupting society in previously unconsidered ways [39]. The fact that racist, misogynistic, and dangerous behavior festers and spreads on the Web at an unprecedented scale, eventually making its way into the real world, prompts the need for a thorough understanding of how genetic testing tools are being (mis)used in online discussions. As genetics-based arguments for discrimination [8], and even genocide, have been made in the past, this should absolutely not be overlooked.

While other aspects of genetic testing have already been studied, e.g., how they affect one’s perception of racial identity [61, 71], we are interested in the relation between genetic testing and online hate. This is a topic that has not been thoroughly studied by the scientific community, despite, as discussed earlier, increasingly worrisome indications of far-right groups exploiting genetic testing for racist rhetoric.

With this motivation in mind, we identify and address the following research questions: (1) What is the overall prevalence of genetic testing discourse on social networks like Reddit and 4chan? (2) In what context do users discuss genetic testing? (3) Is genetic testing associated with far-right views, racist ideologies, hate speech, and/or white supremacy? (4) If so, in what context? Can we identify specific themes?
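As stated in the abstract, comments are selected using a set of 280 keywords related to genetic testing. The following is a minimal sketch of how such keyword-based filtering could look, not the authors' actual pipeline: the keyword list is a hypothetical stand-in (the full list is not given in this section), and the function names and the whole-word, case-insensitive matching rule are assumptions made for illustration.

```python
import re
from typing import Iterable, List

# Hypothetical stand-ins for the 280 keywords, which this section does not list.
EXAMPLE_KEYWORDS = ["23andMe", "AncestryDNA", "genetic testing", "dna test"]


def build_keyword_pattern(keywords: Iterable[str]) -> re.Pattern:
    """Compile one case-insensitive pattern that matches any keyword
    as a whole word or phrase (assumed matching rule)."""
    # Longer keywords first, so overlapping phrases prefer the longest match.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    # The lookarounds reject matches embedded inside a longer word.
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def filter_comments(comments: Iterable[str], pattern: re.Pattern) -> List[str]:
    """Keep only the comments that mention at least one keyword."""
    return [c for c in comments if pattern.search(c)]
```

Calling `filter_comments(comments, build_keyword_pattern(EXAMPLE_KEYWORDS))` on a list of comment strings returns the subset that mentions at least one keyword. Dividing the size of that subset by the total number of comments gives a rough prevalence estimate of the kind research question (1) asks about.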
We compile and use a set of 280 keywords related to genetic testing to extract all available posts and comments from Reddit and 4chan. We collect 7K threads from the politically incorrect (/pol/) board of 4chan (consisting of 1.3M posts) from June 30, 2016 to March 13, 2018, and 77K comments from Reddit related to genetic testing from January 1, 2016 to March 31, 2018, and analyze them along several axes to understand how genetic testing is being discussed online.

We rely on natural language processing, computer vision, and machine learning tools, including: (1) Latent Dirichlet Allocation (LDA) [13] to identify topics of discussion; (2) word embeddings [49] to uncover words used in a similar context across datasets; (3) Google’s Perspective API [63] to measure toxicity in texts; and (4) Perceptual Hashing to assess the imagery and memes shared in posts.

Findings. Overall, the main findings of our study include:

1. Genetic testing is often discussed on /pol/ and on subreddits associated with hateful, racist, and sexist content. These communities discuss genetic testing in a highly toxic manner, often suggesting its use to marginalize or even eliminate minorities.
2. The analysis of images posted on /pol/ highlights the recurrent presence of popular alt-right personalities and “popular” antisemitic memes along with genetic testing discussions.
3. Word embeddings analysis reveals that certain subreddits use ethnic terms in conjunction with genetic testing keywords in the same way as /pol/, which may be an indicator of 4chan’s fringe ideologies spilling out on more mainstream Web communities.
4. Genetic testing on Reddit is being discussed in a variety of contexts, e.g., dog breeds, crime evidence, and issues related to children (e.g., adoption, pregnancy). This indicates how mainstream genetic testing has become.
5. Reddit users are not uniformly interested in all aspects of genetic testing, rather,

[...] at the relationship between citizen science and white nationalists’ use of genetic testing, shedding light on how “repair strategies” combine anti-scientific attacks on the legitimacy of these tests and reinterpretations of them in terms of white nationalist histories. Mittos et al. [51] conduct an exploratory study of the Twitter discourse on genetic testing, examining 300K tweets related to genetic testing, and find that those who are interested in genetic testing appear to be tech-savvy and interested in digital health in general. They also find sporadic instances of people using genetic testing in a racist context, and others who express privacy concerns.

Chow et al. [21] examine 2K tweets containing the keyword ‘23andMe’ spanning one week. They calculate the sentiment of these tweets and find that positive tweets outnumber negative ones, while users appear overall enthusiastic about the company’s services. Roth and Ivenmark [71] study how ancestry testing affects ethnic and racial identities, conducting 100 interviews with people who have white, black, Hispanic/Latino, Native American, and Asian ancestry. They find instances of consumers not accepting test results, instead focusing on estimates based on social appraisals and aspirations. Overall, they suggest genetic ancestry testing may reinforce race privilege. Clayton et al. [24] conduct a meta-analysis of 53 studies involving 47K people around perceptions of genetic privacy, highlighting how survey questions are often phrased poorly, thus leading to possible misinterpretations of the results. They also show that not enough attention was paid to influential factors, e.g., participants’ sociocultural backgrounds. Finally, Couldry and Yu [25] discuss how DTC genetic companies, such as 23andMe, influence the public toward sharing their genetic data by claiming that the abundance of data will improve people’s lives in the long term, despite a body of work showing that genetic data cannot be securely anonymized [41, 73].

Overall, most of the research in this area relies on qualitative studies examining the societal effects of genetic testing [18, 22, 26, 42, 56] and somewhat lacks quantitative large-