<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

1 Quantifying and contextualizing the impact of bioRxiv preprints through automated

2 social media audience segmentation

3 Jedidiah Carlson1*, Kelley Harris1,2

4 1Department of Genome Sciences, University of Washington, Seattle, WA

5 2Computational Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA

6 *Corresponding author: [email protected] bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

7 Abstract

8 Engagement with scientific manuscripts is frequently facilitated by and other social

9 media platforms. As such, the demographics of a paper's social media audience provide a

10 wealth of information about how scholarly research is transmitted, consumed, and interpreted by

11 online communities. By paying attention to public perceptions of their publications, scientists can

12 learn whether their research is stimulating positive scholarly and public thought. They can also

13 become aware of potentially negative patterns of interest from groups that misinterpret their

14 work in harmful ways, either willfully or unintentionally, and devise strategies for altering their

15 messaging to mitigate these impacts. In this study, we collected 331,696 Twitter posts

16 referencing 1,800 highly tweeted bioRxiv preprints and leveraged topic modeling to infer the

17 characteristics of various communities engaging with each preprint on Twitter. We agnostically

18 leaned he chaaceiic of hee adience eco fom keod each e folloe

19 provide in their Twitter biographies. We estimate that 96% of the preprints analyzed are

20 dominated by academic audiences on Twitter, suggesting that social media attention does not

21 always correspond to greater public exposure. We further demonstrate how our audience

22 segmentation method can quantify the level of interest from non-specialist audience sectors

23 such as mental health advocates, dog lovers, video game developers, vegans, bitcoin investors,

24 conspiracy theorists, journalists, religious groups, and political constituencies. Surprisingly, we

25 also found that 10% of the highly tweeted preprints analyzed have sizable (>5%) audience

26 sectors that are associated with right-wing white nationalist communities. Although none of

27 these preprints intentionally espouse any right-wing extremist messages, cases exist where

28 extremist appropriation comprises more than 50% of the tweets referencing a given preprint.

29 These results present unique opportunities for improving and contextualizing research

30 evaluation as well as shedding light on the unavoidable challenges of scientific discourse

31 afforded by social media. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

32 Introduction

33 The primary barometer of success in academic research is the extent to which the research (or

34 individual researcher) has contributed to new discoveries and advancement of knowledge.

35 Quantifying, predicting, and evaluating these achievements plays a pivotal role in the academic

36 research ecosystem, and the metrics used to assess the impact of scientific publications can

37 hae a pofond inflence on eeache caee ajecoie. Meaing the scientific impact of

38 a publication typically involves metrics based on the number of citations received in the

39 literature. Examples of such metrics include the h-index, which is the maximum number h for

40 which an author has published h or more papers with at least h citations; and the Journal Impact

41 Factor, which is the average annual number of citations received by articles published in a given

42 journal [1]. These citation-based metrics (hereafter refeed o a adiional cienomeic)

43 are not without their flaws, as they can be gamed and exploited [2] and are susceptible to

44 systematic biases [3]. Nevertheless, traditional scientometrics provide a standardized and easy-

45 to-interpret system of evaluating the scientific impact made by an individual article or

46 researcher, or even entire institutions or academic journals.

47

48 Beyond the intrinsic goal of knowledge production, publicly-funded research is typically

49 expected to produce some form of societal impact in the form of social, cultural, environmental,

50 or economic returns for the taxpayer [4], and researchers are regularly expected to justify how

51 their line of study will generate such impacts [5]. For example, the importance of societal impact

52 is codified in National Science Foundation grant application guidelines, where funding

53 applican m dedicae a banial ecion of hei popoal o decibing he poenial [fo

54 their research] to benefit society and contribute to the achievement of specific, desired societal

55 ocome [6]. This distinction between scientific and societal impacts is relatively new: as noted

56 by Bornmann [4], for most of the 20th century, policymakers believed that any investment in bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

57 scientific research naturally led to positive societal impact, so the only relevant impact to be

58 evaluated was the degree to which a particular research output or individual researcher

59 advanced high-level academic knowledge.

60

61 However, a common point of concern is that existing paradigms for evaluating scientific impact

62 are often inappropriate for assessing the societal implications of scientific research [7]. One

63 reason for this discordance is simply the slow pace at which literature citations accumulate: it

64 may take years or even decades for an impactful article to accrue its due attention in the

65 scientific literature, which could lead to the short-term conclusion that the research was neither

66 scientifically nor societally impactful. Moreover, literature citations only directly measure the

67 attention received from other researchers and tell us very little about how a given article of

68 research has penetrated the public mindset. An important goal in research evaluation, therefore,

69 is to identify data sources that qualitatively and quantitatively trace engagement with research

70 from non-specialist audiences.

71

72 Altmetrics: a new way to measure societal impact?

73 In the last decade, scientists have flocked to Twitter and other online social media platforms to

74 share their research, connect with colleagues, and engage with the public [8]. The enthusiastic

75 adoption of social media by some researchers has dramatically altered how scientific

76 publications are diffused throughout the scientific community and bridged into the public forum,

77 leading some in the research community to consider social media as a vital tool for public

78 engagement and scientific communication [9]. Unlike traditional metrics, data gleaned from

79 ocial media (and meic heeof, commonl knon a almeic) can poide a nearly

80 inananeo eado of a pape epoe aco boad co-sections of experts and lay

81 adience alike. Thogh ome hae peclaed ha ocial media b abo eeach ha he

82 potential to be highly superficial and not necessarily indicative of lasting societal or scientific bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

83 impact (or even if the paper is downloaded and read) [1012], other evidence suggests that a

84 flurry of interest in new research on social media might actually provide an immediate indicator

85 of its potential downstream scientific and societal impacts, including: 1) predicting the number of

86 ime one eeach i cied in he cienific lieae [13]; 2) facilitating engagement with news

87 media and public audiences [14,15]; 3) fostering networking, collaboration, and career

88 development opportunities [9] and 4) predicting the likelihood of receiving grant funding [16].

89

90 In 2010, he Almeic Manifeo olined a iion fo ho ocial media daa and ohe

91 altmetrics could expand our ability to systematically measure and assess the societal impact of

92 research [17]. In the following years, a cottage industry of organizations, such as Altmetrics,

93 ImpacSo, and Eleie Plm Analytics, have brought the concept of altmetrics to the

94 mainstream. For millions of journal articles, these companies now index every instance in which

95 an article is referenced on social media sites like Twitter, Facebook and Reddit, along with

96 mentions in news articles, Wikipedia pages, and policy documents. These data are typically

97 aggregated into a single article-level score, largely weighted by how frequently the article is

98 referenced on social media. Preprints that garner high altmetric scores are celebrated by some

99 authors, climb to the top of article recommendation engines, and provide prospective readers

100 with a glimpse of which new research appears to be generating attention within days or even

101 hours of its initial public release.

102

103 Despite the general enthusiasm about altmetrics, metaresearch experts have stressed that

104 singular quantitative indicators like the Altmetric Attention Score fail to provide the appropriate

105 context necessary for evaluating societal impact [7,18]. Altmetric readily acknowledges that their

106 Aenion Scoe alone i infficien fo ealaing a pape cope of impac: In Almeic

107 introduction to the list of papers that received the top 100 Attention Scores of 2019, they caution

108 ha he onl heme ha man of hee pape hae in common i hei abili o a bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

109 coneaion...he anking ha no beaing on he ali o impac of he eeach ielf. [19].

110 Idenifing hee coneaion ae pape i ceainl a alable indicao of eeach ha

111 might lead to tangible impacts on the broader public, but if altmetrics are to be harnessed to

112 meaningfully measure societal impact, we must move beyond the simple question of how much

113 attention an article receives on social media, and prioritize questions of context: who is

114 engaging with the article, what they are saying about it, and when and where they are doing so

115 [7,20].

116

117 Contextualizing altmetrics

118 Among the major altmetrics data aggregators (Altmetric, Plum Analytics, Impact Story), only

119 Altmetric attempts to provide cursory classifications of Twitter users who have engaged with an

120 article. Altmetic e a popiea algoihm ha ealae keod in pofile decipion,

121 he pe of jonal ha e link o, and folloe li fom each e o egmen he oeall

122 audience into four basic categories: three describe different groups of academic/scientific

123 akeholde (cieni, paciione, cience commnicao), and a foh, membe of

124 he pblic, encompae all ohe ho do no fi ino one of he ohe caegoie [21]. In a

125 post written for Altmetric [22], Haustein, Toupin, and Alperin explain that classifying users even

126 into these coarsely-defined groups is a challenging task due to the inherent sparsity and

127 noisiness of direct self-descriptions provided by Twitter users in their biographies, which are

128 limied o j 160 chaace. Moeoe, Almeic adience claificaion do no appea o

129 factor into how Altmetric Attention Scores are quantified, nor are these data directly visible on

130 pblihe ebie alongide he Almeic Aenion Scoe fo each aicle. Depie

131 Almeic epeaed emphai ha demogaphic cone i impoan, hee deail appea o be

132 downplayed in favor of a singular quantitative score.

133 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

134 Fortunately, we are not restricted to learning about audience characteristics from individual

135 Twitter biographies, which are plagued by sparsity and noise. A cornerstone of sociological

136 research is the principle of network homophily, which states that individuals tend to connect and

137 engage within a social network of others who share similar characteristics [23]. Online social

138 media is no exception: patterns of network homophily have been demonstrated in numerous

139 studies of Twitter users [2428], and there is evidence to suggest that the homophily of an

140 indiidal online connecion geneall mio hei face-to-face networks [29]. Recent studies

141 have applied this principle to the study of altmetrics and demonstrated that deeper

142 conealiaion of a pblicaion adience on ocial media can be achieed b eamining

143 various aspects of how individual users are networked with others on Twitter [7,30]. More

144 specifically, network homophily enables the identification of various characteristics of an

145 individual Twitter user based on the self-descriptions of the accounts connected with that

146 individual on Twitter [25,31].

147

148 In hi d, e peen a fameok fo claifing membe of a pape adience on Tie

149 into granular, informative categories inferred through probabilistic topic modeling of metadata

150 colleced fom each e neok of folloe. Pobabiliic opic modeling is a powerful and

151 flexible method for revealing the granular composition of all sorts of complex datasetsnot only

152 the present application of text mining [36], but also inference of population ancestry using

153 genetic markers [32], computer vision [33], and many other areas of study. With this approach,

154 we analyze the Twitter audiences for 1,800 highly tweeted preprints (encompassing over

155 330,000 ee in oal) aco a aie of opic in biolog. We ho ha each aicle

156 audience on social media is characterized by a unique composition of online communities, both

157 within academia and among diverse lay audiences. The audience classifications inferred with

158 our topic modeling framework thus provide valuable context for interpreting altmetrics and

159 providing initial traces of potential societal impacts. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

160

161 We highlight three ways in which these inferred audience demographics can serve to enhance

162 interpretation of altmetrics: 1) more accurate quantification and classification of the various

163 academic audience sectors that express interest in a particular research output, 2) detailed

164 characterization of lay audiences that inform potential societal impacts, and 3) exploring how

165 politically-engaged audiences are interacting with research. Because detailed context is such an

166 important aspect of interpreting altmetrics, we have compiled an online portal showcasing

167 summary data across all preprints analyzed along with interactive visualizations detailing the

168 various dimensions of audience characteristics for individual preprints, available at

169 http://carjed.github.io/audiences.

170

171 Our analyses also reveal a subset of preprints that attract a great deal of attention from

172 audience sectors associated with far-right ideologies, including white nationalism. These

173 communities appear to be especially active in their engagement with preprints concerning the

174 genetic architecture of behavioral traits, human population genetics and ancient DNA research,

175 and the neurological and physiological variation across sexes and genders, among a plethora of

176 other topics. In some cases, these politically-motivated audience sectors comprise over half of

177 the total audience of users referencing an article, providing concrete evidence to support

178 concerns about racist and sexist misappropriation of research that have been expressed by

179 academic organizations [34], news media [35], and scientists themselves [36]. We discuss how

180 stakeholders in the scientific community can use these audience demographics to understand

181 the social and political implications of their work and assert their expertise to guide public

182 science literacy in the era of social media.

183 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

184 Results

185 Data collection

186 We used Rxivist, a service that indexes article-level metrics for preprints posted to the bioRxiv

187 preprint repository [37], to identify 1,800 of the most highly tweeted bioRxiv preprints posted

188 between November, 2013 and February, 2020, spanning the 27 categories under which bioRxiv

189 preprints are organized. The number of tweets we analyzed per preprint ranged from a

190 minimum of 50 to a maximum of 3,794 tweets. We focused on bioRxiv preprints rather than

191 peer-reviewed journals for the following reasons: 1) bioRxiv covers a broad cross-section of

192 topics in the biological sciences, 2) metadata for bioRxiv preprints is readily retrievable through

193 the open-source Rxivist platform, 3) bioRxiv preprints are fully open-access and freely available

194 to read by anyone, and 4) unlike many publishers of peer-reviewed research, bioRxiv does not

195 highlight or promote certain preprints over others, nor are preprints typically promoted to the

196 news media via press releases, creating a more even playing field for preprints to organically

197 garner attention on social media.

198 Topic modeling for audience sector classification

199 Given a preprint and the list of Twitter users that tweeted or retweeted a link to it, we are

200 primarily interested in classifying each of these users into a paicla adience eco, defined

201 here as a subset of users whose followers share similar self-identified characteristics (e.g.,

202 occupation, personal interests, hobbies). Under the principle of network homophily, we assume

203 that the characteristics of the communities each user is affiliated with can be inferred from

204 aggegaed infomaion abo ha e folloe, namel he elf-decipion ha e

205 followers provide in their Twitter biographies. For each preprint, we collected information about bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

206 each user that tweeted or retweeted a link to that preprint, then queried the Twitter API to collect

207 meadaa fo each of hee e folloe (Methods). For each user, we then compiled their

208 folloe biogaphie and ceen name ino a ingle ecor of words (including emoji and

209 hahag), hich e efe o a a folloe docmen. O been anale of hee

210 folloe docmen ed a bag of od appoach, ha i, he ode in hich od occed

211 in each document (and the specific follower they originated from) is ignored. We refer to the

212 collecion of folloe docmen fo a gien pepin a he adience cop.

213

214 For each preprint, we extracted information about the underlying audience sectorsor

215 opicby applying Latent Dirichlet Allocation (LDA) to the audience corpus of associated

216 follower documents (Methods). The LDA model assumes that each audience corpus consists of

217 K distinct underlying audience sectors, such that each follower document can be represented

218 probabilistically as a mixture of these K audience sectors, with each audience sector

219 represented by a collection of words that frequently co-occur within the same latent topic. We

220 expect that each of these representations conveys semantic meaning about the characteristics

221 of a gien adience eco, e.g., a opic epeened b he keod pofeo, niei,

222 geneic, poplaion, and eolion likel eflec an adience eco of

223 population/evolutionary geneticists.

224

225 Inferring the diversity of academic audiences on social media

226 Examples of the LDA topic modeling results for four selected preprints are shown in Fig. 1,

227 where each user that referenced a given preprint is displayed as a stack of bars segmented into

228 K colors representing the estimated membership fraction in each audience sector, analogous to

229 the de facto standard of visualizing inferred genetic ancestry of individuals with the popular bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

230 STRUCTURE program, which uses the same underlying Latent Dirichlet Allocation model

231 [38,39]. We selected these preprints because they represent a variety of scientific disciplines

232 covered on bioRxiv (animal behavior, synthetic biology, genomics, and neuroscience) and their

233 audience sectors exhibit a wide range of academic and non-academic communities. For each

234 preprint, we classified each of the inferred audience eco a academic o non-academic,

235 flagging topics whose top keywords include a combination of words indicating an academic

236 affiliaion, ch a PhD, niei o pofeo a academic adience eco and

237 classifying all other topics a non-academic adience eco. Fo he academic adience

238 sectors, we took the additional step of attempting to map the words associated with each topic

239 to a singular scientific discipline (Methods).

240

241 As shown in Fig. 1, the academic audience sectors frequently align with the bioRxiv category

242 under which the preprint was classified, confirming that our method captures relevant

243 infomaion abo he chaaceiic of indiidal engaging ih ha pepin. DeepPoeKi, a

244 software toolkit for fast and ob animal poe eimaion ing deep leaning, b Gaing e

245 al., a bmied o bioRi in he animal behaio and cogniion caego [40]. Academic

246 audience sectors mapped for this preprint include ecology (topic 1), artificial intelligence (topics

247 3 and 10), neuroscience (topic 5), psychology (topics 7 and 8) and ethology (topic 12) (Fig. 1a).

248 Engineeing Bain Paaie fo Inacellla Delie of Theapeic Poein, b Bacha e al.,

249 a bmied o bioRi in he nheic biolog caego [41]. Academic audience sectors

250 mapped for this preprint include molecular engineering (topic 1), bioinformatics (topic 4),

251 neuroscience (topic 5), evolutionary sociology (topic 6), biostatistics (topic 7), public health

252 (topic 11), and medical psychology (topic 12) (Fig. 1b). Se diffeence in gene epeion in

253 he hman feal bain, b OBien e al., a bmied o bioRi in he genomic caego

254 [42]. Academic audience sectors mapped for this preprint include bioinformatics (topic 1) and

255 political philosophy (topic 9) (Fig. 1c). Ho face pecepion nfold oe ime, b Dobs et al., bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

256 a bmied o bioRi in he neocience caego [43]. This preprint was found to have

257 only a single academic audience sector, psychology (topic 5) (Fig. 1d). bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

258 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

259 Fig. 1. Topic modeling output for 4 selected preprints. Each panel shows the LDA topic

260 modeling results for one of four selected preprints: A. DeePeKi, a fae lki f fa

261 ad b aimal e eimai ig dee leaig, b Gaig e al. B. Egieeig Bain

262 Paaie f Iacellla Delie f Theaeic Pei, b Bacha e al. C. Se diffeece

263 i gee eei i he hma feal bai, b OBie e al. D. H face ecei fld

264 e ime, b Db e al. I each ael, ee acc that tweeted about that paper is

265 eeeed b a hial ack f 12 cled egme ha eee he acc

266 estimated membership fractions in each of the 12 inferred audience sectors. The top 10

267 keywords, hashtags, or emoji associated with each topic are shown in the corresponding

268 colored boxes next to each panel, with the top 3 keywords for each topic highlighted in bold.

269 Topics inferred to correspond to academic audience sectors are displayed in shaded boxes.

270

271 One way we can attempt to coarsely quantify the broader societal impact of a preprint using

272 these data is to compare the relative engagement from academic versus non-academic

273 audiences on social media [44], similar to how Altmetric provides an estimated demographic

274 breakdown of users tweeting about a paper [21]. For each preprint, we estimated the fraction of

275 the audience falling into any of the inferred academic audience sectors and non-academic

276 audience sectors (Methods). We found that the academic audience sectors typically comprised

277 he majoi of a pepin adienceacross the 1,800 preprints analyzed, we estimated 95.9%

278 had a majority audience consisting of academic audience sectors (Fig. 2).

279

280 We next collected the demographic classifications for each preprint as estimated by Altmetric

281 and compared these to our estimates to determine whether our estimates were consistent. The

282 fraction of academic users for each preprint was well-correlated with the fraction of users

283 infeed o be cieni, cience commnicao, o paciione b Almeic (r =0.72),

284 though our method estimated a higher proportion of academic-affiliated users for all but two of bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

285 the 1,800 preprints (Fig. 2). In man cae, a pepin majoi adience (eihe academic o

286 non-academic) as estimated by Altmetric was reversed when estimated using our method: 833

287 (46.3%) of the preprints analyzed had majority non-academic audiences according to Altmetric,

288 but majority academic audiences according to our method. These conflicting estimates are

289 difficult to reconcile, as the only way to establish the ground truth would be to manually classify

290 each of the 331,696 users in our dataset. Though performing this manual validation is not

291 feaible a cale, e did o fo one pepin, Shake-it-off: A simple ultrasonic cryo-EM specimen

292 prepaaion deice, [45] as a proof-of-concept to demonstrate that our method is likely a more

293 accurate classifier of academic-affiliaed accon han Almeic mehod. O aionale fo

294 selecting this preprint was that it received relatively few tweets that needed to be manually

295 annotated but had high discordance between our estimated academic audience fraction (100%)

296 and that estimated by Altmetric (39.8%). In addition, the title of this preprint contains a pithy

297 efeence o he ile of he ong Shake i Off b pop inge Talo Sifthis was not a

298 coincidence, as the most retweeted tweet about the preprint was posted by the first author, John

299 Rubenstein, who described their newly-devised cryo-EM device using a parody of a line from

300 he cho of Sif Shake if Off:

301

302 Sae' ga a a a

303 Shake-it-off: A simple ultrasonic cryo-EM specimen preparation device.

304 (https://twitter.com/RubinsteinJohn/status/1126553381374439424)

305

306 This is relevant because papers with "amusing, quirky or off-beat" research or those that have

307 buzzwords in the title are often presumed to attract greater attention from the public [10], so we

308 reasoned that this preprint in particular would align with the null hypothesis that Altmetric's

309 estimate of a majority non-academic audience was correct for this preprint.

310 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

311 At the time we collected data for this preprint, Altmetric had indexed 88 unique users that had

312 tweeted about this paper and estimated that 35 of these users were either scientists (33),

313 practitioners (1), or science communicators (1), and the remaining 53 were members of the

314 public. Our method only analyzed 73 of these accounts (15 were private accounts or had fewer

315 than 5 followers and were excluded), but among these, all 73 were classified into academic

316 audience sectors. We manually classified each of these 73 accounts as either academic or non-

317 academic by examining self-dicloed infomaion in each e bio. If a bio a emp o

318 ambigo, e pefomed a Google each fo he e ceen name and/or username to

319 determine if they had an online presence associated with an academic institution. We

320 determined 68 accounts belonged to individual academics (graduate students, postdocs, staff

321 scientists, or professors), academic labs, university departments, or university-affiliated

322 academic organizations. Further, 3 accounts (two automated bots and one human-curated

323 account) were dedicated to posting tweets about the latest papers in the field of

324 biophysics/microscopy and were primarily followed by academics. One other account belonged

325 to a company that manufactures and sells microscopes and ostensibly has a customer base

326 (and thus, Twitter following) consisting of primarily scientists. This left only a single account that

327 we could not unambiguously classify as an academic account, though deeper examination

328 fond ha man of hi accon folloe ok ihin he field of biophic and micocop,

329 suggesting they also possess an affiliation with this academic network. We conclude that

330 Altmetric misclassified over 50% of the academic (or academic-affiliaed) accon a membe

331 of he pblic, heea o mehod, a o, onl miclaified a ingle e. Thi alidaion

332 example demonstrates that our method of inferring audience demographics is potentially much

333 more accurate at classifying academic-affiliated accounts than the Altmetric method.

334

335 We alo noe ha Almeic eimae of academic adience compied a majoi fo onl 13

336 (27.1%) of he 48 pepin analed in he cienific commnicaion and edcaion caego. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

337 Nearly all preprints we analyzed in this category revolve around topics that we interpret as being

338 primarily relevant to academics, including several surveys of graduate students, postdoctoral

339 fellows, and tenure-track faculty, investigations of biases in publication, peer review, and grant-

340 funding practices, and scientometrics and altmetrics. In contrast, our method consistently

341 estimated that academic sectors comprised a majority of the audiences for these preprints, with

342 a median academic audience fraction of 92.6% and a minimum of 61%.

343

344

345

346 Fig. 2. Comparison of academic audience fractions estimated by Altmetric versus our

347 estimates. Each point represents an individual preprint, with colors corresponding to the

348 bioRxiv category. The size of each point indicates the total number of tweets referencing that

349 preprint. An interactive version of this figure can be accessed at

350 https://carjed.github.io/audiences. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

351

352 Inferring the characteristics of non-academic audience sectors

353 A commonly cited advantage of altmetrics over traditional scientometrics is that altmetrics can

354 capture information about engagement from lay audiences, which could be useful for evaluating

355 dimensions of research productivity and scholarship that are often overlooked. For example,

356 such information may guide researchers in describing and quantifying the broader impacts of

357 their work in grant applications, enable new citizen science collaborations (such as fold-it, a

358 crowdsourced computer game where players predict protein structures [46]) or crowdfunded

359 research (such as the ~1,000 research projects that have been successfully funded on the

360 scientific crowdfunding platform, experiment.com), and provide concrete evidence of

361 eeache indiidal cience commnicaion and pblic engagemen effo.

362

363 Overall, we observed a weak positive correlation (r =0.258) between the non-academic

364 audience fraction and the total number of tweets, suggesting that preprints which gain more

365 attention on Twitter do tend to have a broader reach with non-academic audiences. Below, we

366 examined the non-academic audience sectors inferred by our topic modeling analysis for the

367 four preprints used as case studies in Fig. 1 and investigated how this information could be

368 used in contextualized evaluation of altmetrics.

369

370 As with the inferred academic audience sectors, the characteristics of non-academic audience

371 sectors often aligned iniiel ih he opic of he pepin. DeepPoeKi, a ofae oolki fo

372 fa and ob animal poe eimaion ing deep leaning, b Gaing e al. [40] included non-

373 academic audience sectors associated with video game developers (topic 2), business

374 applications of artificial intelligence (topics 6 and 9) and graphic designers (topic 11) (Fig. 1a), bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

375 presumably because this research has implications for more realistic computational rendering of

376 phiological popeie. Engineeing Bain Paaie fo Inacellla Delie of Theapeic

377 Poein, b Bacha e al., [41]. included non-academic audience sectors primarily associated

378 ih bioechnolog companie (opic 2, 8, 9, and 10), aligning ih he pape foc on

379 bioengineering, as well as an audience sector associated with news media (topic 3), suggesting

380 this preprint caught the attention of science journalists (though we note that at the time of

381 writing, Altmetric has not indexed any news media coverage which cites this preprint) (Fig. 1b).

382 Se diffeence in gene epeion in he hman feal bain, b OBien et al. [42] had a

383 particularly diverse array of non-academic audience sectors, including groups associated with

384 blockchain technology and cryptocurrency (topic 2), veganism and animal rights (topic 3), video

385 games (topic 4), right-wing politics (topics 5, 6, and 7), and science fiction & fantasy writers

386 (topic 10). Other audience sectors captured groups of individuals that shared a common

387 language, specifically Arabic (topic 8) and Spanish (topic 11 & 12), indicating this preprint had a

388 culturally and geographically diverse audience (Fig. 1c). Ho face pecepion nfold oe

389 ime, b Dob e al. [43] included non-academic audience sectors associated with right-wing

390 politics (topics 1, 2, and 12), Spanish-speaking communities (topics 3, 6, and 9), business and

391 marketing (topics 4 and 10), mental health professionals and advocates (topic 7), and veganism

392 and animal rights (topic 11) (Fig. 1d).

393

394 Measuring engagement from political partisans

395 A eemplified b o anali of OBien e al. [42] and Dobs et al. [43] (Fig. 1c-d), many non-

396 academic audience sectors included keywords signaling political ideology/affiliation such as

397 epblican, coneaie, democa, o libeal; hahag ch a #MAGA and #KAG,

398 (aconm fo make Ameica gea again and keep Ameica gea, ed b Donald Trump

399 and hi popli ppoe hogho hi campaign and peidenc) and #ei (ed bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

400 primarily by democrats who critically oppose Donald Trump); and coded emoji such as (a

401 common mbol ed b libeal o ho ppo fo a ble ae of Democratic candidates in

402 the 2016 midterm elections) or ❌ (used by conservatives to signal their claim that right-wing

403 users have been unfairly subjected to censorship on social media [47]). The presence of these

404 words, hashtags, and emoji demonstrate that many non-academic audience sectors consist of

405 users who primarily self-identify through their political ideologies rather than professional

406 affiliation or personal/recreational interests, and thus might be especially interested in the

407 political implications of the research at hand.

408

409 To estimate how these audience sectors polarize on the political spectrum, we compared the

410 facion of e aociaed ih opic maked ih he hahag #ei (an indicator of left-of-

411 cene poliical affiliaion) o e aociaed ih opic maked ih he hahag #maga (an

412 indicator of right-of-center political affiliation) [48]. The political polarization of these audience

413 sectors tended to vary by research areatweets referencing preprints in ecology, immunology,

414 scientific communication/education, systems biology, and bioinformatics came from users

415 whose follower networks generally skewed politically left, and tweets about preprints in genetics,

416 genomics, neuroscience, and animal behavior/cognition (and, to a lesser extent, cell biology,

417 cancer biology, and plant biology) generally came from users whose follower networks

418 significantly skewed to the political right (Fig. 3). The remaining subject areas (e.g.,

419 bioengineering, biophysics, microbiology, molecular biology) showed no significant skew in

420 political orientation.

421 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

422

423 Fig. 3. Political skew of non-academic audience sectors by bioRxiv category. The x-axis

424 shows the log10 fold difference between the fractions of right-wing audience sectors (marked

425 ih he hahag #maga) ad lef-ig adiece ec (maked ih he hahag #ei)

426 among all tweets referencing preprints in a given bioRxiv category. The y-axis shows the -log10

427 p-value of an exact binomial test for whether these fractions are equal. Preprint categories with

428 statistically significant differences (after Bonferroni multiple testing correction) are annotated

429 above the dashed line. Preprints with audiences that skew politically left are shown in the blue

430 shaded area, and preprints with audiences that skew politically right are shown in the red

431 shaded area. The size of each point indicates the total number of users affiliated with political

432 audience sectors for that category.

433

434 Quantifying engagement with biological research by white nationalists

435 In addition, we found that many of the audience sectors determined to associate with right-wing

436 political constituencies were further associated with far-right political ideologies: 1529 of the

437 tweets analyzed were associated with audience topics that inclded he keod naionali in bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

438 the top 50 keywords. Although users classified into these audience sectors account for only

439 0.24% of the 331,696 tweets analyzed, these sectors were detected in only six bioRxiv

440 categories: animal behavior and cognition, bioinformatics, evolutionary biology, genetics,

441 genomics, and neuroscience. This enrichment was strongest in the categories of animal

442 behavior and cognition and genetics, where audience sectors explicitly associated with

443 nationalism accounted for 1.53% and 1.51% of the total audiences, respectively, equivalent to a

444 >6-fold enrichment of nationalist-associated audience sectors when compared to the overall

445 average across all preprints analyzed.

446

447 To better quantify the extent of this engagement and confirm that this result was not an artifact

448 of the LDA model we used, we applied an orthogonal analysis of political affiliation by examining

449 patterns of network homophily (i.e., the number of mutual connections shared [27]) between

450 each user that tweeted about a preprint and a curated reference panel of 20 prominent white

451 nationalist accounts on Twitter. Across the 331,696 tweets analyzed, the median network

452 homophily of individual users was 0.1%, with 95% of users exhibiting homophily levels of <1%.

453 Thus, we considered any user with far-right network homophily >2% (i.e., at least a 2-fold

454 increase in far-right network homophily compared to 95% of users in the dataset) to be affiliated

455 with this network. We emphasize that this is a strictly quantitative affiliation, and we make no

456 claims about the interpersonal or ideological association between an individual user and white

457 nationalist communities on Twitter, simply that they share substantially more followers in

458 common with particular white nationalist Twitter users than the majority of users analyzed.

459

460 We next specified varying thresholds of white nationalist network homophily (2%, 5%, 10%, and

461 20%) and, for each preprint, counted the number of users whose median homophily with the

462 white nationalist reference panel exceeded each threshold (Fig. 4; Supplementary Fig. 1). As

463 a point of reference, we found that 1,286 (71.4%) of the preprints analyzed had audiences bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

464 where the fraction of users with >2% follower network homophily with the white nationalist

465 reference panel was negligible (<1% of tweets referencing the preprint), indicating that

466 engagement from white nationalist-affiliated users cannot be simply explained as a normative

467 aspect of any research discussed on Twitter.

468

469 Many of the remaining 514 preprints had an even stronger exposure to white nationalist-

470 affiliated audience sectors than indicated by the audience topics alone: 182 of these (10.1% of

471 all preprints analyzed) had >5% of users each with >2% white nationalist network homophily

472 with the reference panel, and 14 of these had >30% of users with >2% white nationalist network

473 homophily. Preprints with the strongest enrichment of such users generally occurred in bioRxiv

474 categories where we detected explicit white nationalist associated audience sectors (animal

475 behavior and cognition, evolutionary biology, genetics, genomics, and neuroscience), though we

476 note that we also detected at least one preprint where >5% of tweets came from users with >2%

477 white nationalist network homophily in 19 of the 27 categories (Fig. 4).

478

479 On average, the 182 preprints that showed a high level of engagement from white nationalist-

480 affiliated accounts (>5% of users, each with >2% far-right network homophily), received 215

481 tweets; this is significantly more than the 1,619 preprints that did not exceed this threshold,

482 which received 164 tweets on average (t-test, p=0.012). These 182 preprints also had

483 significantly higher Altmetric Attention Scores (mean Attention Score of 181 compared to 112 for

484 the remaining preprints; t-test, p=0.001), and the correlation between estimated non-academic

485 lay audience fraction and total number of tweets was higher for these 182 preprints (r=0.38

486 versus r=0.22). This suggests that preprints attracting white nationalist-affiliated communities

487 tend to receive more tweets, higher priority in Altmetric rankings, and proportionally less

488 engagement from academic audiences.

489 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

490

491 Fig. 4. Distributions of white nationalist homophily by category. Each point represents a

492 single preprint, and the position on the y-axis indicates the fraction of users who tweeted about

493 that preprint whose follower network homophily with the white nationalist reference panel is

494 greater than h=2%. Boxplots summarizing the distributions of these fractions per bioRxiv

495 category are shown beneath each set of points.

496

497 Discussion

498 Our analyses demonstrate that the audiences engaging with scientific research on Twitter can

499 be accurately classified into granular, informative categories across both academic and non-

500 academic commniie pel b eamining ho each e folloe elf-identify in their

501 Twitter biographies. This audience partitioning approach enables more accurate appraisal and bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

502 conealiaion of an aicle epoe o peciali and non-specialist audiences, reveals

503 patterns of engagement from political partisans that may reflect immediate policy implications of

504 contemporary research, and quantifies troubling trends of engagement with the scientific

505 literature from users affiliated with far-right white nationalist networks.

506

507 It is important to note that the results for a given preprint represent a single temporal snapshot

508 of its Twitter audienceadditional users may discover and tweet about the preprint, follower

509 networks are constantly shifting, and users may change their biographies, delete their tweets,

510 deactivate their accounts, or be suspended at any time. The constant evolution of social media

511 data may mean that the conclusions from our present study are only valid at a particular point in

512 time, so tracking these trends longitudinally might be necessary to separate signals of

513 ephemeral attention from evidence of long-term societal impacts. There are numerous other

514 untapped features of social media data that could be useful for parsing characteristics of a

515 pepin adience, ch a a enimen anali of he ee efeencing he pepin,

516 indexing of users who like or reply to tweets referencing the preprint, or the temporal patterns of

517 how referencing tweets accumulate retweets.

518

519 Our study focused exclusively on bioRxiv preprints due to the accessibility of article-level

520 metadata and breadth of research topics covered, but our audience segmentation strategy

521 could be extended to evaluate the social media audiences of research appearing in peer-

522 reviewed journals, too. For preprints that are eventually published in peer-reviewed journals, we

523 could even compare the audience sectors of the manuscript in both its preprint and peer-

524 reviewed forms to investigate how the peer review and editorial processes influence audience

525 composition. A priori, we may expect the pre- and post-publication audiences to be quite similar,

526 but many factors (press releases, journal prestige, major revisions made to the manuscript after

527 peer review, open-access status of the paper, etc.) could alter the balance of stakeholders bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

528 interested in a paper. An expansive survey of Twitter audiences for peer-reviewed publications

529 could help journals identify strengths and growth opportunities for their editorial/publishing

530 practices, assist authors in making informed decisions about where to submit and whether to

531 preprint their work, and guide readers towards research that is most relevant to their interests.

532

533 Anohe oce of daa ha i lagel ignoed in he almeic lieae i econda

534 engagemen eentweets that do not link directly to a research article, but instead cite news

535 articles, blog posts, and other online sources which reference or discuss the research. These

536 secondary engagement events are not indexed by altmetric data brokers, but, given that most

537 lay audiences learn about new scientific research through news media [8], it is likely that such

538 posts are heavily enriched for non-academic audiences and thus more informative of potential

539 ocieal impac. Fo eample, he pape Loci aociaed ih kin pigmenaion idenified in

540 African populaion [49] received 374 tweets from 323 users as of December 2019, according

541 to Altmetric. This study was covered by science journalists Ed Yong in The Atlantic [50] and Carl

542 Zimmer in [51] (in addition to over 40 other news outlets, according to

543 Altmetric). Tweets linking to news articles posted by Yong, Zimmer and their respective

544 publishers alone received over 700 retweets, far outnumbering the tweets that link directly to the

545 research article.

546

547 Scientists are the primary drivers of social media engagement

548 We estimate that academic-affiliated accounts comprise over half of the audience for over 95%

549 of the bioRxiv preprints analyzed, suggesting that most social media discussion of preprints

550 remains within the confines of the academic community. This stands in sharp contrast to

551 Almeic adience egmenaion eimae, hich imply that only about half of the preprints bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

552 analyzed have majority academic-affiliated audiences on Twitter. Our results suggest that

553 Altmetric tends to severely underestimate the engagement attributable to academic audiences.

554 We speculate that, on average, Altmetric misclassifies up to 40% of the Twitter audience for

555 bioRxiv preprints as members of the public, when these users should instead be classified as

556 scientists, science communicators, or practitioners. We do not claim our method is a perfect

557 classifier of academic audience sectors, but these discrepancies certainly highlight the need for

558 greater transparency in the methodologies used by commercial altmetric data brokers and

559 assurance that their altmetric indicators are situated in an informative, accurate context.

560

561 These results, coupled with the fact that scientists on Twitter are not immune to the effects of

562 network homophily (a recent paper estimated that faculty members in ecology and evolutionary

563 biology typically have a Twitter following comprised mostly [55%] of other scientists [14]), also

564 leads us to conclude that most discussions of bioRxiv preprints on social media are often simply

565 an online extension of the broader academic research ecosystem. This conclusion challenges a

566 recent study that estimated less than 10% of tweets referencing scientific publications originated

567 from accounts that conveyed curated, informed perspectives on the research, whereas most

568 tweets appeared to originate from automated bots or accounts the authors characterized as

569 dplicaie, mechanical, o deoid of oiginal hogh [11] (though this study only considered

570 papers in dentistry journals). To the contrary, our finding that scientists are the largest audience

571 sector would suggest that the opposite is true, and most tweets referencing bioRxiv preprints

572 represent curated, informed perspectives from subject matter experts. Although we cannot

573 guarantee that every scientist on Twitter is immune from mechanical tweeting or performative

574 aemp o bild ocial capial (colloiall knon a clo on Tie) ih hei pee, e ae

575 optimistic that most are doing so in an honest attempt to stay abreast of the latest work in their

576 fields and broadcast their informed opinions to their followers.

577 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

578 Nevertheless, the overwhelming ubiquity of academics in the audience for bioRxiv preprints

579 reflects an obvious downside: most preprints ultimately appear to be receiving limited

580 engagement from lay audiences. This presents an opportunity for motivated scientists to self-

581 audit their own network homophily, build a following beyond their professional bubble, and use

582 the platform for spreading scientific information to the broader public. If funding agencies and

583 policymakers continue to prioritize societal impact and altmetrics as a desirable outcome of

584 scientific research and demand evidence of impactful scholarship, it may be beneficial to

585 explicitly incentivize such public engagement and reward researchers who develop and

586 maintain a public-facing presence on social media.

587

588 Our analysis of Graving et al. provides a motivating example of how our social media audience

589 segmentation approach can help recognize potential downstream societal impacts of research

590 output and the efforts of individual researchers (Fig. 1a). This preprint introduced a software

591 program called DeepPoseKit that uses deep learning to understand the dynamics of how

592 animals sync and swarm together [40]. Of he 24% of Gaing e al. audience that we

593 classified as non-academic, certain sectors appeared to be associated with video game

594 developers and graphic designers, perhaps indicating this research has immediate economic

595 and cultural applications through the visual arts. Much of the total engagement (~270 retweets)

596 surrounding this preprint stemmed from a single tweet posted by the first author of the study,

597 Jacob Graving, who provided a brief summary along with an animated GIF showing how the

598 program tracks the movements of a swarm of locusts

599 (https://twitter.com/jgraving/status/1122043261777076224). Neither the preprint itself nor

600 Gaing ee allded o an popec of economic, clal, enionmenal, o ocial

601 applications, and both focused solely on the ethological aspects of their software (though we

602 note that this study discloses that they received unrestricted funding from graphics card

603 manufacturer Nvidia, which may partially explain why this preprint was of interest to video game bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

604 developers). Our audience segmentation of this preprint demonstrates that researchers are fully

605 capable of capturing the attention of lay audiences simply by maintaining a presence on Twitter

606 and creatively communicating their work.

607

608 Uncovering patterns of political engagement

609 Policy implications are frequently discussed as an important example of broader societal

610 impacts that can result from scientific research. Twitter is widely used for political discourse, so

611 naturally our analyses identified many audience sectors marked with overtly left-wing or right-

612 wing political constituencies that were engaging with preprints. Intriguingly, the fields that

613 attracted politically-oriented audience sectors did not typically receive equal bipartisan attention

614 (as we might expect for research topics that have become battlegrounds of political

615 disagreement, such as climate change [52], or fields that typically transcend political

616 boundaries, such as translational biomedicine [53]). Instead, when politically oriented lay

617 audiences were present, they tended to polarize very strongly towards one end of the political

618 spectrum or the other.

619

620 Two categories of bioRxiv preprints stood out as attracting disproportionately left-leaning lay

621 audiences: ecology and scientific communication and education. Many of the ecology preprints

622 we analyzed dealt with aspects of climate change, a topic that receives far more positive

623 attention and support from left-leaning political platforms [53]. Similarly, the scientific

624 communication and education category includes several preprints that address issues of equity,

625 diversity, and inclusion in academic environments, which are also a prominent feature of left-

626 wing politics. Preprints in genetics, neuroscience, and animal behavior and cognition attracted a

627 disproportionately stronger presence from right-leaning lay audiences. These preprints often bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

628 involved research pertaining to human population history and the genetic and neurological

629 architecture and evolution of sociobehavioral traits, suggesting such research is seen by these

630 audience sectors as especially relevant to right-wing political ideologies. A cursory examination

631 of tweets referencing these preprints indicates these right-wing lay audiences generally view

632 this research through a positive lens. For example, a conservative political scientist with over

633 80,000 folloe eeed a efeence o Geneic Aociaion ih Mahemaic Tacking and

634 Peience in Seconda School b Haden et al. [54] and interpreted the conclusions of the

635 paper as follows:

636

637 Want an example of how PGS [polygenic scores] can inform policy issues? Voila. [sic]

638 (https://twitter.com/charlesmurray/status/1114536610266267649).

639

640 We strongly emphasize that the authors of this particular preprint (or any other, for that matter)

641 do not necessarily endorse the interpretations of audiences that find it interesting and relevant,

642 but this is a concrete example of how easily research can be appropriated in arguments for or

643 against specific policies.

644

645 We also note that several preprints in a variety of other research areas piqued the interest of

646 right-ing adience eco aociaed ih a popla conpiac heo knon a QAnon ha

647 originated on the 4chan and 8chan message boards [55]. Preprints that attracted these

648 conspiracy theorist audience sectors included a study of the relationship between cell phone

649 radiation and carcinogenesis in rats [56], eidence ha he 2016 onic aack on US

650 diplomats in Cuba may have actually been the calling song of the Indies short-tailed cricket [57],

651 an epidemiological analysis of how pandemics spread [58], and a clinical trial exploring the

652 effects of the ketosis diet on managing type 2 diabetes [59].

653 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

654 Our findings of politically oriented audience sectors (particularly in the field of genetics) may

655 trace a broader trend of emerging genetic technologies and concepts being exploited for

656 political advantage. In January, 2020, the administration of Republican president Donald Trump,

657 who campaigned on a vehemently anti-immigration platform, enacted legislation that allows

658 border securi official o collec DNA fom an peon in CBP [Com and Bode

659 Protection] custody who is subject to fingerprinting...including aliens as well as U.S. citizens and

660 Lawful Permanent Residents" [60]. In July, 2019, Israeli Prime Minister Benjamin Netanyahu

661 cited an ancient DNA study out of context to claim that ethnic Jewish populations had settled in

662 Palestine much earlier than the ancestors of modern-day Palestinians [61]. In 2018, Democratic

663 U.S. Senator Elizabeth Warren publicly released the results of her genetic ancestry test in

664 response to skepticism about her claims of having Native American ancestors [62].

665 Systematically studying how supporters and opponents of these politicians respond to such

666 actions may lend further insight into how politically motivated sectors of the broader public either

667 endorse or reject the politicization of genetics research.

668

669 The white elephant in the room

670 Upon fhe ineigaion of he adience eco e defined a igh-ing, man ch

671 sectors were also represented by keywords indicative of extreme ideologies, such as white

672 nationalism (in contrast, we found no indications of far-left ideologies [e.g., communism]

673 peading he adience eco e defined a lef-ing). The accon affiliaed ih hee

674 audience sectors typically exhibited unusually high levels of network homophily with prominent

675 white nationalists on Twitter, confirming that our audience segmentation was not simply the

676 result of data artifacts in our topic model. Our findings are consistent with qualitative

677 observations made by many journalists and scientists that members and affiliates of white bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

678 nationalist movements are voracious consumers of scientific research and have attempted to

679 reignite public interest in scientific racism to promote their ideology [35,36,63,64]. Within the last

680 two years, multiple scientific societies have released statements denouncing the

681 misappropriation of scientific research to justify racism and promote such ideologies, and have

682 encouraged their members to actively participate in debunking science denialism and

683 addressing the negative impacts of their research [34,65,66].

684

685 Although these news articles and society position statements shed light on the emerging trend

686 of white nationalist misappropriation of science, they do not offer much sense of the quantitative

687 scope of the issue. Our study provides conclusive quantitative evidence that white nationalists

688 and adjacent communities are engaging with the scientific literature on Twitter. Not only are

689 these communities a ubiquitous presence in the social media audience for certain research

690 topics, but they can dominate the discourse around a particular preprint and inflate altmetric

691 indicators. We must strongly emphasize that we do not claim any particular user found to be

692 associated with audience topics pertaining to white nationalism and/or exhibiting higher than

693 usual levels of network homophily with white nationalists is ideologically associated with such

694 movements, merely that a nontrivial fraction of their Twitter followers likely are.

695

696 Naturally, these results elicit questions about how scientists should respond. According to a

697 recent news report, many scientists who study socially and politically sensitive topics have

698 expressed that they are reluctant to confront politicized misappropriation/misinterpretation of

699 their work or avoid doing so because they feel incapable of successfully communicating the

700 complexities of their research to non-expert audiences [35]. However, recent research has

701 demonstrated that the reach of science denialism is actually amplified when subject matter

702 experts and advocates do not intervene, but the spread of science denialism was significantly bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

703 attenuated when experts and advocates responded with factual information or addressed the

704 faulty rhetorical techniques of denialists [67].

705

706 Even so, the tactics proven to be effective at stemming the spread of science denialism may not

707 translate well to the task of stopping the spread of science misappropriation. As noted by

708 Panofsky and Donovan [64], white nationalistsunlike right-wing deniers of the reality of climate

709 change and vaccine efficacy [52,68,69]are not necessarily filtering scientific information

710 through a denialist mindset or extreme misinterpretations, but rather by processing through

711 aci cogniion. Panofk and Donoan go on o conclde ha challenging aci pblic

712 understanding of science is not simply a matter of more education or nuance, but may require

713 scientists to rethink their research paradigms and reflexively interrogate their own knowledge

714 podcion [64]. We anticipate the results of our study will motivate researchers to engage with

715 these uncomfortable yet unavoidable challenges of scientific inquiry and communication.

716

717 Materials and Methods

718 Data collection

719 Using the Rxivist API [37], we collected metadata for 1,800 bioRxiv preprints, considering any

720 preprints ranked within the top 1,000 by total number of downloads or total number of tweets

721 that received 50 or more tweets from unique Twitter users. Note that this is not an exhaustive

722 collection of the most highly-tweeted preprints, as preprints posted prior to February, 2017 were

723 excluded from the rankings by tweet count (we expect that many of these older, highly-tweeted

724 preprints were captured in the ranking of the top 1,000 preprints by download count, but there

725 are likely some older preprints that received >50 tweets but few downloads). bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

726

727 For each preprint, we queried the Crossref Event Data API for all documented tweets and

728 retweets that referenced that paper/preprint. In instances where tweets referencing a paper

729 were not indexed in the Crossref event database, we used the rvest R package to scrape

730 eialen infomaion fom he pepin Almeic page. Specifically, we collected 1) the handle

731 of the user, 2) the timestamp at which they (re)tweeted the article, 3) the text of the (re)tweet,

732 and 4) if the event was a retweet, the user who originally posted the tweet. Tweets from private

733 users or users with fewer than 5 followers were excluded. We used the tweetscores R package

734 [27] to query the Twitter API for the follower metadata (account handles and biographies of up

735 to 10,000 followers per user) of each of the N unique users that (re)tweeted a given preprint.

736 Due to the Twitter developer agreement, we are unable to provide the raw data used in these

737 analyses; the R code used to scrape the data is provided at http://github.com/carjed/audiences.

738 Topic modeling

739 For each of the N users that (re)tweeted a reference to a given preprint, we concatenated the

740 biogaphie and ceen name of hei folloe ino a ingle docmen, epeening a li of

741 all he od conained in he Tie biogaphie/ceen name of ha e followers. We

742 cleaned each docmen o emoe pncaion and common opod (e.g., a, he, i)

743 from 13 languages using the tm R package [70]. We translated emoji that occurred in the

744 follower documents into a unique single word string according to the official emoji shortcode,

745 aken fom hp://emojipedia.og, and pepended ih he ing emoji o ceae an

746 alphanumeric word, e.g., the microscope emoji ( ) a anlaed o emojimicocope.

747 Similal, e anlaed hahag o eplace he # mbol ih hahag, e.g.,#micocope

748 a anlaed o hahagmicocope. Fo he W unique words observed across all the N bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

749 follower documents, we then generated an NxW document term matrix, enumerating the

750 frequencies of each word in each follower document.

751

752 We used the lda R package to apply a Latent Dirichlet Allocation (LDA) model to this document

753 term matrix, representing each of the N documents as a mixture of K discrete topics (where

754 K<

755 of relevant hyperparameters: �=1,...,,=1,..., indicating the probability of topic k occurring in the

756 follower document for user i, and �=1,...,,=1,..., indicating the probability of word w occurring

757 in topic k. Each topic is thus characterized by words that frequently co-occur, and each follower

758 document is summarized as a set of dosages corresponding to the probabilistic topic

759 assignments.

760 Estimation of academic audience fractions

761 For a given preprint, we estimated the fraction of the audience that were academics by first

762 flagging topics containing keywords we determined to correspond to academic

763 careers/environments (e.g., niei, phd, podoc, pofeo, fello). We hen

764 summed the theta parameters of these topics to arrive at our estimate. Formally, this is stated

765 as:

766 ��� ∑=1 ������� ∋ "����������", "�ℎ�", "�������", "���������", . . .

767 The fraction of the audience assumed to be non-academic is thus:

768 �−�� 1 ��� bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

769 Mapping inferred academic audience sectors to specific academic

770 disciplines

771 For each audience sector inferred to correspond to an academic audience, we calculated cosine

772 imilai coe beeen ha adience eco op 50 aociaed keod (�,) and the 50

773 most frequent words found in each of the Wikipedia articles for over 1,000 academic disciplines,

774 indexed at https://en.wikipedia.org/wiki/Outline_of_academic_disciplines. The title of the

775 Wikipedia article found to have the highest similarity score was assigned as the best-mapping

776 discipline of that academic audience sector.

777 Network homophily analysis

778 We curated a reference set of 20 organizations and individuals associated with far-right white

779 nationalist ideologies. For each of these accounts, we then scraped their list of followers. Then

780 fo each pepin analed, e calclaed he facion of each efeencing e folloe ha

781 were also following each of these 20 accounts, taking the median of these 20 scores as an

782 eimae of ha e network homophily with the reference panel. We summarized these

783 individual-level homophily scores on a per-preprint basis by calculating the fraction of individuals

784 with homophily score >h, varying h at four different stringency thresholds {2%, 5%, 10%, and

785 20%}.

786

787 Acknowledgments

788 We thank Emma Beyers-Carlson, Graham Coop, Doc Edge, Jeffrey Ross-Ibarra, and Josh

789 Schraiber for their helpful feedback on the manuscript and online resources. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

790 References

791 1. Van Noorden R. Metrics: A profusion of measures. . 2010;465: 864866.

792 doi:10.1038/465864a

793 2. Hutchins BI, Yuan X, Anderson JM, Santangelo GM. Relative Citation Ratio (RCR): A New

794 Metric That Uses Citation Rates to Measure Influence at the Article Level. PLoS Biol.

795 2016;14: e1002541. doi:10.1371/journal.pbio.1002541

796 3. Mingers J, Leydesdorff L. A review of theory and practice in scientometrics. Eur J Oper

797 Res. 2015;246: 119. doi:10.1016/j.ejor.2015.04.002

798 4. Bornmann L. What is societal impact of research and how can it be assessed? a literature

799 survey. J Am Soc Inf Sci Technol. 2013;64: 217233. doi:10.1002/asi.22803

800 5. Main BR. The Reeach Ecellence Fameok and he impac agenda: ae e ceaing

801 a Frankenstein monster? Res Eval. 2011;20: 247254.

802 doi:10.3152/095820211X13118583635693

803 6. GPG Chapter III. [cited 16 Dec 2019]. Available:

804 https://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/gpg_3.jsp

805 7. Robinson-Garcia N, van Leeuwen TN, Ràfols I. Using altmetrics for contextualised mapping

806 of societal impact: From hits to networks. Sci Public Policy. 2018;45: 815826.

807 doi:10.1093/scipol/scy024

808 8. How Scientists Engage the Public. In: Pew Research Center Science & Society [Internet].

809 15 Feb 2015 [cited 5 Nov 2019]. Available:

810 https://www.pewresearch.org/science/2015/02/15/how-scientists-engage-public/ bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

811 9. To tweet or not to tweet? In: Science | AAAS [Internet]. 20 Nov 2014 [cited 10 Dec 2019].

812 Available: https://www.sciencemag.org/careers/2014/10/tweet-or-not-tweet

813 10. Barnes C. The Use of Altmetrics as a Tool for Measuring Research Impact. Australian

814 Academic & Research Libraries. 2015;46: 121134. doi:10.1080/00048623.2014.1003174

815 11. Robinson-Garcia N, Costas R, Isett K, Melkers J, Hicks D. The unbearable emptiness of

816 tweeting-About journal articles. PLoS One. 2017;12: e0183551.

817 doi:10.1371/journal.pone.0183551

818 12. Mohammadi E, Thelwall M, Kwasny M, Holmes KL. Academic information on Twitter: A

819 user survey. PLoS One. 2018;13: e0197265. doi:10.1371/journal.pone.0197265

820 13. Eysenbach G. Can tweets predict citations? Metrics of social impact based on Twitter and

821 correlation with traditional metrics of scientific impact. J Med Internet Res. 2011;13: e123.

822 doi:10.2196/jmir.2012

823 14. Côté IM, Darling ES. Scientists on Twitter: Preaching to the choir or singing from the

824 rooftops? Heard SB, editor. FACETS. 2018;3: 682694. doi:10.1139/facets-2018-0002

825 15. Parsons ECM, Shiffman DS, Darling ES, Spillman N, Wright AJ. How Twitter literacy can

826 benefit conservation scientists. Conserv Biol. 2014;28: 299301. doi:10.1111/cobi.12226

827 16. Dinsmore A, Allen L, Dolby K. Alternative perspectives on impact: the potential of ALMs

828 and altmetrics to inform funders about research impact. PLoS Biol. 2014;12: e1002003.

829 doi:10.1371/journal.pbio.1002003

830 17. altmetrics: a manifesto altmetrics.org. [cited 4 Dec 2019]. Available:

831 http://altmetrics.org/manifesto/

832 18. Twitter in scholarly communication. In: Altmetric [Internet]. 12 Jun 2018 [cited 22 May bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

833 2019]. Available: https://www.altmetric.com/blog/twitter-in-scholarly-communication/

834 19. The Altmetric Top 100 2019. In: Altmetric [Internet]. [cited 3 Jan 2020]. Available:

835 https://www.altmetric.com/top100/2019/

836 20. Coa R, Zahedi Z, Woe P. Do almeic coelae ih ciaion? Eenie

837 comparison of altmetric indicators with citations from a multidisciplinary perspective: Do

838 Almeic Coelae Wih Ciaions? J Assn Inf Sci Tec. 2015;66: 20032019.

839 doi:10.1002/asi.23309

840 21. How are Twitter demographics determined? : Altmetric Support. [cited 18 Jun 2019].

841 Available: https://help.altmetric.com/support/solutions/articles/6000060978-how-are-twitter-

842 demographics-determined-

843 22. No e if cieni o j Tie bo O: Who ee abo cholal pape. In: Almeic

844 [Internet]. 12 Jul 2018 [cited 22 May 2019]. Available: https://www.altmetric.com/blog/not-

845 sure-if-scientist-or-just-twitter-bot-or-who-tweets-about-scholarly-papers/

846 23. McPherson M, Smith-Lovin L, Cook JM. Birds of a Feather: Homophily in Social Networks.

847 Annu Rev Sociol. 2001;27: 415444. doi:10.1146/annurev.soc.27.1.415

848 24. Kang J-H, Lerman K. Using Lists to Measure Homophily on Twitter. Workshops at the

849 Twenty-Sixth AAAI Conference on Artificial Intelligence. 2012. Available:

850 https://www.aaai.org/ocs/index.php/WS/AAAIW12/paper/viewPaper/5425/5628

851 25. Al Zamal F, Liu W, Ruths D. Homophily and Latent Attribute Inference: Inferring Latent

852 Attributes of Twitter Users from Neighbors. Sixth International AAAI Conference on

853 Weblogs and Social Media. 2012. Available:

854 https://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/viewPaper/4713 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

855 26. Halberstam Y, Knight B. Homophily, Group Size, and the Diffusion of Political Information in

856 Social Networks: Evidence from Twitter. National Bureau of Economic Research; 2014.

857 doi:10.3386/w20681

858 27. Barberá P. Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation

859 Using Twitter Data. Polit Anal. 2015;23: 7691. doi:10.1093/pan/mpu011

860 28. Caetano JA, Lima HS, Santos MF, Marques-Neto HT. Using sentiment analysis to define

861 ie poliical e clae and hei homophil ding he 2016 Ameican peidenial

862 election. Journal of Internet Services and Applications. 2018;9: 18. doi:10.1186/s13174-

863 018-0089-0

864 29. Vaccari C, Valeriani A, Barberá P, Jost JT, Nagler J, Tucker JA. Of Echo Chambers and

865 Contrarian Clubs: Exposure to Political Disagreement Among German and Italian Users of

866 Twitter. Social Media + Society. 2016;2: 2056305116664221.

867 doi:10.1177/2056305116664221

868 30. Díaz-Fae AA, Boman TD, Coa R. Toad a econd geneaion of ocial media

869 meic: Chaaceiing Tie commniie of aenion aond cience. PLoS One.

870 2019;14: e0216408. doi:10.1371/journal.pone.0216408

871 31. Pennacchiotti M, Popescu AM. A machine learning approach to twitter user classification.

872 AAAI Confeence on Weblog and Social . 2011. Aailable:

873 https://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewPaper/2886

874 32. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus

875 genotype data. Genetics. 2000;155: 945959. Available:

876 https://www.ncbi.nlm.nih.gov/pubmed/10835412

877 33. Fei-Fei L, Perona P. A Bayesian hierarchical model for learning natural scene categories. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

878 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition

879 (CVPR05). 2005. pp. 524531 vol. 2. doi:10.1109/CVPR.2005.16

880 34. ASHG Denounces Attempts to Link Genetics and Racial Supremacy. Am J Hum Genet.

881 2018;103: 636. doi:10.1016/j.ajhg.2018.10.011

882 35. Harmon A. Why White Supremacists Are Chugging Milk (and Why Geneticists Are

883 Alarmed). The New York Times. 17 Oct 2018. Available:

884 https://www.nytimes.com/2018/10/17/us/white-supremacists-science-dna.html. Accessed

885 31 Oct 2019.

886 36. ewanbirney. Race, genetics and pseudoscience: an explainer - Ean Blog:

887 Bioinfomaician a lage. In: Ean Blog: Bioinfomaician a lage [Inene]. 24 Oc 2019

888 [cited 31 Oct 2019]. Available: http://ewanbirney.com/2019/10/race-genetics-and-

889 pseudoscience-an-explainer.html

890 37. Abdill RJ, Blekhman R. Tracking the popularity and outcomes of all bioRxiv preprints. Elife.

891 2019;8. doi:10.7554/eLife.45133

892 38. Novembre J. Pritchard, Stephens, and Donnelly on Population Structure. Genetics.

893 2016;204: 391393. doi:10.1534/genetics.116.195164

894 39. Lawson DJ, van Dorp L, Falush D. A tutorial on how not to over-interpret STRUCTURE and

895 ADMIXTURE bar plots. Nat Commun. 2018;9: 3258. doi:10.1038/s41467-018-05257-7

896 40. Graving JM, Chae D, Naik H, Li L, Koger B, Costelloe BR, et al. DeepPoseKit, a software

897 toolkit for fast and robust animal pose estimation using deep learning. bioRxiv. 2019. p.

898 620245. doi:10.1101/620245

899 41. Bracha S, Hassi K, Ross PD, Cobb S, Sheiner L, Rechavi O. Engineering Brain Parasites bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

900 for Intracellular Delivery of Therapeutic Proteins. bioRxiv. 2018. p. 481192.

901 doi:10.1101/481192

902 42. OBien HE, Hannon E, Jeffie AR, Daie W, Hill MJ, Anne RJ, e al. Se diffeence in

903 gene expression in the human fetal brain. bioRxiv. 2019. p. 483636. doi:10.1101/483636

904 43. Dobs K, Isik L, Pantazis D, Kanwisher N. How face perception unfolds over time. bioRxiv.

905 2018. p. 442194. doi:10.1101/442194

906 44. Ke Q, Ahn Y-Y, Sugimoto CR. A systematic identification and analysis of scientists on

907 Twitter. PLoS One. 2017;12: e0175368. doi:10.1371/journal.pone.0175368

908 45. Rubinstein JL, Guo H, Ripstein ZA, Haydaroglu A, Au A, Yip CM, et al. Shake-it-off: A

909 simple ultrasonic cryo-EM specimen preparation device. bioRxiv. 2019. p. 632125.

910 doi:10.1101/632125

911 46. Cooper S, Khatib F, Treuille A, Barbero J, Lee J, Beenen M, et al. Predicting protein

912 structures with a multiplayer online game. Nature. 2010;466: 756760.

913 doi:10.1038/nature09304

914 47. Somme W. Wh a Red X I he Ne Smbol of Coneaie Tie. In: The Dail Bea

915 [Internet]. The Daily Beast; 10 Aug 2018 [cited 30 Aug 2019]. Available:

916 https://www.thedailybeast.com/why-a-red-x-is-the-new-symbol-of-conservative-twitter

917 48. 2. An analysis of #BlackLivesMatter and other Twitter hashtags related to political or social

918 issues. In: Pew Research Center: Internet, Science & Tech [Internet]. 11 Jul 2018 [cited 29

919 Oct 2019]. Available: https://www.pewresearch.org/internet/2018/07/11/an-analysis-of-

920 blacklivesmatter-and-other-twitter-hashtags-related-to-political-or-social-issues/

921 49. Crawford NG, Kelly DE, Hansen MEB, Beltrame MH, Fan S, Bowman SL, et al. Loci bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

922 associated with skin pigmentation identified in African populations. Science. 2017;358.

923 doi:10.1126/science.aan8433

924 50. Yong E. The Ancient Origins of Both Light and Dark Skin. The Atlantic. 12 Oct 2017.

925 Available: https://www.theatlantic.com/science/archive/2017/10/a-brief-history-of-the-genes-

926 that-color-our-skin/542694/. Accessed 5 Nov 2019.

927 51. Zimmer C. Genes for Skin Color Rebut Dated Notions of Race, Researchers Say. The New

928 York Times. 12 Oct 2017. Available: https://www.nytimes.com/2017/10/12/science/skin-

929 color-race.html. Accessed 27 Jan 2020.

930 52. Lockwood M. Right-wing populism and the climate change agenda: exploring the linkages.

931 Env Polit. 2018;27: 712732. doi:10.1080/09644016.2018.1458411

932 53. Politics and Science: What Americans Think. In: Pew Research Center Science & Society

933 [Internet]. 1 Jul 2015 [cited 16 Jan 2020]. Available:

934 https://www.pewresearch.org/science/2015/07/01/americans-politics-and-science-issues/

935 54. Paige Harden K, Domingue BW, Belsky DW, Boardman JD, Crosnoe R, Malanchini M, et

936 al. Genetic Associations with Mathematics Tracking and Persistence in Secondary School.

937 bioRxiv. 2019. p. 598532. doi:10.1101/598532

938 55. McQuade M, Zuckerman E, LeJeune L. QAnon and the Emergence of the Unreal. Journal

939 of Design and Science. 2019. doi:10.21428/7808da6b.6b8a82b9

940 56. Wyde M, Cesta M, Blystone C, Elmore S, Foster P, Hooth M, et al. Report of Partial

941 findings from the National Toxicology Program Carcinogenesis Studies of Cell Phone

942 Radiofrequency Radiation in Hsd: Sprague Dawley® SD rats (Whole Body Exposures).

943 bioRxiv. 2018. p. 055699. doi:10.1101/055699 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

944 57. Stubbs AL, Montealegre-Z F. Recoding of onic aack on U.S. diploma in Cba

945 spectrally matches the echoing call of a Caribbean cricket. bioRxiv. 2019. p. 510834.

946 doi:10.1101/510834

947 58. Thompson RN, Thompson CP, Pelerman O, Gupta S, Obolski U. Increased frequency of

948 travel in the presence of cross-immunity may act to decrease the chance of a global

949 pandemic. bioRxiv. 2019. p. 404871. doi:10.1101/404871

950 59. Athinarayanan SJ, Adams RN, Hallberg SJ, McKenzie AL, Bhanpuri NH, Campbell WW, et

951 al. Long-Term Effects of a Novel Continuous Remote Care Intervention Including Nutritional

952 Ketosis for the Management of Type 2 Diabetes: A 2-year Non-randomized Clinical Trial.

953 bioRxiv. 2018. p. 476275. doi:10.1101/476275

954 60. Dickerson C. U.S. Government Plans to Collect DNA From Detained Immigrants. The New

955 York Times. 2 Oct 2019. Available: https://www.nytimes.com/2019/10/02/us/dna-testing-

956 immigrants.html. Accessed 7 Feb 2020.

957 61. Gannon M. When Ancient DNA Gets Politicized. In: Smithsonian Magazine [Internet].

958 Smithsonian Magazine; 12 Jul 2019 [cited 7 Feb 2020]. Available:

959 https://www.smithsonianmag.com/history/when-ancient-dna-gets-politicized-180972639/

960 62. Herndon AW. Elizabeth Warren Stands by DNA Test. But Around Her, Worries Abound.

961 The New York Times. 6 Dec 2018. Available:

962 https://www.nytimes.com/2018/12/06/us/politics/elizabeth-warren-dna-test-2020.html.

963 Accessed 7 Feb 2020.

964 63. I a oic place. Ho he online old of hie naionali dio poplaion genetics.

965 In: Science | AAAS [Internet]. 22 May 2018 [cited 16 Jan 2020]. Available:

966 https://www.sciencemag.org/news/2018/05/it-s-toxic-place-how-online-world-white- bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

967 nationalists-distorts-population-genetics

968 64. Panofk A, Donoan J. When Geneic Challenge a Raci Ideni: Geneic Ance

969 Testing among White Nationalists. SocArXiv; 2017. doi:10.17605/OSF.IO/7F9BC

970 65. ESHG joins ASHG in Denouncing Attempts to Link Genetics and the Concept of Racial

971 Supremacy. 22 Oct 2018 [cited 16 Jan 2020]. Available:

972 https://www.eshg.org/index.php?id=910&tx_news_pi1%5Bnews%5D=14&tx_news_pi1%5B

973 controller%5D=News&tx_news_pi1%5Baction%5D=detail&cHash=b1de68f27a9b8810a28c

974 91279e645e7c

975 66. AAPA Statement on Race & Racism. [cited 16 Jan 2020]. Available:

976 https://physanth.org/about/position-statements/aapa-statement-race-and-racism-2019/

977 67. Schmid P, Betsch C. Effective strategies for rebutting science denialism in public

978 discussions. Nat Hum Behav. 2019;3: 931939. doi:10.1038/s41562-019-0632-4

979 68. Gropp RE. Engaging Decision-Makers: Opportunities and Priorities for Biological Science

980 OrganizationsA report from the 2017 Meeting of the AIBS Council of Member Societies and

981 Organizations. Bioscience. 2018;68: 643648. doi:10.1093/biosci/biy082

982 69. Boseley S, Davies C, Chrisafis A, Grytsenko O, Giuffrida A. Rightwing populists ride wave

983 of mistrust of vaccine science. . 21 Dec 2018. Available:

984 http://www.theguardian.com/world/2018/dec/21/rightwing-populists-ride-wave-of-mistrust-

985 of-vaccine-science. Accessed 16 Jan 2020.

986 70. Feinerer I, Hornik K. tm: Text mining package. R package version 0 5-7 1. 2012;1.

987 Available: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf

988