bioRxiv preprint doi: https://doi.org/10.1101/2020.03.06.981589. this version posted March 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

Quantifying and contextualizing the impact of bioRxiv preprints through automated social media audience segmentation

Jedidiah Carlson1*, Kelley Harris1,2

1Department of Genome Sciences, University of Washington, Seattle, WA
2Computational Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA

*Corresponding author:
[email protected]

Abstract

Engagement with scientific manuscripts is frequently facilitated by Twitter and other social media platforms. As such, the demographics of a paper's social media audience provide a wealth of information about how scholarly research is transmitted, consumed, and interpreted by online communities. By paying attention to public perceptions of their publications, scientists can learn whether their research is stimulating positive scholarly and public thought. They can also become aware of potentially negative patterns of interest from groups that misinterpret their work in harmful ways, either willfully or unintentionally, and devise strategies for altering their messaging to mitigate these impacts. In this study, we collected 331,696 Twitter posts referencing 1,800 highly tweeted bioRxiv preprints and leveraged topic modeling to infer the characteristics of various communities engaging with each preprint on Twitter.