On the Infrastructure Providers That Support Misinformation Websites
Catherine Han, Deepak Kumar, Zakir Durumeric
Stanford University
[email protected]  [email protected]  [email protected]

Abstract

In this paper, we analyze the service providers that power 440 misinformation sites, including hosting platforms, domain registrars, DDoS protection companies, advertising networks, donation processors, and e-mail providers. We find that several providers are disproportionately responsible for hosting misinformation websites compared to mainstream websites. Most prominently, Cloudflare offers DDoS protection to 34.3% of misinformation sites while servicing only 17.9% of mainstream websites in our corpus. While many mainstream providers continue to service misinformation sites, we show that when misinformation and other abusive websites are removed by hosting providers, DDoS protection services, and registrars, these sites nearly always resurface after finding alternative providers. More encouragingly, we show that misinformation sites also disproportionately rely on popular ad networks and donation processors, but that anecdotally, sites struggle to remain online when mainstream monetization channels are severed. We conclude with insights for infrastructure providers and researchers to consider for stopping the spread of online misinformation.

Introduction

Technical infrastructure providers like Amazon, Cloudflare, and Google both support and regulate websites that spread misinformation. To counter the spread of misinformation, influential platforms have extended their service agreements to prohibit misinformation and, in extreme cases, terminated service to violating sites (Cox 2021; Wong 2019; Infostormer 2019). For example, in 2017, the Daily Stormer lost its DDoS protection from Cloudflare and was subsequently cut off from registrar providers GoDaddy and Google, resulting in a website hiatus (Burch 2017). Similarly, in 2021, the "far-right alternative to Twitter," Parler, was knocked offline for a month after Apple and Google removed Parler from their app stores and Amazon cut off Parler's hosting services. Yet, despite their increasing role in policing content, little attention has been paid to identifying the infrastructure providers that support misinformation websites more broadly.

In this paper, we investigate the technical infrastructure that powers misinformation websites, including domain registrars, web hosting and email providers, online advertising partners, and Distributed Denial of Service (DDoS) protection providers. We specifically seek to: (1) identify the service providers that misinformation sites disproportionately rely on and (2) analyze whether the act of deplatforming misinformation websites affects their long-term availability. To answer these questions, we crawl and analyze the network dependencies of 440 misinformation websites from the OpenSources dataset (OpenSources 2017), and we compare these misinformation sites to a baseline of mainstream sites. We crawl each website and collect its DOM, cookies, and network requests, which we then augment with hosting and registrar data. To understand how misinformation websites monetize, we map third-party web dependencies to known advertising providers and payment processors.
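To make this crawling-and-mapping step concrete, the sketch below illustrates one way such a measurement could be implemented. It is not the authors' actual pipeline: Playwright is an assumed tool, and the provider list, the crawl_site helper, and the example URL are hypothetical.

# A rough sketch (not the authors' pipeline) of collecting a site's DOM, cookies,
# and network requests with a headless browser, then mapping third-party request
# hosts to a small, hypothetical list of advertising and payment providers.
import socket
from urllib.parse import urlparse

from playwright.sync_api import sync_playwright

# Hypothetical provider-to-category map; a real study would use curated lists.
KNOWN_PROVIDERS = {
    "doubleclick.net": "advertising",
    "revcontent.com": "advertising",
    "outbrain.com": "advertising",
    "paypal.com": "donations",
    "patreon.com": "donations",
}

def crawl_site(url: str) -> dict:
    """Visit one site and summarize its dependencies and monetization partners."""
    request_urls = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("request", lambda req: request_urls.append(req.url))  # log every request
        page.goto(url, wait_until="networkidle")
        dom = page.content()              # rendered DOM
        cookies = page.context.cookies()  # cookies set during the visit
        browser.close()

    site_host = urlparse(url).hostname
    third_party = {h for h in (urlparse(u).hostname for u in request_urls)
                   if h and h != site_host}

    # Map third-party hosts to known advertising / payment providers.
    monetization = {host: category
                    for provider, category in KNOWN_PROVIDERS.items()
                    for host in third_party
                    if host == provider or host.endswith("." + provider)}

    # Hosting and registrar data (e.g., IP-to-ASN and WHOIS lookups) would be
    # joined onto this record in a separate step.
    return {
        "url": url,
        "ip": socket.gethostbyname(site_host),
        "dom_bytes": len(dom),
        "num_cookies": len(cookies),
        "third_party_hosts": sorted(third_party),
        "monetization": monetization,
    }

print(crawl_site("https://example.com"))  # hypothetical target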
We show that misinformation sites disproportionately rely on several hosting providers, most prominently Cloudflare, which serves content for 34.3% of misinformation sites compared to 17.9% of mainstream sites. In manually investigating each misinformation website, we find that sites prefer Cloudflare because of its lax acceptable use policies and its free DDoS protection services that help protect against vigilante attacks. Misinformation websites also disproportionately rely on other mainstream providers, including GoDaddy, Unified Layer, and Automattic, because of their WordPress offerings that allow users to quickly set up and scale sites without much technical expertise. By manually analyzing past deplatforming events, we find that when major sites are deplatformed by mainstream providers, they nearly always find new homes on alternative providers who actively ignore site content, similar to how bullet-proof hosting providers are utilized by Internet attackers.

Next, we investigate monetization platforms like online advertisement providers and payment processors that enable revenue collection for misinformation sites. Most misinformation sites rely on mainstream ad networks like Google's DoubleClick, but also disproportionately depend on otherwise niche providers including RevContent and Outbrain. Misinformation sites also disproportionately rely on donations through PayPal and Patreon, as well as direct cryptocurrency donations. Despite little evidence showing that deplatforming by hosting providers is effective at keeping such sites offline, we note that, anecdotally, websites cease to produce more misinformation content after they are deplatformed from both ad providers and payment processors. In other cases, sites have lamented the decrease in site revenue after being deplatformed from mainstream ad providers, instead soliciting users for direct donations.

We conclude with a discussion of different strategies for preventing the spread of misinformation based on our results. We argue that while deplatforming sites from hosting infrastructure is likely not an effective solution for curbing the spread of misinformation, targeting site monetization may be a promising approach for combating misinformation. We hope that by shedding light on what has anecdotally been most effective, we encourage providers—particularly those who have announced that they are committed to fighting online abuse and misinformation—to further explore monetization as a critical channel for curbing the spread of misinformation at scale.

Related Work

Our work is inspired by research that highlights the growing complexity of the web. Prior work has studied how websites have grown in complexity (Butkiewicz, Madhyastha, and Sekar 2011; Nikiforakis et al. 2012; Englehardt and Narayanan 2016; Kumar et al. 2017) and increasingly rely on centralized network entities and third-party content (Kumar et al. 2017). Beyond this, several studies have leveraged the nuances of technical infrastructure to better understand and combat traditional computer abuse, including spam and scams (Hao et al. 2009; 2016) and phishing (Ho et al. 2017; 2019).

In the context of misinformation, much work has focused on the classification of websites, primarily through content analysis (Rashkin et al. 2017; Kumar, West, and Leskovec 2016) or social graph features (Nguyen et al. 2020; Jin et al. 2014; Popat et al. 2017; Shu, Wang, and Liu 2019). Studies have leveraged infrastructure properties (e.g., HTTPS configuration or domain expiration) to classify misinformation sites (Hounsel et al. 2020), but these studies do not consider web resources broadly as features.

Most closely aligned with our work are several recent studies of the web infrastructure components of misinformation sites. Zeng et al. investigated the ads and ad platforms that

Misinformation Websites. We analyze two sets of misinformation websites. In this context, we deem misinformation to be non-satirical websites that have potentially misleading content (e.g., "fake news"), as determined by the OpenSources project (OpenSources 2017). OpenSources publishes lists of known, vetted, and labeled misleading websites by analyzing sites across several axes: (1) domain name, (2) "About Us" page, (3) article source, (4) writing style, (5) page and image aesthetic, and (6) social media network; the set has been used extensively in prior research (Zeng, Kohno, and Roesner 2020; Hounsel et al. 2020; Budak 2019; Sharma et al. 2019). While the OpenSources master list contains 826 websites, it was published in 2017, and many of these sites are unavailable today. Thus, we removed 191 unreachable sites and 123 parked domains (e.g., those that pointed back to a domain registrar). As our objective is to exclude satirical sites, two independent researchers then manually coded the remaining websites and identified 72 satirical websites based on the "About" pages of each website in question. Our final misinformation corpus contains 440 websites. The misinformation corpus spans several flavors of unreliability, according to the labels provided by OpenSources. For example, some sites consistently present extreme bias (e.g., breitbart.com), peddle conspiracy theories or bigoted propaganda (e.g., infowars.com, barenakedislam.com), or promote junk science (e.g., naturalnews.com).

Additionally, we analyzed 128 QAnon websites. Separate two-sample proportion tests comparing the site resources and hosting providers of our QAnon corpus to our misinformation corpus indicate that the two sets of sites were drawn from different distributions. As such, we analyze the two data sets separately. Any results describe the OpenSources websites unless noted otherwise.
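As a hedged sketch of the two-sample proportion tests described above, the snippet below compares a single hypothetical provider's share of the 128-site QAnon corpus against its share of the 440-site misinformation corpus. The counts are invented for illustration, and statsmodels is an assumed dependency rather than the authors' stated tooling.

# Two-sample proportion z-test with made-up counts: does a given provider serve
# a significantly different fraction of the QAnon corpus than of the
# misinformation corpus?
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

qanon_sites_on_provider = 60     # hypothetical count out of 128 QAnon sites
misinfo_sites_on_provider = 151  # hypothetical count out of 440 misinformation sites

counts = np.array([qanon_sites_on_provider, misinfo_sites_on_provider])
totals = np.array([128, 440])

z_stat, p_value = proportions_ztest(counts, totals)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g., below 0.05) suggests the two corpora rely on this
# provider at different rates; repeating such tests across site resources and
# hosting providers is what motivates analyzing the two datasets separately.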
Mainstream Website Sample. We analyze 10K random sites from the Alexa Top Million (Alexa Internet, Inc. 2020) as a control set. We analyze a random sample of websites rather than mainstream news sites because mainstream news sites differ significantly in structure compared to sites similar in scale to those in our misinformation corpus. For instance, mainstream news sites are often extremely well-