July 2019

Everything in Moderation

An Analysis of How Platforms Are Using Artificial Intelligence to Moderate User-Generated Content

Spandana Singh

Last edited on July 15, 2019 at 10:21 a.m. EDT

Acknowledgments

In addition to the many stakeholders across civil society and industry that have taken the time to talk to us over the years about our work on content moderation and transparency reporting, we would particularly like to thank Nathalie Maréchal from Ranking Digital Rights for her help in drafting this report. We would also like to thank Craig Newmark Philanthropies for its generous support of our work in this area.

About the Author(s)

Spandana Singh is a policy program associate in New America's Open Technology Institute.

About New America

We are dedicated to renewing America by continuing the quest to realize our nation’s highest ideals, honestly confronting the challenges caused by rapid technological and social change, and seizing the opportunities those changes create.

About Open Technology Institute

OTI works at the intersection of technology and policy to ensure that every community has equitable access to digital technology and its benefits. We promote universal access to communications technologies that are both open and secure, using a multidisciplinary approach that brings together advocates, researchers, organizers, and innovators.

Contents

Introduction

Legal Frameworks that Govern Online Expression

How Automated Tools are Used in the Content Moderation Process

The Limitations of Automated Tools in Content Moderation

Case Study: Facebook

Case Study: Reddit

Case Study: Tumblr

Promoting Fairness, Accountability, and Transparency Around Automated Content Moderation Practices

Introduction

The proliferation of digital platforms that host and enable users to create and share user-generated content has significantly altered how we communicate with one another. In the twentieth century, individual communication designed to reach a broad audience was largely expressed through formal media channels, such as newspapers. Content was produced and curated by professional journalists and editors, and dissemination relied on transporting physical artifacts like books or newsprint. As a result, communication during this period was expensive, slow, and, with some notable exceptions, easily attributed to an individual speaker. In the twenty-first century, however, thanks to the expansion of the internet and social media platforms, mass communication has become cheaper, faster, and sometimes difficult to trace.1

The widespread adoption and penetration of platforms such as YouTube, Facebook, and Twitter around the globe has significantly lowered the costs and barriers to communicating, thus democratizing speech online. Over the past decade, platforms have thrived off of users creating and exchanging their own content—whether it be family photographs, posts, or pieces of artwork—with speed and scale. However, in enabling user content production and dissemination, platforms also opened themselves up to unwanted forms of content, including hate speech, terror propaganda, harassment, and graphic violence. In this way, user-generated content has served as a key driver of growth for these platforms, as well as one of their greatest liabilities.2

In response to the growing prevalence of objectionable content on their platforms, technology companies have had to create and implement content policies and content moderation processes that aim to remove these forms of content, as well as the accounts responsible for sharing it, from their products and services. This is both because companies need to comply with legal frameworks that prohibit certain forms of content online, and because companies want to promote greater safety and positive user experiences on their services. In the United States, moreover, the First Amendment limits the extent to which the government can set the rules for what types of speech are permissible, leaving much of this rule-setting to the companies themselves. Over the last few years, both large and small platforms that host user-generated content have come under increased pressure from governments and the public to remove objectionable content. In response, many companies have developed or adopted automated tools to enhance their content moderation practices, many of which are fueled by artificial intelligence and machine learning. In addition to enabling the moderation of various types of content at scale, these automated tools aim to reduce reliance on time-consuming human moderation.


However, the development and deployment of these automated tools has demonstrated a range of concerning weaknesses, including dataset and creator bias, inaccuracy, an inability to interpret context and understand the nuances of human speech, and a significant lack of transparency and accountability mechanisms around how these algorithmic decision-making procedures impact user expression. As a result, automated tools have the potential to impact human rights on a global scale, and effective safeguards are needed to ensure the protection of human rights.

This report is the first in a series of four reports that will explore how automated tools are being used by major technology companies to shape the content we see and engage with online, and how internet platforms, policymakers, and researchers can promote greater fairness, accountability, and transparency around these algorithmic decision-making practices. This report focuses on automated content moderation policies and practices, and it uses case studies on three platforms—Facebook, Reddit, and Tumblr—to highlight the different ways automated tools can be deployed by technology companies to moderate content and the challenges associated with each of them.

Defining Content Moderation

Content moderation can be defined as the “governance mechanisms that structure participation in a community to facilitate cooperation and prevent abuse.”3 Currently, companies employ a range of approaches to content moderation, and they use a varied set of tools to enforce content policies and remove objectionable content and accounts. There are three primary approaches to content moderation:4

1. Manual content moderation: This approach, which typically relies on the hiring, training, and deployment of human moderators to review and make decisions on content cases, can take many forms. Large platforms tend to rely primarily on outsourced contract employees to complete this work. Small- to medium-size platforms tend to employ full-time, in-house moderators or rely on user moderators who volunteer to review content.

2. Automated content moderation: This approach involves the use of automated detection, filtering, and moderation tools to flag, separate, and remove particular pieces of content or accounts. Fully automated content detection and moderation practices are not widely used across all categories of objectionable content, as they have been found to lack accuracy and effectiveness for certain types of user speech. However, these tools are widely used for some types of objectionable content, such as child sexual abuse material (CSAM). In the case of CSAM, there is a clear international consensus that the content is illegal, there are clear parameters for what should be flagged and removed based on the law, and models have been trained on enough data to yield high levels of accuracy.

3. Hybrid content moderation: This approach incorporates elements of both the manual and automated approaches. Typically, this involves using automated tools to flag and prioritize specific content cases for human reviewers, who then make the final judgment call on the case. This approach is being more widely adopted by both smaller and larger platforms, as it helps reduce the initial workload of human reviewers. Additionally, by letting a human make the final decision on a case, it comparatively limits the negative externalities that come from using automated tools for content moderation (e.g., accidental removal of content due to inaccurate tools or tools that cannot understand the nuances or context of human speech). A minimal sketch of this flag-and-prioritize routing follows below.
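The Python sketch below illustrates the triage step described in the hybrid approach: an automated score determines whether a post is queued for human review and in what order, while the final decision stays with a person. The classifier score, threshold, and queue structure are hypothetical and not drawn from any particular platform.

```python
from dataclasses import dataclass, field
from queue import PriorityQueue

@dataclass(order=True)
class ReviewItem:
    priority: float                      # lower value = reviewed sooner
    post_id: str = field(compare=False)

def triage(post_id: str, violation_score: float,
           review_queue: "PriorityQueue[ReviewItem]",
           flag_threshold: float = 0.5) -> str:
    """Hybrid moderation sketch: the automated tool only flags and
    prioritizes; a human moderator makes the final call on anything flagged."""
    if violation_score >= flag_threshold:
        # Negate the score so that higher-risk posts are popped first.
        review_queue.put(ReviewItem(priority=-violation_score, post_id=post_id))
        return "queued_for_human_review"
    return "no_action"

queue: "PriorityQueue[ReviewItem]" = PriorityQueue()
triage("post_123", 0.92, queue)  # high score: reviewed first by a human
triage("post_456", 0.12, queue)  # low score: no action taken
```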

In addition, there are two different models of content moderation that are deployed by platforms, often depending on their size and capacity to engage in substantial content moderation practices.5

1. Centralized content moderation: This approach often involves a company establishing a broad set of content policies that it applies globally, with exceptions carved out to ensure compliance with laws in different jurisdictions. These content policies are enforced by a large group of moderators who are trained, managed, and directed in a centralized manner. The most common examples of companies that utilize this model are large internet platforms like Facebook and YouTube.

2. Decentralized content moderation: This approach often tasks users themselves with enforcing content policies. This can take different forms. In most cases, users are given an overarching set of global policies by a platform, which serve as a guiding framework. These companies also typically employ a small number of full-time content moderation staff to oversee general enforcement. The majority of the moderation, however, occurs in a decentralized manner. For example, on Reddit, user moderators are responsible for removing and regulating content in the same way that a moderator in a centralized model does. In addition, moderators on Reddit also have the power to create additional content guidelines for their particular domains.

Both models offer benefits to platforms. For example, centralized models help platforms promote consistency in how they enforce their content policies, and they provide a clear starting point for creating and enforcing new policies. Decentralized models, on the other hand, enable more localized, culture-specific, and context-specific moderation to take place, fostering a diversity of viewpoints on a platform. Centralized models also create a robust checkpoint for evaluating content. However, should this checkpoint be evaded, these platforms have few methods of subsequently removing the content that slipped through. In comparison, decentralized platforms offer multiple levels of content evaluation and review.6

Finally, content moderation can take place at three different stages of the content lifecycle. These stages often involve competing pressures between promoting safety and security on platforms and safeguarding free expression.7

1. Ex-Ante Content Moderation: Typically, when a user attempts to upload a photograph or video to a platform, it is screened before it is published. This moderation is mostly carried out through algorithmic screening and does not involve active human decision-makers. This form of content moderation is most commonly used to screen for CSAM or copyright-infringing material using tools such as PhotoDNA and ContentID. In these cases, there is typically no competing pressure between promoting safety and security and safeguarding free expression, as these clearly illegal forms of content carry no recognized free expression protections.

2. Ex-Post Proactive Content Moderation: As platforms have come under increased pressure to identify and remove objectionable forms of content such as terror propaganda, they have begun using automated tools to proactively search for and remove content and accounts in these domains.

3. Ex-Post Reactive Content Moderation: This form of content moderation takes place after a post has been published on a platform and subsequently flagged or reported for review by a user or an entity such as an Internet Referral Unit or Trusted Flagger.8 On most platforms, content that has been flagged is typically processed and triaged by an automated system that then relays relevant content to human moderators for review.

Legal Frameworks that Govern Online Expression

In order to effectively assess how content moderation practices and automated tools are shaping online speech, it is important to also understand the legal frameworks—both domestic and international—that underpin contemporary notions of freedom of expression online. Internet platforms such as Facebook, YouTube, and Twitter are popular not only in the United States, but across the globe. According to January 2019 statistics, 85 percent of Facebook’s daily active users are based outside the United States and Canada,9 and 80 percent of YouTube users10 and 79 percent of Twitter accounts11 are based outside the United States, with many of them residing in emerging markets such as India, Brazil, and Indonesia. However, despite the fact that the majority of these platforms’ users reside outside the United States, these companies are headquartered in the United States and are therefore primarily bound by U.S. laws. Under U.S. law, there are two principal legal frameworks that shape how we view freedom of expression online: the First Amendment to the U.S. Constitution and Section 230 of the Communications Decency Act.

In the United States, the First Amendment establishes the right to free speech for individuals and prevents the government from infringing on this right. Internet platforms, however, are not similarly bound by the First Amendment. As a result, they are able to establish their own content policies and codes of conduct that often restrict speech that could not be prohibited by the government under the First Amendment. For example, Facebook, and most recently Tumblr, prohibit the dissemination of adult content and graphic nudity on their platforms. Under the First Amendment, however, such speech prohibitions by the government would be unconstitutional.

Section 230 of the Communications Decency Act is a statute that establishes intermediary liability protections related to user content in the United States.12 Under Section 230, web hosts, social media networks, website operators, and other intermediaries are, for the most part, shielded from being held liable for the content of their users’ speech. In addition, companies are able to moderate content on their platforms without being held liable. Such protections have enabled user-generated content-based platforms to grow and thrive without fear of being held liable for the content of their users’ posts. However, in 2018, an amended version of the Allow States and Victims to Fight Online Sex Trafficking Act of 2017 (also known as FOSTA) was passed into law. FOSTA amended Section 230 of the Communications Decency Act so that online platforms could be held liable for unlawfully promoting and facilitating “prostitution and that facilitate traffickers in advertising the sale of unlawful sex acts with sex trafficking victims.”13 Although intended to address real harms, the law was not well-crafted to address the harms of sex trafficking, and instead it has undermined one of the foundational frameworks that created the internet as we know it. It opened up new discussions on whether further exemptions to intermediary liability protections should be proposed. In addition, FOSTA was criticized for silencing user discussions on controversial topics such as sex work, as well as for making the lives of sex workers more dangerous, as they were forced off of online platforms and back onto the streets to solicit clients.14

Most recently, conservative politicians in the United States have begun claiming that major internet platforms are demonstrating political bias against conservatives in their content moderation practices. As a result, in June 2019 Senator Josh Hawley (R-Mo.) introduced the “Ending Support for Internet Censorship Act,” which aims to amend Section 230 so that larger internet platforms may only receive liability protections if they are able to demonstrate to the Federal Trade Commission that they are “politically neutral” platforms. The Act raises First Amendment concerns, as it empowers the government to regulate what platforms can and cannot remove from their websites and requires platforms to meet a broad, undefined definition of “politically neutral.”16

On an international level, there are two primary documents that provide protections for freedom of expression. The first is Article 19 of the Universal Declaration of Human Rights (UDHR), and the second is Article 19 of the International Covenant on Civil and Political Rights (ICCPR). Both of these documents recognize that free speech and free expression are fundamental human rights, and both prohibit efforts to unjustly clamp down on them. However, freedom of expression is not an absolute right under human rights law and can be subject to necessary and proportionate limitations.17

Up until now, internet platforms in the U.S. have engaged in voluntary content moderation and self-regulation. However, a wave of terror attacks facilitated through online platforms and foreign interference in the 2016 U.S. presidential election have sparked concerns about the use of these platforms to spread terror propaganda and political disinformation.18 As a result, platforms have come under increased pressure to identify and moderate these forms of objectionable content.

This pressure has manifested in legislation around the world. In 2017, Germany introduced the Netzwerkdurchsetzungsgesetz—also known as the Network Enforcement Act or the NetzDG—which requires platforms to delete hate speech, terror propaganda, and other designated forms of illegal and objectionable content within 24 hours of it being flagged to the platform—or risk substantial fines.19

In addition, in April 2019, the European Parliament approved a proposal for similar regulation that would require internet platforms to remove terrorism-related content that had been flagged to them within an hour or face fines amounting to billions of dollars.20 There has been a string of similar legislative proposals and laws emerging in countries around the world, including India, Singapore, and Kenya. These laws aim at tackling particular categories of objectionable content, such as hate speech or fake news, and attempt to impose criminal penalties on individuals or platforms for posting and sharing such content. Most recently, in April 2019, the government of the United Kingdom released a white paper focused on combating online harms, which proposes multiple requirements for internet companies to ensure they keep their platforms safe and can be held responsible for the content on their platforms, as well as the decisions of the company. The white paper proposes a framework to be enforced by a new regulatory body, under which companies and executives who breach the proposed “statutory duty of care” could face hefty fines.21

Many of these forms of regulation place undue pressure on companies to remove content quickly or face liability, thereby creating strong incentives for them to err on the side of broad censorship. Mandating that companies remove content along arbitrary timeframes is particularly concerning because it exacerbates this pressure. In order to comply, companies have invested more in automated tools to flag and remove such objectionable content. However, the mandatory timelines set forth by many of these regulations establish a content moderation environment that prioritizes speed over accuracy. In response, many companies are rapidly developing and implementing automated tools that take down a wide range of content quickly, often with little transparency to the public. This has resulted in overbroad content takedowns and increased threats to user expression online.


For example, shortly after the NetzDG came into effect in Germany, two senior members of the far-right Alternative for Germany (AfD) party who had tweeted anti-Muslim and anti-immigrant content had their tweets flagged and removed for containing hate speech. However, a series of tweets from the satirical magazine Titanic, which caricatured the initial tweets and which were not hate speech in themselves, were also removed,22 demonstrating how such regulation pushes companies to engage in overbroad takedowns of content in order to avoid fines. This case, which was one of the first to occur after the NetzDG was introduced, also demonstrated how automated tools lack a nuanced and contextualized understanding of human speech, as they were unable to distinguish between hate speech and satire.

How Automated Tools are Used in the Content Moderation Process

Automated tools are used to curate, organize, filter, and classify the information we see online. They are therefore pivotal in shaping not only the content we engage with, but also the experience each individual user has on a given platform. There are a host of automated tools, many fueled by artificial intelligence and machine learning, that can be deployed during the content moderation process. These tools can be deployed across a range of categories of content and media formats, as well as at different stages of the content lifecycle, to identify, sort, and remove content. This section aims to provide an overview of some of the most widely used automated tools and methods in this field, as well as their strengths and limitations.

Digital Hash Technology:

Digital hash technology works by converting images and videos from an existing database into a grayscale format. It then overlays them onto a grid and assigns each square a numerical value. Together, these numerical values form a hash, or digital signature, which remains tied to the image or video and can be used to identify other iterations of the content during either ex-ante or ex-post proactive moderation.23
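To make the grayscale-grid-and-values description concrete, the following Python sketch computes a simple “average hash.” This is an illustrative toy, not PhotoDNA’s actual (proprietary) algorithm, and it assumes the Pillow imaging library is available.

```python
from PIL import Image  # assumes the Pillow imaging library is installed

def average_hash(image_path: str, grid: int = 8) -> int:
    """Toy perceptual hash: grayscale the image, shrink it onto a small grid,
    and set one bit per cell depending on whether that cell is brighter than
    the image's mean brightness. The result is a 64-bit signature for an
    8x8 grid that stays similar under minor edits."""
    img = Image.open(image_path).convert("L").resize((grid, grid))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits
```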

Digital hash technology has thus far been widely adopted by internet platforms to identify CSAM and copyright-infringing material. The CSAM detection technology known as PhotoDNA was originally developed by Microsoft and has expanded to become a powerful tool used by companies such as Twitter, Google, and Facebook, by law enforcement, and by organizations such as the National Center for Missing & Exploited Children. PhotoDNA generates digital hashes from a database of thousands of known illegal CSAM images and can compare uploaded content against these hashes in microseconds. In response to growing concerns around copyright infringement by its users, YouTube adapted PhotoDNA technology to create its ContentID technology. ContentID enables YouTube users to create digital hashes for their video content to help protect against copyright violations. Once these hashes have been created, all content that is subsequently uploaded to the YouTube platform is screened against its database of audio and video files in order to identify potential copyright violations.24 Both of these tools are particularly resilient against manipulation, including resizing, color alterations, and watermarking.25
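As a sketch of how screening against a hash database might work, the snippet below compares a candidate hash (for example, one produced by the toy average-hash function above) to a set of known hashes, tolerating a few differing bits so that minor edits such as resizing or recoloring still match. The threshold is an illustrative guess, not a value used by PhotoDNA or ContentID.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of bits on which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")

def matches_known_content(candidate_hash: int, hash_database: set,
                          max_distance: int = 5) -> bool:
    """Screen an upload's hash against a database of known hashes.
    Tolerating a few differing bits is what makes matching resilient to
    minor manipulations such as resizing or recoloring."""
    return any(hamming_distance(candidate_hash, known) <= max_distance
               for known in hash_database)
```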

The databases of signatures that these algorithms are trained on are continuously updated. There are over 720,000 known instances of CSAM,26 and once new images or videos are identified, they are added to the database. Similarly, when copyright holders flag infringing content to YouTube, this content is added to the ContentID database so that future screenings of content will incorporate these materials.27 The expansion of these databases, as well as the continuous evaluation of and updates to these software programs, aims to improve the effectiveness of these tools. This is one particular area where machine learning is being deployed to use past learnings to inform future predictions and behaviors.28 Further, although it is now much harder to circumvent image hashes, it is still possible to circumvent audio and video hashing by, for example, altering the length or encoding format of the file, as this would require a new hash of the file to be generated.29

Most recently, PhotoDNA technology has been adapted in order to detect and remove extremist content and terror propaganda-related images, video, and audio online. Similar to PhotoDNA, this tool, known as eGLYPH, is capable of detecting and removing content on a platform that has a corresponding hash, and is also able to prevent the upload of such content ex-ante.30 However, the application of this automated technology to content moderation decision-making around extremist content has raised a significant number of concerns, as the definition of what constitutes extremist content, and what should therefore be included in hash databases, is vague and largely platform-dependent. In addition, most platforms focus their content moderation efforts on certain extremist groups, such as the Islamic State and al-Qaeda. As a result, these automated tools demonstrate a bias in terms of which groups they are trained to focus on, and they are less reliable when addressing the larger corpus of extremist groups and movements that may use online services. Furthermore, moderating extremist content often requires a nuanced understanding of varied regions and cultures and an appreciation for the context in which an image is posted, something automated tools do not have. For example, while platforms will want to take down terrorist propaganda that glorifies acts of gruesome violence, it is important to permit journalists and human rights organizations to raise awareness about terrorist atrocities. As a result, automated moderation of this content has resulted in overbroad takedowns and infringements on user expression.

There is also a significant lack of transparency and accountability around how digital hash technology is being deployed to identify and moderate extremist content. For example, in June 2017, Facebook, Microsoft, Twitter, and YouTube formed the Global Internet Forum to Counter Terrorism (GIFCT) in order to curb the spread of extremist content online. One of the main efforts of the GIFCT was the creation of a shared industry hash database that contains over 40,000 image and video hashes that can aid company efforts to moderate extremist content. However, despite the fact that the database has been used by companies for over two years, there has been little transparency around which specific groups this database focuses on, how content added to the database is vetted and verified as extremist content, and how much content and how many accounts have been correctly and erroneously removed across participating platforms as a result of the database.31 As a result, it is difficult to assess the effectiveness and accuracy of such tools.

Image Recognition:

Digital hash technologies utilize image recognition. However, image recognition can also be used more broadly during the content moderation process. For example, during ex-post proactive moderation, image recognition tools can identify specific objects within an image, such as a weapon, and decide, based on factors including user experience and risk, whether the image should be flagged to a human for review. Automated image recognition tools are currently employed by several internet platforms, as they help filter through and prioritize cases for human moderators, thus saving time.32 Although the algorithms that power image recognition tools are continuously reinforced when information regarding the ultimate decision a human moderator made is fed back into them, this feedback loop does not provide detailed information on why the moderator made that decision. As a result, these algorithms are unable to develop into more dynamic tools that could incorporate nuanced and contextual insights into their detection procedures, such as whether content with a weapon in it is actually violent or—for example—satirical in nature.33 In addition, the accuracy of these models depends on the quality of the datasets they are trained on. If these models are trained on datasets that focus on specific types of weapons, they will reflect this bias and will not be able to accurately identify all potential instances of violent content containing weapons on a platform. There is also a lack of transparency around how these image recognition databases are compiled, what types of content they focus on, how effective and accurate they are across different categories of content, and how much user expression has been accurately and erroneously removed as a result of these tools.
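As a rough illustration of the flag-and-prioritize step described above, the Python sketch below takes the hypothetical output of an object-detection model (a mapping of detected labels to confidence scores) and turns it into a review decision. The labels, weights, and threshold are invented for illustration and do not reflect any platform’s actual risk model.

```python
# Hypothetical output of an object-detection model: detected label -> confidence.
# The labels, weights, and threshold below are invented for illustration.
RISK_WEIGHTS = {"firearm": 1.0, "knife": 0.7, "blood": 0.6}

def prioritize_for_review(detections: dict, review_threshold: float = 0.5):
    """Turn detected objects into a rough risk score and decide whether the
    image should be flagged to a human moderator. The model cannot tell
    whether a detected weapon appears in a violent, newsworthy, or satirical
    context; that judgment is left to the human reviewer."""
    risk = max((confidence * RISK_WEIGHTS.get(label, 0.0)
                for label, confidence in detections.items()), default=0.0)
    return risk >= review_threshold, risk

flagged, score = prioritize_for_review({"firearm": 0.83, "person": 0.99})
# flagged == True, score == 0.83 -> routed to the human review queue
```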

Metadata Filtering:

Most digital files contain information that provides descriptive characteristics about their content. This is known as a file’s metadata. For example, an audio file containing a song could be labeled with information such as the song’s title or length. Metadata filtering tools can be used during ex-ante and ex-post proactive moderation to search a series of files in order to identify content that fits a particular set of metadata parameters. Metadata filtering tools are used in particular to identify copyright-infringing materials. However, because a file’s metadata can be easily manipulated or mislabeled, the effectiveness and accuracy of metadata filtering tools is limited, and these tools can be easily gamed.34
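A minimal sketch of the kind of parameter matching described above, with invented metadata fields and filter values; real systems read these fields from the file’s container format (for example, ID3 tags) rather than a Python dictionary.

```python
# Illustrative metadata for an uploaded audio file (invented values).
upload_metadata = {"title": "Example Song", "artist": "Example Artist",
                   "duration_seconds": 214}

# Hypothetical parameters describing a copyrighted work to be caught.
copyright_filters = [{"title": "Example Song", "duration_seconds": 214}]

def metadata_match(metadata: dict, filters: list) -> bool:
    """Flag a file whose metadata matches every field of any filter entry.
    Because uploaders can trivially rename a title or re-encode a file,
    this kind of filter is easy to evade, as noted above."""
    return any(all(metadata.get(key) == value for key, value in f.items())
               for f in filters)

print(metadata_match(upload_metadata, copyright_filters))  # True
```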

Natural Language Processing (NLP):

NLP is a set of techniques that use computers to parse text. In the context of content detection and moderation, text is typically parsed in order to make predictions about the meaning of the text, such as what sentiments it indicates.35 Currently, a wide range of NLP tools can be purchased off the shelf and are applicable in a range of use cases, including spam detection, content filtering, and translation services. In the context of content moderation, NLP classifiers are particularly being used to detect hate speech and extremist content and to perform sentiment analysis on content.36 As outlined by researchers from the Center for Democracy & Technology, NLP classifiers are generally trained on text examples, known as documents, that have been annotated by humans in order to indicate whether they belong to a particular category or not (e.g. extremist content vs. not extremist content). When a model is provided a collection of documents, known as a corpus, it works to identify patterns and features associated with each annotated category. These corpora are pre-processed so that they numerically represent particular characteristics in the text, such as the absence of a specific word. The annotated and pre-processed text documents are used to train machine learning models to classify new documents, and the classifier is tested on a separate sample of the training data in order to determine how closely the model’s classifications matched those of the human coders.37
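The annotate–preprocess–train–test pipeline described above can be sketched in a few lines of Python with scikit-learn. The tiny corpus and labels are invented for illustration; real training corpora contain many thousands of human-annotated documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Tiny invented corpus: 1 = annotated as hate speech, 0 = not.
documents = ["example hateful slur here", "have a nice day everyone",
             "another hateful slur example", "what time is the game tonight"]
labels = [1, 0, 1, 0]

# Pre-process documents into numerical features (here, TF-IDF term weights).
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(documents)

# Hold out part of the annotated data to test agreement with the human coders.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.5, random_state=0, stratify=labels)

classifier = LogisticRegression().fit(X_train, y_train)
print("agreement with human labels:",
      accuracy_score(y_test, classifier.predict(X_test)))
```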

Although internet platforms are increasingly exploring and adopting the use of NLP classifiers, these technologies are limited for a number of reasons. First, NLP technologies are domain-specific, which means that they can only focus on one particular type of objectionable content. In addition, because there is significant variation in how speech is expressed, these categories are very narrow. For example, to maximize accuracy, these models are trained to detect and flag one specific type of hate speech.38 This means that a classifier trained to detect hate speech could only be trained to focus on a particular sub-domain of hate speech, such as anti-Semitic speech. If this classifier were trained on datasets that also included some examples of other forms of hate speech, it would still only be applicable with relative accuracy to anti-Semitic speech. In addition, finding and compiling comprehensive enough datasets to train NLP classifiers is a challenging, expensive, and tedious process. As a result, many researchers have resorted to filtering through content using search terms or hashtags that focus on subtypes of a particular domain of speech, such as hate speech directed at a certain religious group. However, this creates and operationalizes dataset and creator bias, which can disproportionately emphasize certain types of hate speech. This re-emphasizes that such tools cannot be widely applied to multiple forms of hate speech.39

Furthermore, in order for NLP classifiers to operate accurately, they need to be provided with clear and consistent parameters and definitions of speech. Depending on the type of speech, this can be challenging. For example, definitions around extremist content and disinformation are vague, and they are often unable to capture the full breadth, context, and nuances of such activity. On the other hand, tools that are developed based on definitions that are overly narrow may fail to detect some speech and may be easier to bypass.40


In addition, NLP classifiers are limited in that they are unable to comprehend the nuances and contextual elements of human speech. For example, this could include whether a word is being used in a literal or satirical context, or whether a derogatory term is being used in slang form. This decreases the accuracy of these classifiers, particularly when they are applied across platforms, content formats, languages, and contexts.41

Finally, there is also a lack of transparency around how corpora are compiled, what manual filtering processes—such as hashtag filtering—creators undergo to create these datasets, how accurate these tools are, and how much user expression these NLP tools remove both correctly and incorrectly.

The Limitations of Automated Tools in Content Moderation

As highlighted in the previous section, automated tools used for content moderation are limited in a number of ways. Given that these tools are increasingly being adopted by internet platforms, it is important to understand how they shape the content we engage with and see online, as well as user expression more broadly. This section provides a more detailed discussion of the primary limitations of these automated tools.

Accuracy and Reliability:

The accuracy of a given tool in detecting and removing content online is highly dependent on the type of content it is trained to tackle. Developers have been able to train and operate tools that focus on certain types of content—such as CSAM and copyright-infringing content—so that they have a low enough error rate to be widely adopted by small and large platforms. This is because these categories of content have large corpora with which tools can be trained and clear parameters around what falls in these categories. However, in the case of content such as extremist content and hate speech, there are a range of nuanced variations in speech related to different groups and regions, and the context of this content can be critical in understanding whether or not it should be removed. As a result, developing comprehensive datasets for these categories of content is challenging, and developing and operationalizing a tool that can be reliably applied across different groups, regions, and sub-types of speech is also extremely difficult. In addition, the definition of what types of speech fall under these categories is much less clear.42 Although smaller platforms may rely on off-the-shelf automated tools, the reliability of these tools to identify content across a range of platforms is limited. In comparison, proprietary tools developed by larger platforms are often comparatively more accurate, as they are trained on datasets reflective of the types of content and speech they are meant to evaluate.43


Additionally, the definition of what constitutes accuracy varies based on the objectives of a researcher and a given model. In most NLP studies, accuracy can be defined as the degree to which a model makes the same decisions as a human being. However, because human beings come with their own set of biases and opinions that influence how they would categorize speech, this definition of accuracy is perhaps not the most reliable metric for evaluating automated tools in the content moderation space. Other factors, such as the ratio and number of false positives and false negatives, should also be considered. However, researchers and developers should recognize that these statistics represent more than just quantitative metrics. They also represent real impacts on user expression, and should therefore be weighted accordingly.44

Contextual Understanding of Human Speech:

In theory, automated content moderation tools should be easy to create and implement, as they are far more rule-bound than human beings. However, because human speech is not objective and the process of content moderation is inherently subjective, these tools are limited in that they are unable to comprehend the nuances and contextual variations present in human speech.45 As discussed above, these tools are limited in their ability to parse and understand variances in language and behavior that may result from different demographic and regional factors. For example, excessively liking someone’s pictures or using certain slang words may be construed as harassment on one platform or in one region of the world. However, these behaviors and speech may take on an entirely different meaning on another platform or in another community.46 In addition, automated tools are also limited in their ability to derive contextual insights from content. For example, an image recognition tool could identify an instance of nudity, such as a breast, in a piece of content. However, it is unlikely to be able to determine whether the post depicts pornography or perhaps breastfeeding, which is permitted on many platforms.47

Automated content moderation tools can also become outdated rapidly. On Twitter, members of the LGBTQ+ community found that there was a significant lack of search results for hashtags such as #gay and #bisexual, raising concerns of censorship. The company stated that this was due to the deployment of an outdated algorithm that mistakenly identified posts with these hashtags as potentially offensive. This demonstrates the need to continuously update algorithmic tools, as well as the need for decision-making processes to incorporate context in judging whether posts with such hashtags are objectionable or not.48 These automated tools also need to be updated as language and meaning evolve. For example, in an attempt to avoid moderation, some hateful groups have adopted new slang and representations for indicating hate. One example of this is white supremacists using the names of companies, such as “Google” and “Yahoo,” to replace ethnic slurs. In order to keep up, automated tools would have to adapt quickly and be trained across a wide range of domains. However, users could continue developing new forms of speech in response, thus limiting the ability of these tools to act with significant speed and scale.49 On some platforms, when human moderators engage in content moderation, they are able to combat the rapidly changing nature of speech by viewing additional information on the case, such as information on the user who is accused of violating the platform’s rules. However, incorporating such assumptions and processes into an automated tool runs the risk of amplifying biases around particular groups of individuals and could result in skewed or even discriminatory enforcement of content policies.50
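The substitution tactic described above, ordinary words standing in for slurs, is easy to demonstrate against a static keyword filter. In the sketch below, "slur_a" and "slur_b" are placeholders for actual slurs; the filter catches the listed term but misses the substituted one until its blocklist (or training data) is updated.

```python
# A static keyword filter of the kind that evolving slang quickly defeats.
# "slur_a" and "slur_b" are placeholders for actual slurs.
BLOCKLIST = {"slur_a", "slur_b"}

def keyword_filter(text: str) -> bool:
    return any(term in text.lower().split() for term in BLOCKLIST)

print(keyword_filter("typical slur_a insult"))   # True  -> caught
print(keyword_filter("typical google insult"))   # False -> evades the filter
# Until the blocklist (or training data) is updated, the substituted term
# passes; by then, users may already have moved on to new terms.
```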

As of now, AI researchers have been unable to construct comprehensive enough datasets that can account for the vast fluidity and variances in human language and expression. As a result, these automated tools cannot be reliably deployed across different cultures and contexts, as they are unable to effectively account for the various political, cultural, economic, social, and power dynamics that shape how individuals express themselves and engage with one another.

Creator and Dataset Bias:

One of the key concerns around algorithmic decision-making across a range of industries is the presence of bias in automated tools. Decisions based on automated tools, including in the content moderation space, run the risk of further marginalizing and censoring groups that already face disproportionate prejudice and discrimination online and offline.51 As outlined in a report by the Center for Democracy & Technology, there are many types of biases that can be amplified through the use of these tools. NLP tools, for example, are typically built to parse text in English. Tools that have lower accuracy when parsing non-English text can therefore result in harmful outcomes for non-English speakers, especially when applied to languages that are not very prominent on the internet, as this reduces the comprehensiveness of any corpora that models are trained on. Given that a large number of the users of major internet platforms reside outside English-speaking countries, this is highly concerning. The use of such automated tools should therefore be limited when making globally relevant content moderation decisions.52 These tools are also unable to effectively process differences in dialect and language use that may result from demographic differences.53

In addition, the personal and cultural biases of researchers are likely to find their way into training datasets. For example, when a corpus is being created, the personal judgments of the individuals annotating each document can shape what is treated as hate speech, as well as which specific types of speech, demographic groups, and so on are prioritized in the training data. This bias can be mitigated to some extent by testing for intercoder reliability, but doing so is unlikely to overcome the majority view on what falls into a particular category.54
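Intercoder reliability is commonly measured with statistics such as Cohen’s kappa, which compares the observed agreement between two annotators to the agreement expected by chance. A minimal sketch, with invented labels:

```python
def cohens_kappa(coder_a: list, coder_b: list) -> float:
    """Cohen's kappa for two annotators labeling the same documents
    (1 = hate speech, 0 = not). Values near 1 indicate strong agreement;
    values near 0 indicate agreement no better than chance."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    p_a1, p_b1 = sum(coder_a) / n, sum(coder_b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

print(cohens_kappa([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))  # ~0.33
```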

Transparency and Accountability:

One of the primary concerns around the deployment of automated solutions in the content moderation space is the fundamental lack of transparency that exists around algorithmic decision-making as a whole. These algorithms are often referred to as “black boxes,” because there is little insight into how they are coded, what datasets they are trained on, how they identify correlations and make decisions, and how reliable and accurate they are. Indeed, with black box machine learning systems, researchers are not able to identify how the algorithm makes the correlations it identifies.

Currently, some internet platforms provide limited disclosures around the extent to which automated tools are used to detect and remove content on their platforms. In its Community Guidelines enforcement report, for example, YouTube discloses how many of the videos and comments it removed were originally detected using automated flagging tools, as well as what percentage of these videos were removed before or after they were viewed.55 Although many companies have been pushed to provide more transparency around their own proprietary automated tools, they have refrained from doing so, claiming that the tools are protected as trade secrets in order to maintain their competitive edge in the market—and also to prevent bad actors from learning enough to game their systems.56 In addition, some researchers have suggested that, in this regard, transparency does not necessarily generate accountability.

In the broader content moderation space, it is gradually becoming a best practice for technology companies to issue transparency reports that highlight the scope and volume of content moderation requests they received, as well as the amount of content they proactively removed as a result of their own efforts. In this case, transparency around these practices can generate accountability around how these platforms are managing user expression.

However, in the case of algorithmic decision-making, researchers such as Maayan Perel and Niva Elkin-Koren have suggested that looking “under the hood” of black boxes would yield a large volume of incomprehensible data, a combination of inputs and outputs, that would require significant analysis in order to extract insights. Although processing this data is not impossible, it would not generally provide any transparency around how the actual decision-making occurred, or around how a company is ensuring its tools are being used fairly. In addition, unlike humans, algorithms lack “critical reflection.”57 As a result, other ways for companies to provide transparency in a manner that generates accountability are also being explored.58 One example of such a mechanism is providing greater transparency into the training data, as this can help researchers understand, to a certain extent, the decisions being made by black-box algorithmic models.


Two mechanisms for providing accountability around content takedown decisions that are gradually being adopted are notice and appeals. Internet platforms have begun providing notices to users who have had their content removed or their accounts suspended or deleted for violating content guidelines. In addition, some platforms have introduced appeals processes so that users can seek review of content or account-related decisions. However, these mechanisms have not yet been perfected. Although users may receive notifications that their content has been removed or their account has been suspended or deleted, these notices often lack meaningful explanations of which specific content guidelines the user violated. In addition, on some platforms, appeals processes do not enable users to provide more context or an explanation around the content or account in question, and appeals are often not available for all categories of content that are removed. Furthermore, the appeals process can often be a lengthy procedure that leaves a user without access to their account for a significant period of time. Although these mechanisms for generating accountability around content takedown practices are not perfect, they are gradually being adopted by a range of internet platforms.59

Case Study: Facebook

Out of the three platforms covered by case studies in this report, Facebook has by far the largest content moderation operation. In addition, it is the platform that has come under the most scrutiny for its content moderation decision-making practices, both human and automated. One of the reasons for this is that Facebook is one of the largest social media platforms in the world. It ranks third in global internet engagement after YouTube and Google.com,60 and the platform has over 2.38 billion monthly active users worldwide.62 As a result, Facebook’s content moderation practices affect a significant amount of user expression across the globe.

Facebook utilizes both a centralized and a hybrid approach to content moderation. In order to provide consistency in how its rules are applied across the world, the company has a global set of Community Standards. These Community Standards are enforced by Facebook’s enormous global pool of human content moderators, who are part of the 30,000 people who work on safety and security for the platform.63 In an attempt to ensure that its policies can be localized and enforced appropriately in different regions and across different contexts, Facebook tries to hire moderators based on their language or regional expertise. All reviewers receive the same general training on the company’s Community Standards and how to enforce them. Some of these moderators later develop specialties in certain sensitive content areas such as self-harm.64 Facebook has stated that its rules are structured to reduce bias and subjectivity so that reviewers can make consistent judgments on each case.65

In response to growing global pressure from governments and the public to take down violating content quickly, Facebook has invested heavily in automated tools for content moderation. These include image recognition and matching tools to identify and remove objectionable content such as terror-related content; NLP and language matching tools that seek to recognize and learn from patterns in text related to topics such as propaganda and harm; and pattern identification tools, which seek to identify patterns of similar objectionable content on multiple Facebook pages or patterns among individuals who post similar types of objectionable content. The platform has found that pattern detection is most effective for images, such as resized terror propaganda images, rather than text, as text can be more easily manipulated in order to evade detection and removal—and because text requires greater contextual understanding to evaluate.66

As part of its hybrid approach to content moderation, Facebook engages in several phases of algorithmic and human review in order to identify, assess, and take action against content that potentially violates its Community Standards. Automated tools are typically the first layer of review when identifying violating content on the platform. Depending on the level of complexity and the degree of additional judgment needed, the content may then be relayed to human moderators.67

Facebook deploys automated tools during the ex-ante stage of content moderation. When a user submits content to Facebook, such as a photograph or video, it is immediately screened in an automated process. As described in the section above on how automated tools are used in the content moderation process, this algorithmic screening uses digital hashes to proactively identify and block content that matches existing hash databases for content such as CSAM and terrorism-related imagery.68 Facebook also uses proactive match and action tools to detect and remove content that matches some previously identified spam violations. However, it does not screen new posts against every single previously identified spam violation, as this would result in a significant delay between when the user posts the content and when it appears on the website. Rather, this proactive screening process focuses on identifying CSAM and terrorism-related imagery.69

Once content has been posted to Facebook, the company engages in ex-post proactive moderation, employing a different set of algorithms to screen and identify objectionable content. These algorithms assess content to identify similarities to specific patterns found in, for example, images, words, and behaviors that are commonly associated with different types of objectionable content. In its latest report, the Facebook Data Transparency Advisory Group (DTAG), an independent advisory board chartered by Facebook and composed of seven experts from various disciplines, has stated that this process is challenging and limited in that additional context is often required in order to evaluate whether the presence of a certain indicator, such as a specific word, is being used in a violating manner. The algorithms involved in this process also consider other factors related to the post, such as the identity of the poster; the content of the comments, likes, and shares; and what is depicted in the rest of an image or video if the content is visual in nature. These elements add context and are used to calculate the likelihood that a piece of content violates the platform’s Community Standards. According to the DTAG report, this list of classifiers is continuously updated and the algorithms are retrained to incorporate insights that are acquired as more violating content is identified or missed. The DTAG report also asserts that if these algorithms determine that the content in question clearly violates a Community Standard, it may be removed automatically without being relayed to a human moderator. However, the report notes that, in cases where the algorithm is uncertain whether a piece of content violates the platform’s rules, the content is sent to a human moderator for review.70 The report does not clarify the circumstances in which the company believes an algorithm can make such a definitive determination.

Automated tools are also used to triage and prioritize content that is flagged by users during the ex-post reactive portion of content moderation. When a user flags content on the platform, it goes through an automated system that decides how the content should be reviewed. According to the DTAG report, if the system identifies that the content violates the Community Standards, it may be automatically removed.
However, as with ex-post proactive moderation, if the algorithm is unsure, the content will be routed to a human moderator.71 If a user flags content before the company is able to identify it, this flag also informs the platform’s machine learning models.72

In order to audit the accuracy of automated decision-making in content moderation, Facebook calculates two primary metrics: precision and recall. Precision measures the percentage of posts that were correctly labeled as violations out of all the posts that were labeled as violations. Recall measures the percentage of posts that were correctly labeled as violations out of all the posts that were actually violations. Facebook calculates these two metrics separately for each classifier in each algorithm.73 However, the DTAG reported that it was unable to acquire details on topics such as specific classifiers, the accuracy of Facebook’s enforcement system, and error and reversal rates, thus limiting the amount of insight the group had into the platform’s algorithmic decision-making processes for content moderation.
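To make the two metrics concrete, the sketch below computes them from raw moderation counts. The counts are invented for illustration and are not Facebook figures.

```python
# Invented counts, not Facebook figures.
true_positives = 90    # posts labeled violating that really were violating
false_positives = 10   # posts labeled violating that were not (over-removal)
false_negatives = 30   # violating posts the system missed

precision = true_positives / (true_positives + false_positives)  # 0.90
recall = true_positives / (true_positives + false_negatives)     # 0.75
print(f"precision={precision:.2f}, recall={recall:.2f}")
# High precision with lower recall means few wrongful removals but many
# missed violations; adjusting thresholds trades one for the other.
```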

Facebook does provide some degree of transparency around how it uses automated tools to proactively identify and remove content, and how much user speech this has impacted, in its Community Standards Enforcement Report (CSER). In the CSER, Facebook reports how much of the objectionable content it removed was identified proactively using its automated tools (the "proactivity rate"), as opposed to objectionable content that users reported to Facebook first. Facebook provides this data for nine content categories, including adult nudity and sexual activity, hate speech, terrorist propaganda (focused on ISIS, al-Qaeda, and affiliated groups), and violence and graphic content. However, Facebook does not provide this data for all of the categories of content that the company has deemed impermissible, and that it moderates, under its Community Standards; the omitted categories include suicide and self-injury.74 The proactivity rate that Facebook discloses in its CSER can change due to a number of factors, including the fact that Facebook is continuously updating and refining its algorithmic models, and the fact that the degree to which content is deemed "likely" to violate the Community Standards varies over time.75 Although the platform has invested significantly in artificial intelligence and machine learning, its algorithmic decision-making capabilities for content moderation are still limited. For example, despite the fact that Facebook has technology that can detect images, audio, and text that potentially violate the company's Community Standards in its livestream feature, the Christchurch terrorist was still able to livestream his attack in New Zealand.76 In addition, the company has faced significant criticism when its automated tools have resulted in the erroneous takedown of user expression. This has included content posted by human rights activists seeking to document atrocities in Syria, which was mislabeled and removed for violating Facebook's policies on graphic violence.77
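As a rough illustration of how the proactivity rate described above could be computed, the sketch below treats it as the share of actioned content that the platform detected before any user report. The counts are invented and this is not Facebook's published methodology.

```python
def proactivity_rate(actioned_found_proactively: int, actioned_total: int) -> float:
    """Share of actioned content detected by automated tools before a user report."""
    if actioned_total == 0:
        return 0.0
    return actioned_found_proactively / actioned_total


# Toy numbers, for illustration only.
print(f"{proactivity_rate(960, 1000):.1%}")  # 96.0%
```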

Facebook's centralized and hybrid approach to content moderation enables the company to deploy a range of tools to moderate content at scale and around the world. However, as demonstrated, the effectiveness of automated tools in identifying and moderating content is limited. As a result, although the platform is investing heavily in new artificial intelligence-driven content moderation tools, it is vital that the company continue to use a hybrid model of content moderation so that a human moderator is always in the loop to ensure decisions are fair and context-specific. In addition, given that the company is a gatekeeper of a significant amount of user expression, it needs to provide greater transparency and accountability around how it deploys automated tools for content moderation and how much user expression this impacts. Although the platform already issues a CSER, it discloses limited information about the role and impact of automated tools in enforcing its Community Standards. This information should be expanded, and the platform should disclose information about how its algorithms are created, trained, tested, and improved.


Currently, Facebook provides relatively detailed notices to users when their content is removed, offers an appeals process to users who have had certain categories of content removed, and reports in its CSER on the number of content actions that were appealed and the amount of content restored as a result of appeals. However, these processes can be improved in order to provide greater transparency and accountability around the company's use of automated tools.78 For example, in its notice to users, Facebook should specify whether the removed content was flagged and detected by an automated tool, by an entity such as an Internet Referral Unit, or by a user. In addition, the platform should enable users to provide more context and information during the appeals process, particularly in cases where content was erroneously flagged or removed by an automated tool.79 The platform should also work to expand its appeals process to cover the full range of objectionable content prohibited by its Community Standards.

Case Study: Reddit

Reddit is a social news aggregation and discussion website based in the United States. The platform has approximately 330 million monthly active users,80 is ranked 16th for global internet engagement,81 and has been labeled "the biggest little site no one's ever heard of."82 The platform enables users, who operate under pseudonyms, to create subpages, called subreddits, devoted to specific interests or topics. In this way, the platform has become popular among communities focused on particular interests or activities, such as gamers and sports fans. The home page of Reddit, as well as each individual subreddit, uses a user-driven voting system to determine the ranking of the content posted on each given page.83

Reddit utilizes a primarily decentralized and hybrid approach to content moderation. The company has a set of overarching, high-level content policies that define acceptable content and prohibit illegal content such as CSAM, as well as objectionable behaviors such as harassment and content that encourages or incites violence.84 In order to broadly enforce these content policies, Reddit has a small, centralized team of moderators (known to users as administrators or admins), who comprise approximately 10 percent of Reddit's 400-person workforce.85 However, the majority of content moderation on the platform is carried out by the moderators of individual subreddits, who are known as mods. Mods are users who volunteer to moderate content on a particular subreddit. They have significant editorial discretion and can choose to remove content that violates Reddit's rules or that they deem objectionable or off-topic. They can also temporarily mute or ban users from their subreddit. Mods are also empowered to create additional content policies that define acceptable content and use for their subreddits, as long as these do not conflict with Reddit's global set of content policies. All of the mods on a subreddit can also collectively create guidelines that outline their own responsibilities and codes of conduct. Mods may also have additional roles, such as fostering discussions, depending on the subreddit.86 Admins rarely intervene in content moderation decisions unless it is to remove objectionable content that is illegal or clearly prohibited by Reddit's content policies,87 or to ban users from the site as a whole.88 According to researchers from Microsoft, there are approximately 91,563 unique mods on the platform, with an average of five mods per subreddit.89

By employing a decentralized approach to content moderation, Reddit is able to save time and resources by relying on its users to aid with content moderation. This approach keeps users engaged and serves the overall business aims of the company. In addition, it positions the company as a promoter of diverse viewpoints, since each individual subreddit has its own content policies tailored to the needs of its specific community.90 This decentralized approach to content moderation has also resulted in users self-policing to ensure they do not violate specific content policies, and has fostered an environment in which users call one another out for violating policies or posting objectionable content.91 Further, this decentralized model enables localized and context-specific moderation decisions, as mods set and enforce content guidelines that are appropriate to the particular nuances, norms, and variations of different discussion topics.


In addition to employing a small number of human moderators to engage in ex-post reactive content moderation, Reddit admins also use some automated tools to identify and remove objectionable content such as CSAM. However, because the majority of content moderation is carried out by users, Reddit has also developed an automated tool, known as the AutoModerator, that mods can use to moderate content on their subreddits at scale. The AutoModerator is a built-in, customizable bot that provides basic algorithmic tools to proactively identify, filter, and remove objectionable content during the ex-ante moderation stage. The bot operates based on mod-chosen parameters, such as keywords, links to particular websites, specific users, and report-count thresholds, that define what is not permitted in a particular subreddit. The AutoModerator can automatically remove this objectionable content, but mods can review the removed content later and reverse any erroneous removals. In addition to using the AutoModerator, many Reddit mods have turned to creating their own bots or tools, or to using free versions available online, in order to flag custom words and enhance their moderation practices.92

This decentralized approach to content moderation empowers users to manage their own speech, helps democratize expression, and enables localized and diverse viewpoints as well as context-specific content moderation practices. However, it does raise a number of questions regarding accuracy and reliability, bias, and transparency and accountability. There is little insight into how accurate the AutoModerator is across different subreddits and categories of content or violations. In addition, because mods create the content policies for their subreddits and define the parameters that the AutoModerator operates on, the deployment of automated tools for content moderation will undoubtedly reflect the personal biases of the mods. There is little transparency around this process, and because Reddit operates in a decentralized manner, there is a lack of a clear accountability mechanism.
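The sketch below is a conceptual illustration, in Python, of the kind of mod-defined parameters described above (banned keywords, banned domains, banned users, and a report-count threshold). It is not Reddit's actual AutoModerator code or configuration format; the rule names, thresholds, and example post are assumptions made for illustration.

```python
# Conceptual sketch of mod-defined filtering rules for a subreddit.
# Not Reddit's actual AutoModerator implementation or configuration syntax.

from dataclasses import dataclass, field


@dataclass
class SubredditRules:
    banned_keywords: set = field(default_factory=set)
    banned_domains: set = field(default_factory=set)
    banned_users: set = field(default_factory=set)
    max_reports: int = 5  # assumed report threshold before automatic removal


@dataclass
class Post:
    author: str
    text: str
    linked_domain: str = ""
    report_count: int = 0


def should_remove(post: Post, rules: SubredditRules) -> bool:
    """Return True if the post trips any mod-defined rule."""
    text = post.text.lower()
    if any(word in text for word in rules.banned_keywords):
        return True
    if post.linked_domain in rules.banned_domains:
        return True
    if post.author in rules.banned_users:
        return True
    if post.report_count >= rules.max_reports:
        return True
    return False


rules = SubredditRules(banned_keywords={"spamword"}, banned_domains={"spam.example"})
post = Post(author="user123", text="Check out this SpamWord deal", report_count=1)
print(should_remove(post, rules))  # True: a banned keyword matched
```

As the report notes, removals like these remain reversible: mods can review what the bot has removed and restore content that was taken down in error.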

In its transparency report, Reddit discloses the amount of content removed by mods—including through the use of the AutoModerator—and the amount of content removed by admins.93 However, this is the only metric that touches on the scope and volume of content moderation carried out by mods. The remainder of the metrics covered in the report, such as the number of potential content policy violations received, what percent of these reports were actionable, what content policies these actionable reports covered, and how many appeal requests were received and granted, are all admin-focused. Therefore, although the majority of content takedowns on Reddit involved removals by mods, most metrics in the report do not cover mod activities.94

The decentralized nature of Reddit's content moderation approach therefore prevents further transparency around the activities of mods, who are responsible for moderating the most content, and around how they are deploying algorithmic tools to manage and moderate user expression. Going forward, Reddit should consider requiring mods to monitor and track the amount of content they remove, both manually and using the AutoModerator, so the platform can disclose this information in its transparency report. In addition, Reddit offers notice to users who have had their content removed or accounts suspended. It also offers an appeals process to users who feel their content or accounts have been erroneously impacted by content moderation activities. However, it is unclear whether mods offer notice or a similar appeals process to users who have been impacted by mods' moderation decisions. The AutoModerator does, in fact, provide notice to users when it removes content from subreddits, suggesting that, when automated tools are deployed by mods to moderate content, users are notified of the resulting impact.95

Case Study: Tumblr

Tumblr is a microblogging and social media website currently owned by Verizon Media. The platform ranks 78th for global internet engagement,96 and as of April 2017 it had 738 million unique visitors worldwide,97 with 2019 statistics citing that the number of blog accounts on the platform had grown to 463.5 million.98 The company utilizes a centralized, hybrid approach to content moderation, although it is unclear how many human moderators the company employs and to what extent the platform uses automated tools to moderate content. Although Tumblr is not one of the largest or most widely used internet platforms, it is an interesting one to consider when assessing how algorithmic decision-making is deployed for content moderation purposes, because the platform recently amended its Community Guidelines to ban adult content and nudity. Before this policy change, the platform was considered a haven for graphic forms of expression on the internet. However, as of December 2018, Tumblr's rules were updated to state that pornography and adult content were no longer permitted on the platform.99 In its announcement to users, Tumblr outlined that any content that violated this new policy would be flagged using a "mix of machine-learning classification and human moderation." The use of automated tools to flag potentially violating content in this case makes sense, as the platform hosted millions of posts featuring adult content, and removing such content at that scope and scale could not be achieved by human moderators alone.100 However, identifying and removing such content also requires context. For example, although an algorithm could be trained to identify all images containing female breasts, it is unlikely to be able to distinguish whether those images are graphic in nature or whether they discuss or depict mastectomies, gender confirmation surgeries, or breastfeeding. As a result, it is vital that a human moderator always remains in the loop during the content moderation process.

Prior to this announcement, Tumblr only filtered out adult content through its "Safe Mode" feature, which allowed users to select which content they would see. Shortly after Tumblr was acquired by Verizon Media in June 2017, it introduced this opt-in feature to let users filter "sensitive" content from their own dashboard and search results. However, the feature had flaws: users quickly found that it filtered out non-adult content, including LGBTQ+ posts. It is unclear whether the company is deploying the same artificial intelligence technology used for Safe Mode to implement the new platform-wide ban on adult content, but WIRED reported that the company would be using modified proprietary technology. The company also announced it would be hiring more human moderators.101

Tumblr's use of algorithmic decision-making to institute this new platform-wide ban on adult content created a range of issues. Following the introduction of the policy, users took to Twitter to document numerous cases of erroneous takedowns, which included the removal of everything from chocolate ghosts, to Joe Biden,102 to a cartoon scorpion posted with the hashtag #TooSexyForTumblr.103

One of the reasons for these flaws could be that Tumblr's definition of adult content spans a range of content formats: it covers photos, videos, and GIFs that depict human genitals or female-presenting nipples, as well as photos, videos, GIFs, and illustrations that depict sex acts. In order to effectively identify and remove this content, Tumblr's algorithms therefore need to be able to make complex determinations to prevent overbroad takedowns of user content. These include determining whether a representation of human genitals or sexual body parts is real-life imagery or part of artwork such as a painting or sculpture. They also include understanding the context behind certain forms of nudity, such as the previously discussed examples regarding depictions of breasts. This is not a problem unique to Tumblr. In some cases, more training data can help automated tools understand context and nuance to some extent. For example, if you are trying to teach a model the difference between non-sexual depictions of breasts, such as breastfeeding, and graphic depictions of breasts, you could provide it with more data to learn from. The model would then likely infer that most images of breastfeeding contain infants or children. However, based on this assumption, a user posting graphic depictions of breasts could avoid detection by featuring an infant anywhere in their image. In comparison to other categories of objectionable content, such as extremism and disinformation, however, adult content has clearer definitions and is easier to moderate and train models on, as it has stronger and more consistent visual elements.104
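To illustrate what "providing the model with more annotated data" looks like in practice, the following is a minimal, generic sketch of supervised training for a binary image classifier using PyTorch and torchvision. It is not Tumblr's actual system: the dataset path, folder layout (one subfolder per label, e.g. "adult" and "not_adult"), and training settings are assumptions made for illustration.

```python
# Minimal sketch of supervised training for a binary image classifier,
# assuming a folder of annotated images whose subfolder names are the labels.
# Generic illustration only; not Tumblr's actual classifier, data, or pipeline.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical dataset path; ImageFolder treats each subfolder as a class.
train_data = datasets.ImageFolder("annotated_posts/train", transform=transform)
loader = DataLoader(train_data, batch_size=32, shuffle=True)

# Start from a pretrained backbone and replace the head with a two-class output.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(3):  # a few epochs, for illustration only
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

The point of the sketch is that the classifier can only learn distinctions, such as breastfeeding versus graphic nudity, that are actually represented and labeled in the annotated training set, which is why the breadth and quality of that dataset matter so much.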

In addition, the wide range of content that was erroneously removed by Tumblr after the ban serves as an example of the impact of dataset and creator bias, as well as the concerning lack of accuracy and reliability in the platform's models.105 For example, WIRED researchers ran several of the Tumblr posts that were erroneously removed through Matroid's NSFW natural imagery classifier. The classifier correctly identified each one as not adult content (although it did indicate there was a 21 percent chance the chocolate ghosts could be adult content). This demonstrates a weakness in Tumblr's own classifiers and raises questions around how the platform is training its models and with what datasets.106 Machine learning models need to be trained on a vast amount of data in order to operate effectively and improve. Most of the models used in content moderation are supervised machine learning models, meaning that the content they are trained on has been annotated to indicate whether or not it falls into a particular category. Platforms that have prohibited, and as a result moderated, nudity and adult content for a long period of time have a vast amount of annotated data in this domain that they can use to train their models.

However, because Tumblr did not previously prohibit nudity and adult content on its platform, it likely did not have the same robust datasets that competing platforms have.107 Researchers such as Tarleton Gillespie have speculated that the company began reviewing and annotating images before introducing the amendment to its Community Guidelines in order to obtain the training data it needed. However, the backlash following the rollout indicates that this data was not sufficient.108 Although users are offered the opportunity to appeal adult content takedown decisions on Tumblr to a human moderator, this episode demonstrates a clear limitation and weakness in the platform's adoption and implementation of automated tools for content moderation. Going forward, the company will need to develop more comprehensive datasets to train its classifiers on, to ensure that moderation of user speech in this category is more accurate and reliable.109

In addition, in order to provide greater accountability around its content takedown decisions, the platform should expand its appeals process so that users have the power to contest all types of takedown decisions. Tumblr does not clearly disclose the types of takedowns for which appeals are available. Furthermore, Tumblr needs to begin providing adequate notice to users if their content or accounts are removed or suspended. These notices should offer meaningful explanations to users as to why their content or accounts were impacted. This should, at a minimum, include a URL, content excerpt, or other summary that enables the user to understand what specific content violated Tumblr's Community Guidelines, which of Tumblr's Community Guidelines the user violated, how the content was detected and removed, and how the user can appeal this decision.


In addition, Tumblr's content moderation system may have identified patterns between objects in images that its developers did not teach it, and as a result it could be removing content erroneously. This demonstrates how algorithms can exacerbate hidden biases in the training data. Further, with the use of black box algorithms, even the developers may not know how automated tools are making decisions.110 Greater transparency around the data that models are trained on could help provide further insight into these processes. Without greater transparency into how these automated content moderation processes are impacting user speech, it is impossible to properly understand how accurate these algorithmic decision-making tools are, how much user speech they are affecting, and what percentage of the content being removed is being removed accurately. As with most public rollouts, the negative aspects and mistakes take center stage while the positive attributes are left out. Greater transparency from Tumblr could shed further light on the successes and challenges associated with introducing new automated tools, and with leveraging existing automated tools to support the enforcement of new policies.

One way Tumblr could do this is by disclosing more data on its content takedowns in its transparency report. Currently, the transparency report only discloses data around copyright- and trademark-related content removals.111 A more comprehensive report should cover the total number of posts and accounts flagged and removed. In addition, it should include a breakdown of the posts and accounts flagged and removed, organized by which of Tumblr's Community Guidelines were violated, the format of the content at issue, and how the content was flagged. These recommendations follow the Santa Clara Principles on Transparency and Accountability in Content Moderation, which outline minimum standards tech platforms must meet in order to provide adequate transparency and accountability around their efforts to take down user-generated content or suspend accounts that violate their rules. A more comprehensive transparency report should also highlight the number of appeals the platform received and how much content was restored, whether through the platform's proactive efforts or as a result of user appeals.

Promoting Fairness, Accountability, and Transparency Around Automated Content Moderation Practices

As outlined in this report, internet platforms of all sizes have developed and adopted automated tools to aid their content moderation efforts. However, these tools have demonstrated a range of weaknesses and are often created and operated in a nontransparent manner. In part, this lack of transparency is due to the black box nature of such tools, which prevents comprehensive insight into algorithmic decision-making processes, even by their creators. In addition, internet platforms have not been sufficiently transparent about how these tools are created, trained, applied, and refined. This poses significant threats to user expression. Going forward, developers, policymakers, and researchers should consider the following set of recommendations in order to promote greater fairness, accountability, and transparency around algorithmic decision-making in this space.

1. Policymakers need to educate themselves on the limitations of automated content moderation tools. Lawmakers around the world are pressing, and sometimes mandating, that companies be more proactive in their approach to removing harmful content. This encourages platforms to prioritize speed over accuracy, and therefore to deploy inaccurate and nontransparent automated tools to meet these expectations and requirements. It also encourages companies to err on the side of removing online speech in order to avoid liability, which poses a serious threat to users' free expression rights, particularly for users from marginalized and vulnerable groups. Policymakers should recognize the limitations of automated tools for content moderation and should encourage companies to establish responsible safeguards and practices around how they deploy such tools, rather than placing pressure on platforms to rapidly remove content in a manner that generates negative consequences.

2. Companies need to take a more proactive role in promoting fairness, accountability, and transparency around algorithmic decision-making related to content moderation. This is a vital process that can and should take many forms:

• Companies should disclose more information to policymakers, researchers, and their users around their algorithmic models. This should include, but not be limited to, what kinds of information datasets contain (e.g. how regionally, linguistically, and demographically diverse the data are), what kind of outputs models generate, and how they are working to ensure tools are not being misused or abused in unethical ways. This should also include data on accuracy rates for human and automated detection and removal, including the false positive, true positive, and false negative rates, as well as the precision and recall metrics.112 This will help policymakers understand the limitations of these tools and will also enable researchers and the public to better understand how these tools are impacting user expression and the content that they engage with online.

• Companies should use transparency reports as a mechanism for providing additional public information about their automated content moderation practices. At a minimum, companies should break down the number of accounts and pieces of content that were flagged and removed by how they were detected (e.g. through the use of automated tools, through user flags, etc.), and they should also report on how much of the content that was flagged and/or removed using automated tools was erroneously actioned, as well as how much of this content was subsequently restored, either proactively by the platform or through user appeals (a hypothetical sketch of such a breakdown appears after this list). Currently, very few companies disclose data around how their automated tools impact user speech, and no platform does so in a manner that is comprehensive and meaningful. Resources such as the Open Technology Institute's Transparency Reporting Toolkit,113 Ranking Digital Rights' Corporate Accountability Index,114 and the Santa Clara Principles115 can help companies navigate this disclosure process and understand what meaningful transparency and accountability in this regard looks like.

• In order to foster more fairness and accountability, companies should also provide notice to users who have had their content removed in general, and especially when the removal resulted from automated tools, and they should offer users a robust appeals process that is timely and easy to navigate in order to rectify erroneous takedowns. Guiding standards related to notice and appeals can also be found in the Santa Clara Principles,116 the Corporate Accountability Index,117 and the Electronic Frontier Foundation's Who Has Your Back: Censorship report.118

• In order to foster a better understanding of the quality of automated tools, how they work, and what their limitations are, companies should further engage with the research community and provide researchers with access to their models for evaluation and assessment. Although companies have stated that they have concerns about protecting their trade secrets and preventing their systems from being gamed, there are avenues through which responsible and secure research can take place. For example, companies could establish safeguards such as robust registration and security authentication processes for researchers.

• Internet platforms are investing more in hiring human content moderators who have specific regional or linguistic expertise in order to help localize their moderation efforts and ensure that they can capture the nuances and contextual intricacies of human speech. The same level of effort needs to be invested in developing algorithmic models that are diverse and that can account for variations in speech and online behavior across regions, communities, and so on. The majority of developers creating these models are Western and English-speaking, and a large proportion of the training data is similarly skewed. As a result, these models reflect data and creator biases and are not adequately providing meaningful and effective outputs to the millions of users on these platforms who are non-Western and non-English speakers.
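As referenced in the transparency-reporting bullet above, the sketch below shows one hypothetical record format for a per-category, per-detection-method breakdown. The field names and figures are invented for illustration and do not correspond to any platform's actual reporting schema.

```python
# Hypothetical schema for one row of a transparency-report breakdown.
# Field names and numbers are illustrative only, not any platform's real format.

from dataclasses import dataclass


@dataclass
class ModerationDisclosure:
    policy_category: str          # e.g. "hate speech", "adult content"
    detection_method: str         # e.g. "automated_tool", "user_flag", "trusted_flagger"
    items_flagged: int
    items_removed: int
    items_removed_in_error: int
    items_restored_on_appeal: int
    items_restored_proactively: int

    def error_rate(self) -> float:
        """Share of removals the platform itself later judged erroneous."""
        return self.items_removed_in_error / self.items_removed if self.items_removed else 0.0


row = ModerationDisclosure("adult content", "automated_tool", 12000, 9500, 300, 180, 90)
print(f"{row.error_rate():.1%}")  # 3.2%
```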

3. Research on algorithmic decision-making in the content moderation space needs to be more robust and should seek to test and compare how effective automated content moderation tools are across a range of factors, including platforms, domains, and demographic attributes. With cooperation from companies, researchers should also seek to provide further insight into how datasets and classifiers are constructed and how accurate these tools are. The establishment of meaningful metrics in this space would also strongly guide future work and policy development. These insights will be valuable both to policymakers working to legislate and advocate in this space, and to companies seeking to improve and refine their content moderation policies and practices. In addition, researchers should seek to broaden the scope of their research to include diverse types of speech, particularly non-Western and non-English speech, in order to pave the way for automated content moderation tools to become more accurate for users worldwide.

4. Automated tools should supplement, not supplant, human roles. Typically, conversations around the adoption of automated tools in any industry intersect with the notion that these tools will soon replace human labor because they are more efficient and cost-effective. However, in the context of content moderation, the effectiveness of these tools in identifying and removing content across categories, formats, and platforms has proven to be limited. In order to safeguard freedom of expression and foster and maintain fairness and accountability in the content moderation process, internet platforms should ensure that human moderators remain in the loop during the moderation process, specifically when moderating categories of content that are vaguely defined and that require additional context to understand. By adopting and further streamlining hybrid approaches to content moderation, platforms should seek to use automated tools to augment human intelligence and enable human moderators to perform more effectively at scale, rather than to replace humans in this process entirely.

Notes

1 Kyle Langvardt, "Regulating Online Content Moderation," The Georgetown Law Journal 106, no. 1353 (2018): https://georgetownlawjournal.org/articles/268/regulating-online-content-moderation/pdf.
2 Sarah T. Roberts, "Digital Detritus: 'Error' and the Logic of Opacity in Social Media Content Moderation," First Monday 23, no. 3 (March 5, 2018): https://journals.uic.edu/ojs/index.php/fm/article/view/8283/6649.
3 James Grimmelmann, "The Virtues of Moderation," Yale Journal of Law and Technology 17, no. 1 (2015): https://digitalcommons.law.yale.edu/cgi/viewcontent.cgi?article=1110&context=yjolt.
4 Grimmelmann, "The Virtues of Moderation".
5 Grimmelmann, "The Virtues of Moderation".
6 Grimmelmann, "The Virtues of Moderation".
7 Kate Klonick, "The New Governors: The People, Rules, and Processes Governing Online Speech," Harvard Law Review 131, no. 1598 (April 10, 2018): https://harvardlawreview.org/wp-content/uploads/2018/04/1598-1670_Online.pdf. This report incorporates Klonick's framework for the different stages of content moderation. However, the framework has been adapted to emphasize the role of algorithmic tools and manual content moderation processes during the ex-post proactive and ex-post reactive content moderation stages.
8 Internet Referral Units are government-established entities responsible for flagging content to internet platforms that violates the platform's Terms of Service. Trusted Flaggers are individuals, NGOs, government agencies, and other entities that have demonstrated accuracy and reliability in flagging content that violates a platform's Terms of Service. As a result, they often receive special flagging tools such as the ability to bulk flag content.
9 Salman Aslam, "Facebook by the Numbers: Stats, Demographics & Fun Facts," Omnicore Agency, last modified January 6, 2019, https://www.omnicoreagency.com/facebook-statistics/.
10 Salman Aslam, "YouTube by the Numbers: Stats, Demographics & Fun Facts," Omnicore Agency, last modified January 6, 2019, https://www.omnicoreagency.com/youtube-statistics/.
11 Salman Aslam, "Twitter by the Numbers: Stats, Demographics & Fun Facts," Omnicore Agency, last modified January 6, 2019, https://www.omnicoreagency.com/twitter-statistics/.
12 Mark MacCarthy, "It's Time to Think Seriously About Regulating Platform Content Moderation Practices," CIO, February 14, 2019, https://www.cio.com/article/3340323/its-time-to-think-seriously-about-regulating-platform-content-moderation-practices.html.
13 Allow States and Victims to Fight Online Sex Trafficking Act of 2017, H.R. 1865, 115th Cong. (2018).
14 New America's Open Technology Institute, "OTI Disappointed in the House-Passed FOSTA-SESTA Bill," news release, February 27, 2018, https://www.newamerica.org/oti/press-releases/oti-disappointed-house-passed-fosta-sesta-bill/.
15 India McKinney and Elliot Harmon, "Platform Liability Doesn't – And Shouldn't – Depend on Content Moderation Practices," Electronic Frontier Foundation, last modified April 9, 2019, https://www.eff.org/deeplinks/2019/04/platform-liability-doesnt-and-shouldnt-depend-content-moderation-practices.
16 New America's Open Technology Institute, "Bill Purporting to End Internet Censorship Would Actually Threaten Free Expression Online," news release, June 20, 2019, https://www.newamerica.org/oti/press-releases/bill-purporting-end-internet-censorship-would-threaten-free-expression-online/.
17 Filippo A. Raso et al., Artificial Intelligence & Human Rights: Opportunities & Risks, September 25, 2018, https://cyber.harvard.edu/publication/2018/artificial-intelligence-human-rights.
18 MacCarthy, "It's Time to Think Seriously About Regulating Platform Content Moderation Practices".
19 Center for Democracy & Technology, "Overview of the NetzDG Network Enforcement Law," last modified July 17, 2017, https://cdt.org/insight/overview-of-the-netzdg-network-enforcement-law/.
20 Zak Doffman, "EU Approves Billions In Fines For Google And Facebook If Terrorist Content Not Removed," Forbes, April 18, 2019, https://www.forbes.com/sites/zakdoffman/2019/04/18/huge-eu-fines-for-facebook-and-google-if-terrorist-material-not-removed-in-first-hour/#16ebe9911271.
21 Department of Digital, Culture, Media & Sport, Online Harms White Paper, April 2019, https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/793360/Online_Harms_White_Paper.pdf.
22 Linda Kinstler, "Germany's Attempt to Fix Facebook Is Backfiring," The Atlantic, May 18, 2018, https://www.theatlantic.com/international/archive/2018/05/germany-facebook-afd/560435/.
23 Klonick, "The New Governors: The People, Rules, and Processes Governing Online Speech".
24 Klonick, "The New Governors: The People, Rules, and Processes Governing Online Speech".
25 Kalev Leetaru, "The Problem With AI-Powered Content Moderation Is Incentives Not Technology," Forbes, March 19, 2019, https://www.forbes.com/sites/kalevleetaru/2019/03/19/the-problem-with-ai-powered-content-moderation-is-incentives-not-technology/#419282e755b7.
26 Klonick, "The New Governors."
27 Klonick, "The New Governors."
28 Klonick, "The New Governors."
29 Evan Engstrom and Nick Feamster, The Limits of Filtering: A Look at the Functionality & Shortcomings of Content Detection Tools, March 2017, https://www.engine.is/the-limits-of-filtering.
30 Counter Extremism Project, "How CEP's eGLYPH Technology Works," last modified December 8, 2016, https://www.counterextremism.com/video/how-ceps-eglyph-technology-works.
31 Some platforms such as Facebook, YouTube, and Twitter do provide limited disclosures on how much extremist content or accounts they remove in their transparency reports. Facebook also reports on how much extremist content the platform erroneously removed and restored. However, it is unclear what proportion of these removals were due to the use of the shared hash database.
32 Accenture, Content Moderation: The Future is Bionic, 2017, https://www.accenture.com/_acnmedia/PDF-65/Accenture-Webscale-Content-Moderation.pdf.
33 Accenture, Content Moderation: The Future is Bionic.
34 Engstrom and Feamster, The Limits of Filtering: A Look at the Functionality & Shortcomings of Content Detection Tools.
35 Natasha Duarte, Emma Llansó, and Anna Loup, Mixed Messages? The Limits of Automated Social Media Content Analysis, November 28, 2017, https://cdt.org/files/2017/11/Mixed-Messages-Paper.pdf.
36 Duarte, Llansó, and Loup, Mixed Messages?
37 Duarte, Llansó, and Loup, Mixed Messages?
38 Filippo Raso, Hannah Hilligoss, Vivek Krishnamurthy, Christopher Bavitz, and Levin Yerin Kim, Artificial Intelligence & Human Rights: Opportunities & Risks (September 25, 2018), Berkman Klein Center Research Publication No. 2018-6, available at SSRN: https://ssrn.com/abstract=3259344 or http://dx.doi.org/10.2139/ssrn.3259344.
39 Duarte, Llansó, and Loup, Mixed Messages?
40 Duarte, Llansó, and Loup, Mixed Messages?
41 Raso, Hilligoss, Krishnamurthy, Bavitz, and Kim, Artificial Intelligence & Human Rights: Opportunities & Risks.
42 Raso, Hilligoss, Krishnamurthy, Bavitz, and Kim, Artificial Intelligence & Human Rights: Opportunities & Risks.
43 Duarte, Llansó, and Loup, Mixed Messages?
44 Duarte, Llansó, and Loup, Mixed Messages?
45 Grimmelmann, "The Virtues of Moderation".
46 Robyn Caplan, Content or Context Moderation: Artisanal, Community-Reliant, and Industrial Approaches, November 14, 2018, https://datasociety.net/wp-content/uploads/2018/11/DS_Content_or_Context_Moderation.pdf.
47 James Vincent, "AI Won't Relieve the Misery of Facebook's Human Moderators," The Verge, February 27, 2019, https://www.theverge.com/2019/2/27/18242724/facebook-moderation-ai-artificial-intelligence-platforms.
48 Hillary K. Grigonis, "Social (Net)Work: What can A.I. Catch — and Where Does It Fail Miserably?," Digital Trends, February 3, 2018, https://www.digitaltrends.com/social-media/social-media-moderation-and-ai/.
49 Duarte, Llansó, and Loup, Mixed Messages?
50 Duarte, Llansó, and Loup, Mixed Messages?
51 Duarte, Llansó, and Loup, Mixed Messages?
52 Duarte, Llansó, and Loup, Mixed Messages?
53 Duarte, Llansó, and Loup, Mixed Messages?
54 Duarte, Llansó, and Loup, Mixed Messages?
55 YouTube, YouTube Community Guidelines Enforcement Report, 2019, https://transparencyreport.google.com/youtube-policy/flags.
56 Langvardt, "Regulating Online Content Moderation".
57 Maayan Perel and Niva Elkin-Koren, "Black Box Tinkering: Beyond Disclosure in Algorithmic Enforcement," Florida Law Review 69, no. 181 (2017): http://www.floridalawreview.com/wp-content/uploads/Perel_Elkin-Koren.pdf.
58 Perel and Elkin-Koren, "Black Box Tinkering: Beyond Disclosure in Algorithmic Enforcement".
59 Gennie Gebhart, Who Has Your Back? Censorship Edition 2019, June 12, 2019, https://www.eff.org/wp/who-has-your-back-2019.
60 Alexa's global internet engagement metric is based on the global internet traffic and engagement a platform receives over the past 90 days.
61 Alexa, "facebook.com Competitive Analysis, Marketing Mix and Traffic," https://www.alexa.com/siteinfo/facebook.com.
62 Dan Noyes, "The Top 20 Valuable Facebook Statistics – Updated July 2019," Zephoria Digital Marketing, last modified July 2019, https://zephoria.com/top-15-valuable-facebook-statistics/.
63 Casey Newton, "Bodies in Seats," The Verge, June 19, 2019, https://www.theverge.com/2019/6/19/18681845/facebook-moderator-interviews-video-trauma-ptsd-cognizant-tampa.
64 Alexis C. Madrigal, "Inside Facebook's Fast-Growing Content-Moderation Effort," The Atlantic, February 7, 2018, https://www.theatlantic.com/technology/archive/2018/02/what-facebook-told-insiders-about-how-it-moderates-posts/552632/.
65 Madrigal, "Inside Facebook's Fast-Growing Content-Moderation Effort".
66 Under the Hood Session.
67 Under the Hood Session.
68 Klonick, "The New Governors: The People, Rules, and Processes Governing Online Speech".
69 Ben Bradford et al., Report Of The Facebook Data Transparency Advisory Group, April 2019, https://law.yale.edu/system/files/area/center/justice/document/dtag_report_5.22.2019.pdf.
70 Bradford et al., Report Of The Facebook Data Transparency Advisory Group.
71 Bradford et al., Report Of The Facebook Data Transparency Advisory Group.
72 Under the Hood Session.
73 Bradford et al., Report Of The Facebook Data Transparency Advisory Group.
74 Facebook, Community Standards Enforcement Report, 2019, https://transparency.facebook.com/community-standards-enforcement.
75 Bradford et al., Report Of The Facebook Data Transparency Advisory Group.
76 Joseph Cox, "Machine Learning Identifies Weapons in the Christchurch Attack Video. We Know, We Tried It," Motherboard, April 17, 2019, https://www.vice.com/en_us/article/xwnzz4/machine-learning-artificial-intelligence-christchurch-attack-video-facebook-amazon-rekognition.
77 Avi Asher-Schapiro, "YouTube and Facebook Are Removing Evidence of Atrocities, Jeopardizing Cases Against War Criminals," The Intercept, last modified November 2, 2017, https://theintercept.com/2017/11/02/war-crimes-youtube-facebook-syria-rohingya/.
78 Spandana Singh, Assessing YouTube, Facebook and Twitter's Content Takedown Policies: How Internet Platforms Have Adopted the 2018 Santa Clara Principles, May 7, 2019, https://www.newamerica.org/oti/reports/assessing-youtube-facebook-and-twitters-content-takedown-policies/.
79 "The Santa Clara Principles On Transparency and Accountability in Content Moderation," Santa Clara Principles, last modified May 7, 2018, https://santaclaraprinciples.org/.
80 Lauren Feiner, "Reddit Users Are The Least Valuable Of Any Social Network," CNBC, February 11, 2019, https://www.cnbc.com/2019/02/11/reddit-users-are-the-least-valuable-of-any-social-network.html.
81 Alexa, "reddit.com Competitive Analysis, Marketing Mix and Traffic," https://www.alexa.com/siteinfo/reddit.com.
82 Colm Gorey, "How Reddit's Dublin Office Plans to Tackle Evil On The 'Front Page Of The Internet,'" Silicon Republic, May 13, 2019, https://www.siliconrepublic.com/companies/reddit-cto-chris-slowe-content-moderation-dublin.
83 Grimmelmann, "The Virtues of Moderation".
84 Caplan, Content or Context Moderation.
85 Caplan, Content or Context Moderation.
86 Christine Kim, "Ethereum's Reddit Moderators Resign Amid Controversy," Coindesk, May 12, 2019, https://www.coindesk.com/ethereums-reddit-moderators-rethink-approach-after-community-flashpoint.
87 Grimmelmann, "The Virtues of Moderation".
88 Benjamin Plackett, "Unpaid and Abused: Moderators Speak Out Against Reddit," Engadget, August 31, 2018, https://www.engadget.com/2018/08/31/reddit-moderators-speak-out/?guccounter=1.
89 Caplan, Content or Context Moderation.
90 Caplan, Content or Context Moderation.
91 Gorey, "How Reddit's Dublin Office Plans to Tackle Evil On The 'Front Page Of The Internet'".
92 Joseph Seering et al., "Moderator Engagement and Community Development in the Age of Algorithms," New Media & Society 21, no. 7 (January 11, 2019): https://www.andrew.cmu.edu/user/jseering/papers/Seering%20et%20al%202019%20Moderators.pdf.
93 This reporting excludes spam-related removals.
94 Reddit, Transparency Report 2018, https://www.redditinc.com/policies/transparency-report-2018.
95 Eshwar Chandrasekharan et al., "The Internet's Hidden Rules: An Empirical Study of Reddit Norm Violations at Micro, Meso, and Macro Scales," Proceedings of the ACM on Human Computer Interaction, 2nd ser., November 2018, http://eegilbert.org/papers/cscw18-chand-norms.pdf.
96 Alexa, "tumblr.com Competitive Analysis, Marketing Mix and Traffic," https://www.alexa.com/siteinfo/tumblr.com.
97 J. Clement, "Tumblr - Statistics & Facts," Statista, last modified August 6, 2018, https://www.statista.com/topics/2463/tumblr/.
98 J. Clement, "Cumulative Total of Tumblr Blogs from May 2011 to April 2019 (in Millions)," Statista, last modified 2019, https://www.statista.com/statistics/256235/total-cumulative-number-of-tumblr-blogs/.
99 Tumblr, "Support on Tumblr," last modified December 3, 2018, https://support.tumblr.com/post/180758979032/updates-to-tumblrs-community-guidelines.
100 Ben Dickson, "The Challenges of Moderating Online Content With Deep Learning," TechTalks, last modified December 10, 2018, https://bdtechtalks.com/2018/12/10/ai-deep-learning-adult-content-moderation/.
101 Louise Matsakis, "Tumblr's Porn-Detecting AI Has One Job—And It's Bad At It," WIRED, December 5, 2018, https://www.wired.com/story/tumblr-porn-ai-adult-content/.
102 Matsakis, "Tumblr's Porn-Detecting AI Has One Job—And It's Bad At It".
103 Tarleton Gillespie, Twitter post, December 2018, 12:07 p.m., https://twitter.com/TarletonG.
104 Dickson, "The Challenges of Moderating Online Content With Deep Learning".
105 Matsakis, "Tumblr's Porn-Detecting AI Has One Job—And It's Bad At It".
106 Matsakis, "Tumblr's Porn-Detecting AI Has One Job—And It's Bad At It".
107 Tarleton Gillespie, Twitter post, December 2018, 12:07 p.m., https://twitter.com/TarletonG.
108 Tarleton Gillespie, Twitter post, December 2018, 12:07 p.m., https://twitter.com/TarletonG.
109 Matsakis, "Tumblr's Porn-Detecting AI Has One Job—And It's Bad At It".
110 Matsakis, "Tumblr's Porn-Detecting AI Has One Job—And It's Bad At It".
111 Tumblr, Copyright and Trademark Transparency Report July-December 2018, https://static.tumblr.com/lmvezem/ydGpttntk/transparency_report-july-december_2018.pdf.
112 Bradford et al., Report Of The Facebook Data Transparency Advisory Group.
113 Spandana Singh and Kevin Bankston, The Transparency Reporting Toolkit: Content Takedown Reporting, October 25, 2018, https://www.newamerica.org/oti/reports/transparency-reporting-toolkit-content-takedown-reporting/.
114 Ranking Digital Rights, "2019 Ranking Digital Rights Corporate Accountability Index," last modified May 15, 2019, https://rankingdigitalrights.org/index2019/.
115 "The Santa Clara Principles On Transparency and Accountability in Content Moderation".
116 "The Santa Clara Principles".
117 "2019 Ranking Digital Rights Corporate Accountability Index".
118 Gebhart, Who Has Your Back? Censorship Edition 2019.


This report carries a Creative Commons Attribution 4.0 International license, which permits re-use of New America content when proper attribution is provided. This means you are free to share and adapt New America’s work, or include our content in derivative works, under the following conditions:

• Attribution. You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

For the full legal code of this Creative Commons license, please visit creativecommons.org.

If you have any questions about citing or reusing New America content, please visit www.newamerica.org.

All photos in this report are supplied by, and licensed to, shutterstock.com unless otherwise stated. Photos from federal government sources are used under section 105 of the Copyright Act.
