RECOMMENDER SYSTEMS, THEIR USE IN SOCIAL MEDIA, AND RECOMMENDATIONS ON THEIR REGULATION

Erik T

The Growth of Social Media

Social media has exploded over the course of the last decade and a half from a means of talking to friends and sharing pictures into an all-encompassing platform responsible for delivering news, serving as the world’s largest means of advertising to consumers, and organizing social groups of all kinds. Amid this explosion of users, “a minority of [them are] seeking to use [social media] as a platform to undermine democracy and incite offline violence”.1

These users take advantage of the way social media companies recommend content to their users to spread misinformation and outright falsehoods. They have been troublingly successful on a variety of different issues, including ones that are matters of life and death. This paper outlines the systems that social media companies use to recommend content and the effect they can have on users, describes some of the situations in which groups have taken advantage of the massive reach of social media to push particular agendas, and, finally, suggests a way in which some of the problems of the system as it currently exists can be addressed.

I. Factual Context

The process by which social media companies can best recommend content to their users and the effects that those systems can have on users have been the subject of study since social media emerged onto the scene. The two parts of this section are, first, an examination of how recommendations are made and evaluated and, second, a look at the effect that recommender systems and personalization have on social media users and the psychological flaws that they sometimes exploit.

1 BSR, Human Rights Impact Assessment: Facebook in Myanmar, 24 (2018), available at https://fbnewsroomus.files.wordpress.com/2018/11/bsr-facebook-myanmar-hria_final.pdf

A. Background on Technology and Application

Recommender systems, the mechanisms by which information is recommended to users, have been the subject of study for over two decades. The main type of information recommendation in use by social media is called Information Filtering and encompasses all recommender systems that put information through a filter before showing it to the user. Such systems typically make use of user profiles to track the preferences of users over time, are designed to deal with dynamic data sets, and are well suited to social systems.2 This section looks at how recommender systems mine data from large data sets and what kind of recommender systems are being used by social media platforms.

i. Data Mining for Recommender Systems

First, recommender systems must transform the data they are given into a form that is machine readable. Typically, recommender systems do this by putting data through three steps: preprocessing, analysis, and interpretation.3

The overarching goal of the first step, preprocessing, is to find out how alike certain objects are based on their attributes. Preprocessing transforms the raw data that is provided through: 1) similarity or distance measures, 2) sampling, 3) dimensionality reduction, and 4) denoising.

Similarity and distance measures are used in all recommender systems, while sampling is used when datasets are too large to process all at once and dimensionality reduction is used when there are too many variables and only sparse information about them.

2 Uri Hanani, Bracha Shapira & Peretz Shoval, Information Filtering: Overview of Issues, Research, and Systems, User Modeling and User-Adapted Interaction, 204 (2001), available at https://www.researchgate.net/profile/Peretz_Shoval/publication/220116306_Information_Filtering_Overview_of_Issues_Research_and_Systems/links/0fcfd50745a87e9bc0000000/Information-Filtering-Overview-of-Issues-Research-and-Systems.pdf
3 Springer Science + Business Media, LLC, Recommender Systems Handbook 78 (Francesco Ricci et al. eds., 2011), available at http://www.cs.ubbcluj.ro/~gabis/DocDiplome/SistemeDeRecomandare/Recommender_systems_handbook.pdf

1) Similarity and distance measures are a variety of classification methods – subcategorized into distance calculations, similarities, and correlations – whose goal is to determine how similar two objects are based on their attributes. The most commonly used measures are the cosine similarity, which considers “items as documented vectors of an n-dimensional space and compute[s] their similarity as the cosine of the angle that they form”,4 and the Pearson correlation, which “measures the linear relationship between objects”,5 because they scale well to a large number of attributes. For instance, some social media platforms ask users to apply tags to their content when they upload it. In those systems a distance measure could be used to determine how many tags two pieces of content have in common and assign them a distance score. The greater the distance calculated through whichever classification method is used, the less similar the objects.
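
The following sketch illustrates how these two measures can be computed over simple attribute vectors; the item vectors and tag counts are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Linear relationship between two objects' attribute vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (std_a * std_b)

# Two hypothetical items described by counts of shared user-applied tags.
item_x = [4, 0, 1, 3]
item_y = [5, 1, 0, 2]
print(cosine_similarity(item_x, item_y))   # closer to 1.0 means more similar
print(pearson_correlation(item_x, item_y))
```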

2) Sampling is a technique used in data mining to select “a subset of relevant data from a larger data set”.6 It may be used because the data set is too large to feasibly process all at once, to create a training dataset to “learn the parameters or configure the algorithms used in the analysis step”, or to create a testing dataset to make sure that the training dataset performs well with previously unseen data, a practice also known as cross-validation.7 It is common in recommendation systems to “sample the available feedback from the users” and cross-validate the results of the algorithm.8

Sampling is primarily used to check the available dataset for consistency and sometimes shrink it down when it becomes too unwieldy. This is especially important for today’s social media companies, which handle millions or billions of users and pieces of content.
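
As a rough illustration of holding out a sample of user feedback for later evaluation, the sketch below randomly splits a feedback log into training and testing sets; the feedback tuples and the 80/20 split are arbitrary assumptions.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=42):
    """Randomly hold out a fraction of the feedback for testing."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical (user, item, rating) feedback tuples.
feedback = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4),
            ("u2", "i3", 2), ("u3", "i2", 1), ("u3", "i3", 5)]
train, test = train_test_split(feedback)
print(len(train), "training records,", len(test), "held out for evaluation")
```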

4 Id. at 41
5 Id. at 42
6 Id. at 42
7 Id. at 42
8 Id. at 43

3) Dimensionality reduction is the practice of finding patterns in datasets that have many attributes, referred to as dimensions, and sparse information about those attributes.9 For instance, Principal Component Analysis “reduce[s] the dimensionality of the data by neglecting the components with a small contribution to the variance”.10 These components can be ignored because doing so does not massively affect the total variance of the data. When, for instance, a movie recommender system deals with fans of genres, say horror fans, as a block instead of as individuals, it is able to cut down the number of calculations it has to do. If the system only has to compare how similar, say, 20 blocks of movie genres are instead of comparing the similarities of millions of users, it will take much less computational time.
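
A minimal sketch of the idea behind Principal Component Analysis, assuming a small, invented user-by-genre rating matrix; a production system would operate on far larger and sparser data.

```python
import numpy as np

def pca_reduce(data, n_components=2):
    """Project data onto the directions of largest variance, dropping
    components that contribute little to the total variance."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; keep the largest components.
    top = eigenvectors[:, np.argsort(eigenvalues)[::-1][:n_components]]
    return centered @ top

# Hypothetical user-by-genre rating matrix (values are made up).
ratings = np.array([[5.0, 4.0, 1.0, 0.0],
                    [4.0, 5.0, 0.0, 1.0],
                    [1.0, 0.0, 5.0, 4.0],
                    [0.0, 1.0, 4.0, 5.0]])
print(pca_reduce(ratings, n_components=2))  # each user now described by 2 numbers
```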

4) Denoising is aimed at removing unwanted noise, which is “any unwanted artifact introduced in the data collection phase that might affect the result of our data analysis and interpretation”.11 In recommender systems there is both natural noise, such as users misspelling the tags they assign to content, and malicious noise, such as users trying to insert themselves into a place they don’t belong by assigning their content the wrong tags. Natural noise “is unvoluntarily introduced by users when giving feedback on their preference”.12 Malicious noise, on the other hand, “is deliberately introduced in a system in order to bias the results”.13 The effects of both are significant, and they can be mitigated by asking users to re-rate some items.

The second step, analysis, has two main goals: 1) classification and 2) description. The two parts of analysis deal, confusingly, with supervised classifiers, where “a set of labels or categories is known in advance”,14 and unsupervised classifiers, where “the labels or categories are unknown in advance and the task is to suitably organize the elements at hand”,15 respectively.

9 Id. at 44
10 Id. at 44
11 Id. at 47
12 Id. at 47
13 Id. at 47

Classification is “a mapping between a feature space and a label space, where the features represent characteristics of the elements to classify and the labels represent the classes”.16 The fact that the elements and labels have pre-assigned characteristics is what makes them supervised. The best algorithms to learn supervised classifiers are: the nearest neighbor classifier, which finds a number (k) of nearest neighbors from the available training records, “assigns the class label according to the class labels of those nearest neighbors”,17 and is the simplest of all machine learning algorithms; decision trees, which take the shape of a tree with decision nodes that test a single-attribute value to determine which branch of the subtree applies and leaf nodes “which indicate the value of the target attribute”;18 Bayesian classifiers, which are “a probabilistic framework for solving classification problems … based on the definition of conditional probability and the Bayes theorem”19 and are particularly useful in situations where a recommender system is just starting up; and artificial neural networks, which are “an assembly of inter-connected nodes and weighted links that is inspired in the architecture of the biological brain”, the main advantage of which is that “they can perform non-linear classification tasks” and can operate even if part of the network fails.20
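
A toy sketch of the nearest neighbor classifier described above; the feature vectors, labels, and the choice of k are invented for illustration.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Assign the majority class label of the k nearest training records."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(train, key=lambda rec: distance(rec[0], query))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical training records: (feature vector, label).
train = [((1.0, 1.0), "likes_horror"), ((1.2, 0.8), "likes_horror"),
         ((5.0, 5.2), "likes_comedy"), ((4.8, 5.0), "likes_comedy")]
print(knn_predict(train, (1.1, 0.9)))  # -> "likes_horror"
```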

14 Id. at 48
15 Id. at 48
16 Id. at 48
17 Id. at 48
18 Id. at 50
19 Id. at 52
20 Id. at 55

Description is the process of organizing unsupervised data through association rule mining, which “focuses on finding rules that will predict the occurrence of an item based on the occurrence of other items in a transaction”,21 and cluster analysis, which “consists of assigning items to groups so that the items in the same groups are more similar than the items in different groups”.22 Cluster analysis is commonly completed using the k-Means algorithm, which “works by randomly selecting k centroids” and assigning all items “to the cluster whose centroid is the closest to them”.23
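
A simplified sketch of the k-Means procedure just described, using invented two-dimensional item profiles and a fixed number of iterations rather than a convergence test.

```python
import random

def k_means(points, k=2, iterations=10, seed=0):
    """Randomly pick k centroids, then repeatedly assign each point to the
    nearest centroid and move each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(vals) / len(cluster) for vals in zip(*cluster))
    return centroids, clusters

# Hypothetical two-dimensional item profiles.
items = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.1), (4.9, 5.3), (5.2, 4.8)]
centroids, clusters = k_means(items, k=2)
print(centroids)
```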

Classifiers are evaluated by the measure of four counts: true positives, the “number of instances classified as belonging to class A that truly belong to class A”; true negatives, the “number of instances classified as not belonging to class A that in fact do not belong to class A”; false positives, the “number of instances classified as class A that do not belong to class A”; and false negatives, the “number of instances classified as not belonging to class A that in fact do belong to class A”.24 Accuracy is then equal to (TP + TN)/(TP + TN + FP + FN).25 There are also precision, equal to TP/(TP + FP), and recall, equal to TP/(TP + FN).26 Together these techniques and machine learning algorithms are used to assign a set of labels or categories to a set of data.
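
These definitions translate directly into code; the predicted and actual labels below are invented for illustration.

```python
def evaluate(predicted, actual, positive_class="A"):
    """Compute accuracy, precision, and recall from the four basic counts."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == positive_class and a == positive_class)
    tn = sum(1 for p, a in zip(predicted, actual) if p != positive_class and a != positive_class)
    fp = sum(1 for p, a in zip(predicted, actual) if p == positive_class and a != positive_class)
    fn = sum(1 for p, a in zip(predicted, actual) if p != positive_class and a == positive_class)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical predictions against known labels.
predicted = ["A", "A", "B", "A", "B", "B"]
actual    = ["A", "B", "B", "A", "A", "B"]
print(evaluate(predicted, actual))
```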

ii. What Recommender Systems are Used Today

The modern landscape of recommender systems has transitioned towards combining all available types of recommender or filtering systems into a single hybrid recommender system. Hybrid recommender systems can mitigate some of the problems that arise when using only one recommender system or one type of recommender system. When multiple recommender systems are combined into a hybrid recommender system they tend to make up for each other’s shortcomings. The most important of the types of filtering that are typically incorporated into hybrid recommender systems are content-based, collaborative, and knowledge-based.

21 Id. at 64
22 Id. at 61
23 Id. at 62
24 Id. at 59
25 Id. at 59
26 Id. at 59-60

Content-based filtering starts by creating a profile for all items in the system based on discrete attributes and features. For example, a content-based filtering system for movies may track the actors that appear in movies, user-defined genres, run time, director, and a variety of other attributes. Then the system creates a content-based profile for the user “based on a weighted vector of item features” of items that the user has previously rated or viewed.27 The weight can be calculated through a variety of techniques, including Bayesian classifiers, cluster analysis, and artificial neural networks, all of which are described above. Ultimately the weighted vector represents the probability that the user is going to like a particular item in the system. Content-based filtering systems run into three main problems, however: there is limited content analysis available for use, especially outside of fairly constrained subjects such as books or movies;28 they “have no inherent method for finding something unexpected”, meaning that users are recommended only items similar to those which they have already rated; and they do not provide very good recommendations to users who have submitted few ratings.29
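
A highly simplified sketch of the content-based approach, assuming invented item features, a plain average of liked items’ features as the user profile, and cosine similarity as the score; real systems would use the weighting techniques mentioned above.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical item profiles over the features (horror, comedy, long_runtime).
items = {
    "movie_a": [1.0, 0.0, 1.0],
    "movie_b": [0.9, 0.1, 0.8],
    "movie_c": [0.0, 1.0, 0.2],
}

# Build the user's weighted feature vector from items they already liked.
liked = ["movie_a"]
profile = [sum(items[m][i] for m in liked) / len(liked) for i in range(3)]

# Score every unseen item against the profile; higher means more likely to be liked.
scores = {name: cosine(profile, vec) for name, vec in items.items() if name not in liked}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```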

In collaborative filtering, “the key idea is that the rating of [the user] for a new item i is likely to be similar to that of another user v if [the user] and v have rated other items in a similar way”. It can be seen in use, on Amazon for instance, whenever a user is prompted to engage with something through a message that “other users also liked” the thing being recommended. Collaborative filtering makes up for some of the shortfalls of content-based filtering by being able to recommend items whose content is difficult to obtain “through the feedback of other users” and is a better indicator of quality because it is “based on the quality of items as evaluated by peers, instead of relying on content that may be a bad indicator of quality”.30

27 Stephanie Blanda, Online Recommender Systems – How Does a Website Know What I Want?, American Mathematical Society (May 25, 2015), https://blogs.ams.org/mathgradblog/2015/05/25/online-recommender-systems-website-want/
28
29 Handbook at 109
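
A toy sketch of user-based collaborative filtering with invented ratings; for simplicity it uses an inverse-average-difference similarity rather than the Pearson correlation discussed earlier.

```python
# Hypothetical user-item ratings; missing entries mean the user has not rated the item.
ratings = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob":   {"item1": 4, "item2": 3, "item4": 5},
    "carol": {"item1": 1, "item2": 5, "item4": 2},
}

def similarity(u, v):
    """Agreement between two users over the items they have both rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    diffs = [abs(ratings[u][i] - ratings[v][i]) for i in common]
    return 1.0 / (1.0 + sum(diffs) / len(diffs))  # 1.0 = identical ratings

def predict(user, item):
    """Weight other users' ratings of the item by their similarity to this user."""
    neighbors = [(similarity(user, v), ratings[v][item])
                 for v in ratings if v != user and item in ratings[v]]
    if not neighbors:
        return None
    total = sum(w for w, _ in neighbors)
    return sum(w * r for w, r in neighbors) / total if total else None

print(predict("alice", "item4"))  # weighted towards the more similar user's rating
```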

Both content-based and collaborative filtering, referred to collectively as neighbor-based recommender systems, share two important flaws. First, they offer limited coverage “because rating correlation measures the similarity between two users by comparing their ratings for the same items”, meaning that “users can be neighbors only if they have rated common items”.31 Second, their accuracy suffers when there is a lack of available ratings “due to the fact that users typically rate only a small proportion of the available items”.32

Knowledge-based systems “recommend items based on specific domain knowledge about how certain item features meet users needs and preferences and, ultimately, how the item is useful for the user”.33 These systems mostly determine recommendations based on predefined knowledge bases that contain explicit rules about the relationships between customer requirements and item features. The movie recommender system that came up earlier, for instance, may have been told that some movies were created by a particular director and would thus recommend more movies by that director. Knowledge-based systems work better than other systems when they are first deployed but tend to break down quickly if not equipped with learning components.
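
A minimal sketch of the knowledge-based idea, with an invented rule base and catalog standing in for a real domain knowledge base.

```python
# Hypothetical knowledge base: explicit rules mapping a stated user requirement
# to the item features that satisfy it.
rules = {
    "wants_scary": lambda item: item["genre"] == "horror",
    "short_on_time": lambda item: item["runtime_minutes"] <= 100,
    "likes_director_x": lambda item: item["director"] == "director_x",
}

catalog = [
    {"title": "movie_a", "genre": "horror", "runtime_minutes": 95, "director": "director_x"},
    {"title": "movie_b", "genre": "comedy", "runtime_minutes": 120, "director": "director_y"},
    {"title": "movie_c", "genre": "horror", "runtime_minutes": 130, "director": "director_x"},
]

def recommend(requirements):
    """Return items that satisfy every rule triggered by the user's requirements."""
    active = [rules[r] for r in requirements if r in rules]
    return [item["title"] for item in catalog if all(rule(item) for rule in active)]

print(recommend(["wants_scary", "likes_director_x"]))  # -> ['movie_a', 'movie_c']
```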

30 Id. at 111
31 Id. at 131
32 Id. at 11
33 Id. at 12

The creation of a hybrid recommender system covers many of the problems inherent in the pre-existing recommendation systems when they are used separately. Furthermore, there are three “algorithmic paradigms for incorporating contextual information into the recommendation process”.34 The three paradigms are: reduction-based (pre-filtering), where “only the information that matches the current usage context … [is] used to compute the recommendations”; contextual post filtering, where “the recommendation algorithm ignores the context information”; and contextual modeling, in which “context data is explicitly used in the prediction model”.35

Contextual information includes things such as time, physical location, and relationships between users such as familial ties or membership in groups. Reduction-based filtering would be at work when, for example, a recommender system uses only a user’s video watches to recommend videos. Contextual post filtering would ignore pieces of context, such as the time of day or where the person is, when recommending certain types of content that are not context dependent. Contextual modeling is the opposite of that approach and uses contextual information to recommend content that is context dependent, such as which restaurants are nearest to a user or which are open.
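
A small sketch contrasting reduction-based pre-filtering with contextual post-filtering, using an invented interaction log and simple item popularity as a stand-in for the underlying recommendation algorithm.

```python
# Hypothetical interaction log with a simple "context" field (time of day).
interactions = [
    {"item": "breakfast_show", "context": "morning"},
    {"item": "late_movie", "context": "evening"},
    {"item": "morning_news", "context": "morning"},
]

def recommend_popular(logs):
    """Stand-in for any recommendation algorithm: most frequently seen items first."""
    counts = {}
    for row in logs:
        counts[row["item"]] = counts.get(row["item"], 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

current_context = "morning"

# Reduction-based (pre-filtering): only interactions matching the current context
# are fed to the algorithm.
pre_filtered = recommend_popular([r for r in interactions if r["context"] == current_context])

# Contextual post-filtering: the algorithm ignores context, and the context is
# applied to its output afterwards.
morning_items = {r["item"] for r in interactions if r["context"] == "morning"}
post_filtered = [i for i in recommend_popular(interactions) if i in morning_items]

print(pre_filtered, post_filtered)
```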

B. What Effect Can This Have on Users?

i. The Effects of Recommender Systems and Personalization in Social Media

A filter bubble is the space in which a person engages with the Internet, a term typically accompanied by negative insinuations about the extent to which content is catered to users. It was first used by Eli Pariser to describe “the dangers of implicit and explicit personalization in online services and traditional media”.36 According to the theory, when the content recommended to a user is over-personalized it reinforces the beliefs the user already has, and the system then recommends content ever more similar to their existing views, creating a feedback loop. The filter bubble is alleged to have arisen as a result of the push for personalization by companies that find themselves overwhelmed by the sheer volume of information available to them.37 The two questions being considered in this section are whether there is a filter bubble on social media platforms and, if there is, whether it has influenced the behavior of users.

34 Id. at 14
35 Id. at 14

There is ample evidence to suggest that there is a filter bubble on social media platforms. Increasing participation in social media is creating “enormous trails of data by … communicating, buying, sharing or searching”.38 Internet companies have attempted to mitigate the effects of the resulting information overload by integrating all the information that they can access about users to personalize searches and content recommendation. At the same time the sheer amount of content being created every day is dwarfing the human ability to absorb or process information. The attempts to personalize are also meant to provide value-added information to users and thus mitigate the effects of the exponential increase in the amount of data being created.39 Facebook, for instance, includes what it calls social gestures, which include interactions with other users, likes, shares, and subscriptions.40 This information affects what content is served to the user in Facebook’s news feed.

36 Engin Bozdag, Bias in Algorithmic Filtering and Personalization, 15 Ethics and Information Technology 209-227 (2013), available at https://link.springer.com/article/10.1007/s10676-013-9321-6
37 “Personalization systems address the overstimulation problem by building, managing, and representing information customized for individual users.” Id.
38 Id.
39 Id.
40 Id.

The concern then is whether that decision about whether to serve certain information to users affects the user’s behavior and thus has a perpetual effect on the content users see. A study of the degenerative effects of a variety of factors in recommender systems found that all systems degenerated over time, but that the rate of degeneracy differed for each of the examined factors. The study suggests that “the best remedies against system degeneracy … are continuous random exploration and growing the candidate pool at least linearly”.41 While the exponential growth of content may help to suppress the degeneracy of the system as a whole, the attempts of social media companies to increase the accuracy of their recommendations by increasing the amount of data that they use to recommend content most closely follow what the study designated as the worst-case scenario, the Optimal Oracle model.42 The more accurate the recommendations to users are, the faster the system degenerates into one in which a filter bubble exists.

ii. Taking Advantage of Psychological Flaws

Psychological flaws in the human mind are numerous, and social media companies have manipulated them, wittingly or not. For instance, Facebook conducted an experiment in 2012 that was designed to determine whether emotional states could be spread through social networks. The study determined the answer to that question by running two parallel experiments, “one in which exposure to friends’ positive emotional content was reduced, and one in which exposure to negative emotional content was reduced”.43 The study notes that the experiment was adapted to run in the background and was thus consistent with Facebook’s Data Use Policy, to which all users agree. The experiment did show emotional contagion, with people who had reduced positive content in their News Feed posting more negatively and the opposite for people who had a reduced rate of negative content.

41 Ray Jiang et al., Degenerate Feedback Loops in Recommender Systems, Association for the Advancement of Artificial Intelligence (2019), available at https://arxiv.org/pdf/1902.10730.pdf
42 Id. at 5
43 Adam D.I. Kramer et al., Experimental Evidence of Massive-Scale Emotional Contagion Through Social Networks, Proceedings of the National Academy of Sciences 111 (2014), available at https://www.pnas.org/content/111/24/8788

The effects of social media on mood and beliefs are further exacerbated when fear and anxiety are brought into the mix. In a study examining the effects of perceived threat and level of topic involvement on attitudes towards a variety of topics, users were shown threat-inducing images and questions and then presented with a balanced number of opinions and arguments for and against a variety of topics. Users that were exposed to threat-inducing images and questions and that exhibited a low involvement in that topic “became more selective by reading significantly more attitude consistent” information.44 Users that expressed a high involvement in topics did not show any effect “as they behaved consistently by seeking balanced information with or without threat”. When facing cognitive dissonance people maintain their cognitive system “by either avoiding attitude inconsistent information, or by counter arguing attitude-inconsistent information”.45 These results do not bode well for a system that is pushing degeneracy of opinions through ever more accurate recommendations while playing around with users’ emotions on a whim.

II. What Harm Can It Bring?

The question then is whether there have been negative consequences of the usage of social media and, if there have been, whether those negative effects are consequential enough to warrant action by regulators or social media companies. It is difficult, if not impossible, to show that social media is the cause of any particular event. It is difficult to isolate the effect of social media because subjects are also interacting with the world around them while they use social media. A way to mitigate the effect of the subject’s interaction with the world around them is to use an extremely large set of subjects and change something about the social media that they interact with. Concerns still abound about studies that take that approach because they are experimenting with actual human beings.46 As a result, it is hard to find situations in which it is possible to definitively state that social media was the cause of behavior, though it is fairly easy to see correlation. If, for instance, there has been a fall in the level of trust people have in traditional media sources that coincides with increased usage of social media to consume news, there may not necessarily be a connection, but one may be there. It is not that correlation shows definitive causation, but that correlation can point to an amplification or facilitation of existing trends. Talking about the effect of social media is more about saying that there is a probability of an effect based on how closely related the correlations are.

44 Q. Vera Liao & Wai-Tat Fu, Beyond the Filter Bubble: Interactive Effects of Perceived Threat and Topic Involvement on Selective Exposure to Information, Conference on Human Factors in Computing Systems 2359 (2013), available at http://cascade.cs.illinois.edu/publication/p2359-liao.pdf
45 Id. at 2366

Below are three instances in which social media probably contributed to, facilitated, or amplified the effects of some users. These events are: the Rohingya genocide in Myanmar, which Facebook is alleged to have facilitated through its inaction; the creation of a recommendation network for pedophiles through manipulation of the algorithms of YouTube; and the spread of the anti-vaccination movement, which took advantage of the global reach and lax moderation of various social media companies.

46 Editorial Expression of Concern: Experimental Evidence of Massive Scale Emotional Contagion Through Social Networks, Proceedings of the National Academy of Sciences (July 22, 2014), available at https://www.pnas.org/content/111/29/10779.1

A. Myanmar: Rohingya Genocide

The Rohingya are an ethnic group, previously largely located in the Rakhine region of Myanmar, that have long been denied many of the rights given to citizens of Myanmar.47 After the passage of a new citizenship law in 1982 the Rohingya were “essentially rendered stateless” and subjected to restrictions on “their rights to study, work, travel, marry, practice their religion and access health services”.48 Discrimination against the Rohingya has existed for decades, but the situation changed in August 2017 when the Burmese military began what it referred to as “clearance operations”.49 These operations were not the start of violence directed at the Rohingya, with violence dating back centuries and picking up in 2012, but prior violence had largely been the work of paramilitary groups.50 The clearance operations were framed as a response to an attack by a group called the Arakan Rohingya Salvation Army, but were, according to the United Nations, “immediate, brutal and grossly disproportionate”.51 The UN determined, through an inquiry by the Human Rights Council, that there was evidence that these operations were preplanned and “designed to instill immediate terror, with people woken by intense rapid weapon fire, explosions or the shouts and screams of villagers”.52 Ultimately, nearly 700,000 refugees fled the violence to the neighboring state of Bangladesh and independent estimates of the death toll range from 6,700 to 43,000.53

47 Who Are the Rohingya?, Al Jazeera (April 18, 2018), https://www.aljazeera.com/indepth/features/2017/08/rohingya-muslims-170831065142812.html
48 Id.
49 Report of the Independent International Fact-Finding Mission on Myanmar, Human Rights Council Thirty-Ninth Session (September 10-28, 2018), U.N. Doc. A/HRC/39/64 at 7
50 Id. at 7
51 Id. at 8
52 Id. at 8
53 Summary Report of Findings from Fact-Finding Mission to Bangladesh, ASEAN Parliamentarians for Human Rights (January 21-24, 2018), available at https://aseanmp.org/wp-content/uploads/2018/03/APHR_Bangladesh-Fact-Finding-Mission-Report_Mar-2018.pdf

But how is Facebook involved in this situation? The UNHRC report outlines the Burmese military’s use of Facebook to condone the use of “dehumanizing and stigmatizing language against the Rohingya”.54 The Burmese military’s attempts to inflame tensions have included “insistence that ‘Rohingya’ do not exist or belong in Myanmar … , denial of the suffering of Rohingya, the association of Rohingya identity with terrorism, and … repeated allusions to illegal immigration and incontrollable birth rates”.55 An investigative report by the New York Times found examples of posts made by members of the Burmese military “[stretching] back half a decade” and “[targeting] the country’s mostly Muslim Rohingya minority group”.56

Burmese military personnel set up Facebook pages devoted to everything from beauty queens to a blog called Opposite Eyes, most with “no outward ties to the military”.57 Burmese military intelligence began a campaign in 2017 to spread rumors amongst both Buddhist and Muslim groups that an attack was imminent. The campaign’s purpose “was to generate widespread feelings of vulnerability and fear that could be salved only by the military’s protection”.58

Although Facebook did take down some of the messages, saying “it had found evidence that the messages were being intentionally spread by inauthentic accounts”, they did not investigate any links to the military or increase their moderation.59

The UNHRC called out Facebook in its report for being “a useful instrument for those seeking to spread hate” and said that Facebook’s response was “slow and ineffective”.60 This was exacerbated by the fact that, for many people in Myanmar, Facebook is the internet and the fact that Facebook had no means “to provide country-specific data about the spread of hate speech on its platform”.61 This is largely due to the lackadaisical attitude that Facebook had taken towards monitoring content in Myanmar. The social media giant relies heavily on user reports and civil society groups, such as a Burmese group called Phandeeyar, to identify hate speech because, according to a Reuters investigation, their systems did not work for Burmese.62 Also, the company that Facebook outsources most of its Asian moderation to, Accenture, had only two Burmese-speaking individuals reviewing problematic posts as late as 2015, neither of whom lived in Myanmar.63 By comparison, the Burmese military’s misinformation campaign is alleged to have started as far back as 2013.

54 Human Rights Council at 14
55 Human Rights Council at 14
56 The article includes examples of posts calling Islam a global threat to Buddhism and a false story about the rape of a Buddhist woman by a Muslim man. Paul Mozur, A Genocide Incited on Facebook With Posts from Myanmar’s Military, The New York Times (October 15, 2018), https://www.nytimes.com/2018/10/15/technology/myanmar-facebook-genocide.html
57 Id.
58 Id.
59 Id.
60 Human Rights Council at 14

Since the UNHRC report was made public Facebook has taken some steps to remedy the situation in Myanmar. Its first step was to commission a report from a third-party group, Business for Social Responsibility, to examine what had gone wrong in Myanmar and what steps the company could take to prevent such a situation from happening again.64 This report arrived at many of the same conclusions as the UNHRC and suggested a variety of changes to how Facebook does business in Myanmar, including a suggestion to “support more sustained initiatives that promote independent media and fact-checking”.65 The incident was also brought up during a Senate hearing with Mr. Zuckerberg in which he said that Facebook was “hiring dozens of more Burmese-language content reviewers”, “working with civil society in Myanmar to identify specific hate figures so we can take down their accounts”, and “standing up a product team to do specific product changes in Myanmar and other countries that may have similar issues in the future”.66

61 Id.
62 Steve Stecklow, Why Facebook is Losing the War on Hate Speech in Myanmar, Reuters (August 15, 2018), https://www.reuters.com/investigates/special-report/myanmar-facebook-hate/
63 Accenture is based in Kuala Lumpur, a city in neighboring Malaysia. Id.
64 Alex Warofka, An Independent Assessment of the Human Rights Impact of Facebook in Myanmar, Facebook Newsroom (November 5, 2018), https://newsroom.fb.com/news/2018/11/myanmar-hria/
65 BSR at 31

Facebook is taking some meaningful steps to address its failures in Myanmar but the most disappointing part of the entire episode is that the situation had to escalate to such a degree before any action was taken. It may not be feasible for Facebook to catch all hate speech before it has real-world consequences, but there should be some action before it results in the displacement of nearly 700,000 people and the deaths of up to 43,000 more.

B. Pedophilia

It is an unfortunate fact that pedophiles exist on the internet and use it to prey on vulnerable children. Their existence has been prominently highlighted by the media as far back as 2004, when Chris Hansen teamed up with police to catch predators through chatrooms.67 This problem was highlighted again when Matt Watson, a YouTube user, published a video that he said showed the existence of a ring of pedophiles taking advantage of YouTube’s algorithms to link each other to otherwise innocent videos that happened to show children in compromising situations or positions.68

Watson described how an otherwise innocent search could, if one clicked on a video featuring a child, lead down a rabbit hole of recommended videos linked mainly by the fact that they strictly or primarily feature children.69 The comment sections of these videos were flooded with inappropriate remarks about the children, time-code hyperlinks to points in the videos where the children were in compromising positions, and, in some cases, users sharing their account details on other sites so that they could stay in touch. YouTube’s system takes advantage of a type of information filtering recommendation system called association rule mining to assign videos a relatedness score.70 This relatedness score is generated by two broad classes of data: “1) content data, such as the raw video streams and video metadata such as title, description, etc.”; and “2) user activity data”, including explicit activities, things like rating or commenting on videos, and implicit activities, data “generated as a result of users watching and interacting with videos”.71 The pedophiles that Watson claimed to have uncovered in his video appear to have been exploiting content data in order to start a chain of videos featuring children and then manipulating user activity data to raise the relatedness score of videos that appealed to them. Also, many of the videos that Watson showed were being monetized by YouTube, meaning that ads for major brands were being placed on videos infested by pedophiles.72

66 Transcript of Mark Zuckerberg’s Senate Hearing, Washington Post (April 10, 2018), available at https://www.washingtonpost.com/news/the-switch/wp/2018/04/10/transcript-of-mark-zuckerbergs-senate-hearing/?utm_term=.af45e50f8a15
67 IMDB, https://www.imdb.com/title/tt3694654/ (May 10, 2019).
68 Matt Watson, Youtube is Facilitating the Sexual Exploitation of Children, and It’s Being Monetized (2019), YouTube (February 17, 2019), https://www.youtube.com/watch?time_continue=4&v=O13G5A5w5P0
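
As a rough illustration of how a relatedness score grounded in user activity data might be computed, the toy sketch below counts how often videos are watched in the same session; the sessions are invented, and YouTube’s actual system is far more elaborate.

```python
from itertools import combinations
from collections import Counter

# Hypothetical viewing sessions: which videos each user watched together.
sessions = [
    ["vid_a", "vid_b", "vid_c"],
    ["vid_a", "vid_b"],
    ["vid_b", "vid_c"],
    ["vid_a", "vid_d"],
]

# Count how often each pair of videos is watched in the same session.
co_watch = Counter()
for session in sessions:
    for pair in combinations(sorted(set(session)), 2):
        co_watch[pair] += 1

def relatedness(video, other):
    """Raw co-watch count as a stand-in for a relatedness score."""
    return co_watch[tuple(sorted((video, other)))]

# Videos most related to vid_a, in descending order of the score.
candidates = {v for s in sessions for v in s if v != "vid_a"}
print(sorted(candidates, key=lambda v: relatedness("vid_a", v), reverse=True))
```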

In the days after Watson’s video went viral and began to be covered by news sources outside of YouTube, a number of high-profile companies, such as Nestle and the Walt Disney Company, announced that they would be suspending advertisement through YouTube either until steps were taken to remedy the problem or, in some cases, indefinitely.73 YouTube itself acted over the next week by disabling comments on tens of millions of videos that it saw as at risk of predatory behavior,74 terminating over 400 channels for comments left on videos, and reporting any illegal comments that it found to law enforcement.75

69 Matt Binder, YouTube’s Pedophilia Problem: More than 400 Channels Deleted as Advertisers Flee Over Child Predators, Mashable (February 22, 2019), https://mashable.com/article/youtube-wakeup-child-exploitation-explained/
70 James Davidson et al., The YouTube Video Recommendation System, Rec. Sys. 2010 292 (September 26-30, 2010), available at https://www.researchgate.net/profile/Sujoy_Gupta2/publication/221140967_The_YouTube_video_recommendation_system/links/53e834410cf21cc29fdc35d2/The-YouTube-video-recommendation-system.pdf at 294
71 Id. at 294
72 Binder

While YouTube’s response was fairly swift, that in itself raises the issue of why the situation had to become the subject of public outcry before YouTube acted to remedy it. Why would YouTube not restrict its algorithms from recommending content that is tied together through certain kinds of content data, such as the presence of children, or situations that could lead to compromising positions, such as children playing Twister or practicing gymnastics? Much like Facebook’s response to the situation in Myanmar, the problem arises not from the fact that the content existed at all, but that it was allowed to proliferate to such an extent before being addressed.

C. Anti-Vaccination Movement

There have always been groups that do not vaccinate. This has usually been for religious reasons, which are given a wide berth by the law. Anti-vaccine beliefs among minority religious groups have not presented that much of a problem for society in general because these groups are frequently isolated and are small enough to be within the margin of error for herd immunity.76 In recent years, however, there has been growth in people refusing to vaccinate their children for non-religious reasons.77 That group has taken advantage of the ability of social media to spread information to individuals without having to go through any kind of moderation or validation.

73 Daisuke Wakabayashi & Sapna Maheshwari, Advertisers Boycott YouTube After Pedophiles Swarm Comments on Videos of Children, The New York Times (February 20, 2019), https://www.nytimes.com/2019/02/20/technology/youtube-pedophiles.html
74 More Updates on Our Actions Related to the Safety of Minors on YouTube, YouTube Creators Blog (February 28, 2019), https://youtube-creators.googleblog.com/2019/02/more-updates-on-our-actions-related-to.html
75 This was communicated via a comment posted by the official YouTube account YouTube Creators. Philip DeFranco (@PhillyD), (Feb. 20 2019, at 10:12 PM), https://twitter.com/phillyd/status/1098420250352074752?lang=en
76 Herd Immunity is the point at which a population is sufficiently vaccinated to protect those members that cannot be vaccinated for a variety of reasons. For measles this is between 93 and 95 percent. If vaccination rates fall much below this, groups that cannot be vaccinated, such as very young children and people who are allergic to vaccines, are at risk of being infected. Sebastian Funk, Critical Immunity Thresholds for Measles Elimination, Centre for the Mathematical Modelling of Infectious Diseases (October 19, 2017), available at https://www.who.int/immunization/sage/meetings/2017/october/2._target_immunity_levels_FUNK.pdf

This can be seen especially strongly with the modern anti-vaccination movement because of its origins. It is broadly believed to have originated with a paper by Andrew Wakefield78 that purported to show a link between the MMR vaccine79 and autism. Although his studies were later retracted by the scientific journals that published them,80 he was struck from the United Kingdom medical register81 after a General Medical Council inquiry, and his study has since been thoroughly debunked,82 it has continued to be passed around as a reason not to vaccinate children. The scientific community has spoken in the strongest terms available to it that Wakefield’s study was manipulative, poorly done, and downright fraudulent, yet Wakefield has continued to be taken seriously in the anti-vaccination and autism communities as well as in the broader conversation about the administration of vaccines.

Since he was discredited by the GMC he has been called to testify before the Oregon Senate Health Care Committee,83 heralded by one advocacy group as “Nelson Mandela and Jesus Christ rolled up into one”,84 and has directed a film purporting to show a cover-up that almost made its way into the 2016 Tribeca Film Festival.85 While Wakefield’s continued impact on the field in which he was discredited was certainly helped along by his portrayal of himself as the victim of “a ruthless, pragmatic attempt to crush any attempt to investigate valid vaccine safety concerns”,86 he has certainly been assisted by the ways that the anti-vaccine movement has spread through social media.

77 A study from 2018 found a drop in vaccination rates in 12 out of 18 states that offer non-medical exemptions. Jacqueline K. Olive et al., The State of the Antivaccine Movement in the United States: A Focused Examination of Nonmedical Exemptions in States and Counties, PLoS Med. (June 12, 2018), available at https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1002578
78 A.J. Wakefield et al., RETRACTED: Ileal-lymphoid-nodular Hyperplasia, Non-specific Colitis, and Pervasive Developmental Disorder in Children, Lancet (February 28, 1998), available at https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(97)11096-0/fulltext
79 Measles, Mumps, and Rubella
80 Retraction: Enterocolitis in Children with Developmental Disorders, American Journal of Gastroenterology (May 2010), available at https://insights.ovid.com/crossref?an=00000434-201005000-00052
81 Fitness to Practice Panel Hearing, General Medical Council (January 28, 2010), available at https://www.nhs.uk/news/2010/01January/Documents/FACTS%20WWSM%20280110%20final%20complete%20corrected.pdf
82 T.S. Sathyanarayana Rao & Chittaranjan Andrade, The MMR Vaccine and Autism: Sensation, Refutation, Retraction, and Fraud, Indian Journal of Psychiatry v.53(2) (April-June 2011), available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3136032/
83 Saerom Yoo, Vaccine Researcher Wakefield to Testify in Oregon, Statesman Journal (February 24, 2015), https://www.statesmanjournal.com/story/news/health/2015/02/24/andrew-wakefield-vaccine-oregon/23967797/

Social media has helped the anti-vaccination movement to circumvent the barrier of geographic proximity that has historically prevented such movements from gaining momentum, “allowing individuals from disparate regions who likely would not have otherwise communicated to come into contact”.87 While this effect can be positive in some instances, a 2016 paper suggests that in anti-vaccination circles “a social media-driven event can trigger people’s preexisting disposition to conspiracy thinking and subsequently provoke their attitudinal expressions that are counterproductive to society”.88 The study’s authors stop short of saying that there is a causal link between any particular social media event and a person joining the anti-vaccination movement but are willing to say that “the timeline is highly suggestive” that such a link does exist.89

84 Susan Dominus, The Crash and Burn of an Autism Guru, The New York Times Magazine (April 20, 2011), https://www.nytimes.com/2011/04/24/magazine/mag-24Autism-t.html
85 Vaxxed: Tribeca Festival Withdraws MMR Film, BBC News (March 27, 2016), https://www.bbc.com/news/entertainment-arts-35906470
86 WebMD, MMR Doctor ‘Planned to Make Millions,’ Journal Claims, WebMD (2011), https://www.webmd.com/brain/autism/news/20110111/mmr-doctor-planned-make-millions-journal-claims#3
87 Kumanan Wilson & Jennifer Keelan, Social Media and the Empowering of Opponents of Medical Technologies: The Case of Anti-Vaccinationism, Journal of Medical Internet Research vol. 15 at 2 (2013), available at https://www.jmir.org/2013/5/e103/pdf
88 Tanushree Mitra et al., Understanding Anti-Vaccination Attitudes in Social Media, Proceedings of the Tenth International AAAI Conference on Web and Social Media 269 at 277 (2016), available at https://www.aaai.org/ocs/index.php/ICWSM/ICWSM16/paper/view/13073/12747
89 Wilson at 13

The growth of the anti-vaccination community has been accompanied by outbreaks of previously declining diseases, such as measles,90 and has almost certainly led to deaths. While the mortality rate of measles is quite low in the developed world, approximately one or two deaths per thousand infections in the United States for instance,91 the World Health Organization reported 18,917 cases of measles in the European and American regions in January 2019 alone.92 While it would be going too far to say that all or even a particularly large number of those cases were the result of the anti-vaccination movement, it is still a fact that measles cases are now hitting levels that have not been seen in decades.93

III. A Way Forward

A. What Are Companies Already Doing?

i. Providing Context

Both Facebook and Google have taken steps to provide context on posts that might be the subject of misinformation campaigns. When articles pop up on users’ News Feeds there is a button with a lowercase “i” that, when clicked, takes users to the publisher’s Wikipedia page.94 A similar system has been implemented by Google on its YouTube platform. In YouTube’s case it is “adding ‘authoritative’ context to search results about conspiracy-prone topics like the moon landing and the Oklahoma City bombing” through brief summaries from sites like Wikipedia and Encyclopedia Britannica.95 There has been some concern, especially around YouTube’s implementation, that the initiatives will place extra burden on Wikipedia.96 There was also an incident when the Notre Dame Cathedral caught fire in which YouTube mistakenly emblazoned many of the live news feeds covering the fire with a context box about the September 11th terrorist attacks.97 Aside from a few hiccups these programs have been largely well-received.

90 Julia Belluz, Measles is Back Because States Give Parents too Many Ways to Avoid Vaccines, Vox (April 18, 2019), https://www.vox.com/science-and-health/2019/2/16/18223764/measles-outbreak-2019-vaccines-anti-vax
91 Complications of Measles, Centers for Disease Control and Prevention, https://www.cdc.gov/measles/about/complications.html
92 Immunization, Vaccines and Biologicals: Measles and Rubella Surveillance Data, World Health Organization, https://www.who.int/immunization/monitoring_surveillance/burden/vpd/surveillance_type/active/measles_monthlydata/en/
93 Lena H. Sun, U.S. Officials Say Measles Cases Hit 25-year Record High, The Washington Post (April 29, 2019), https://www.washingtonpost.com/health/2019/04/29/us-officials-say-measles-cases-hit-year-record/?utm_term=.276f9594b830
94 Kerry Flynn, Facebook Outsources Its Fake News Problem to Wikipedia – and an Army of Human Moderators, Mashable (October 5, 2017), https://mashable.com/2017/10/05/facebook-wikipedia-context-articles-news-feed/

ii. Facebook’s Ambitious Plan to Consult with Advocates

Facebook released a draft charter for an oversight board for content decisions in January 2019 that outlines a plan to create “a body of independent experts who will review Facebook’s most challenging content decisions”.98 It has stated that the oversight board would hear appeals of Facebook’s content decisions and that its decisions would be binding.99 This board would be the first of its kind and has received tentative approval from a variety of legal experts and academics pending more information about how the board would be funded and how independent the board would be. It is a step in the right direction, and my only addition would be that Facebook should strive to include as many of the other social media platforms as possible to prevent harmful content from merely skipping over to one of the other social media giants.

95 Adi Robertson, YouTube is Fighting Conspiracy Theories with ‘Authoritative’ Context and Outside Links, The Verge (July 9, 2018), https://www.theverge.com/2018/7/9/17550954/youtube-google-news-initiative-fake-news-conspiracy-theory-context-updates
96 Id.
97 Chris Welch, YouTube Shows 9/11 Link on Live Videos of Unrelated Notre Dame Fire, The Verge (April 15, 2019), https://www.theverge.com/2019/4/15/18311727/notre-dame-fire-youtube-september-9-11-attack-fact-check
98 Facebook, Draft Charter: An Oversight Board for Content Decisions, Facebook, https://fbnewsroomus.files.wordpress.com/2019/01/draft-charter-oversight-board-for-content-decisions-2.pdf
99 Mark F. Walsh, Facebook Plans to Create a Judicial-Like Body to Address Controversial Speech, ABA Journal (May 1, 2019), http://www.abajournal.com/magazine/article/facebook-judicial-review-controversial-speech

B. The MPAA and ESRB Rating Systems

The MPAA and ESRB are self-regulatory bodies that were set up by their respective industries in response to public outcry much like that which is currently facing social media companies. Some of the largest social media companies have been called to account for their actions by a variety of countries, most notably the United States,100 the United Kingdom,101 and India.102 This is eerily similar to the circumstances that led to the creation of the MPAA and ESRB. The anger today is more about preventing radicalization than about the moral outrage that spawned the MPAA and ESRB, but the signs are similar.

i. The Creation of the MPAA Rating System103

The MPAA rating system was implemented in 1968 to replace the archaic and lackadaisically enforced Hays Code. It was put in place to prevent the government from feeling compelled to step in because of lax enforcement and the ensuing increase in the use of profanity in films. The system is administered by the independent Classification & Ratings Administration (CARA) and adhered to by the National Association of Theater Owners (NATO). While it is purely voluntary, the vast majority of filmmakers submit their films to CARA in order to reach the widest audience possible. Most theaters are members of NATO and can be punished for showing films that have not been rated.

100 Zuckerberg Senate Hearing
101 Natasha Lomas, Zuckerberg Gets Joint Summons From UK and Canadian Parliaments, Techcrunch, https://techcrunch.com/2018/10/31/mark-zuckerberg-gets-joint-summons-from-uk-and-canadian-parliaments/
102 Saheli Roy Choudhurry, Twitter Summoned Before Indian Parliamentary Panel on ‘Safeguarding’ Social Media Users, CNBC (February 6, 2019), https://www.cnbc.com/2019/02/06/twitter-called-before-indian-parliamentary-panel.html
103 Michelle Donnelly, The History of MPAA Ratings, The Script Lab (January 18, 2015), https://thescriptlab.com/features/main/3120-the-history-of-mpaa-ratings/

ii. The Formation of the ESRB

The formation of the ESRB followed a great deal more controversy, including a series of joint Congressional hearings and a proposed government regulatory regime for the video game industry.104 The furor was prompted by the use of a technique called full-motion video, which digitized real actors, especially its use in Mortal Kombat and Night Trap.105 Mortal Kombat was denounced for featuring violent scenes and Night Trap was berated for including content that some critics classified as “sexually explicit”.106 The hearings ran from 1992 into 1993 and culminated in the introduction of the Video Game Rating Act of 1994,107 which called for a government ratings board to be set up to administer ratings for video games. The bill died in committee because of an agreement by the major video game makers of the day to create a voluntary organization called the Interactive Digital Software Association, which would later change its name to the Entertainment Software Association, and the independent Entertainment Software Ratings Board.108 Controversies about sexual or violent content in video games still occasionally arise today but have not resulted in anything like the hearings of 1992-93.

104 Chris Kohler, July 29, 1994: Videogame Makers Propose Ratings Board to Congress, Wired (July 28, 2009), https://www.wired.com/2009/07/dayintech-0729/
105 Id.
106 Andy Chalk, Inappropriate Content: A Brief History of Videogame Ratings and the ESRB, The Escapist (July 20, 2017), https://v1.escapistmagazine.com/articles/view/video-games/columns/the-needles/1300-Inappropriate-Content-A-Brief-History-of-Videogame-Ratings-and-t
107 Video Game Rating Act of 1994, S. 1823, 103rd Cong.
108 Chalk

C. What Should Be Done

The most appropriate path forward would be the creation of an industry-organized, centralized body to which groups could bring their grievances about content or the removal of content. The body should be organized by social media companies and their advertisers in order to avoid free speech concerns. It should mainly consist of a quasi-judicial panel made up of industry experts, human rights advocates, and at least one mental health professional specializing in the effects of filter bubbles. The panel could deliberate on grievances brought before it and issue recommendations on how to address problematic content based on its severity and probability of causing physical or emotional harm.

By allowing users to bring grievances about both currently existing content and content that has been removed from the platform already it could serve to find content that is problematic, remove that content, and allow for a review of content that was removed for other reasons. This could help to address the problem that has emerged in social media when content is removed for violating a ban on, for instance, violent content even though it might be the only record of violent events. This would necessitate a system for saving content that is completely removed from social media, though, for legal reasons, such a system should not extend to content that is illegal to possess. This exception is mainly intended for child pornography or other types of content that are illegal merely to own rather than to distribute or create.109

109 18 USC § 2252, for instance, makes it illegal for any person to “knowingly [transport] … any visual depiction, if … the producing of such visual depiction involves the use of a minor engaging in sexually explicit conduct”. The intent of saving the images for posterity does not apply as strongly to child pornography, as 18 USC § 2252 also has an affirmative defense if the defendant “promptly and in good faith, and without retaining or allowing any person, other than a law enforcement agency, to access any visual depiction or copy thereof – took reasonable steps to destroy each visual depiction; or reported the matter to a law enforcement agency and afforded that agency access to each such visual depiction.”


Once a piece of content or a user has been brought to the attention of the board, there would be a period of public comment during which all concerned parties could bring their concerns to the board. A period of public comment, properly publicized, would allow the board to set a precedent for certain types of cases, for instance cases involving hate speech, that it could then draw on to solve similar incidents. This system is drawn from the notice and comment system that exists for government agencies as well as amicus curiae briefs. The federal government requires agencies to “publish a notice of a proposed rule in the Federal Register” which is publicly available, both in print and online.110 They then must take comments on those proposed rules and consider all relevant matters raised during the comment periods. This is to ensure that all groups which have an interest in a particular rule are able to make their interests known before any rules take effect.

Such a group would work best if, like the MPAA and ESRB, membership was voluntary. The number of groups opting out of membership in the organization could be moderated in much the same way that the MPAA and ESRB exert pressure on groups that do not submit films or video games for rating: through economic pressure. This would require the participation of online advertisers, but that does not seem like it would be a significant impediment. Advertisers have been forced to weigh in on a variety of topics that they probably would have rather not become involved in and at this point would probably relish having a neutral third party to which to assign all blame. While this would almost certainly not convince all social media companies to join, as some are almost belligerent in their protection of users’ speech,111 that should not be the goal of the system. There is no chance of pushing all problematic content off the Internet. Instead the goal should be to push problematic content out of the mainstream. The goal of such a group should not be to push every authoritarian or huckster out of society altogether, merely to prevent them from taking advantage of the broad reach of the major social media platforms.

110 Notice and Comment, Justia, https://www.justia.com/administrative-law/rulemaking-writing-agency-regulations/notice-and-comment/
111 Jane Coaston, Gab, the Social Media Platform Favored by the Alleged Pittsburgh Shooter, Explained, Vox (October 29, 2018), https://www.vox.com/policy-and-politics/2018/10/29/18033006/gab-social-media-anti-semitism-neo-nazis-twitter-facebook

Social media companies should act as quickly as possible to get ahead of the increasing public outcry, before the industry is caught up in something like the furor that forced the RIAA to place parental advisory stickers on a huge variety of music.
