October 2019

Rising Through the Ranks

How Algorithms Rank and Curate Content in Search Results and on News Feeds

Spandana Singh

Last edited on October 21, 2019 at 9:47 a.m. EDT

Acknowledgments

In addition to the many stakeholders across civil society and the industry who have taken the time to talk to us about search and news feed ranking over the past few months, we would particularly like to thank Dr. Nathalie Maréchal from Ranking Digital Rights for her help in drafting this report. We would also like to thank Craig Newmark Philanthropies for its generous support of our work in this area.

About the Author(s)

Spandana Singh is a policy program associate in New America's Open Technology Institute.

About New America

We are dedicated to renewing America by continuing the quest to realize our nation’s highest ideals, honestly confronting the challenges caused by rapid technological and social change, and seizing the opportunities those changes create.

About Open Technology Institute

OTI works at the intersection of technology and policy to ensure that every community has equitable access to digital technology and its benefits. We promote universal access to communications technologies that are both open and secure, using a multidisciplinary approach that brings together advocates, researchers, organizers, and innovators.

Contents

Introduction

Search Ranking

Case Study: Google

Case Study: Bing

Case Study: DuckDuckGo

News Feed Ranking

Case Study: Facebook

Case Study: Twitter

Case Study: Reddit

Promoting Fairness, Accountability, and Transparency Around Algorithmic Curation and Ranking Practices

Introduction

Since the early 2000s, the amount of information on the internet has grown tremendously. Whether it be news outlets, social media, or e-commerce platforms, the online ecosystem has become a go-to destination for users seeking a variety of information and experiences. At the same time, users have faced a fundamental challenge in identifying credible sources and understanding which of them to use. In order to help users access high-quality, relevant, and accurate information and content, a number of internet platforms rely on proprietary algorithmic tools to curate and rank content for users.

This report focuses on search engines and platforms that offer news feeds, both of which deploy algorithms to identify and curate content for users. Many of these platforms use hundreds of signals to inform these ranking algorithms and deliver users personalized search and news feed experiences.

These algorithms control both the inputs and outputs of the information environment. They evaluate and process incoming information to identify what content is most relevant for users. They then determine which of these outputs a user should see and rank the outputs in a hierarchical manner. In this way, these platforms act as gatekeepers of online speech by exercising significant editorial judgment over information flows.1

Most internet platforms have heralded the introduction of personalized search results and news feeds as a positive—and now integral—feature of their services. Many platforms assert that personalization enables users to access and engage with content that is more relevant and meaningful to them. Personalization features also enable platforms to achieve significant growth and boost revenue through avenues such as targeted advertising.

However, there is a fundamental lack of transparency around how algorithmic decision-making around curation and ranking takes place. This is concerning because these practices can have a variety of negative consequences. In fact, many users are often unaware that such algorithms are being used by platforms to shape their online experiences.2 Rather, these users believe that the subjective frame presented by curation and ranking algorithms is representative of reality. As a result, users have grown accustomed to outsourcing judgment, autonomy,3 and decision-making to internet platforms and their opaque algorithms, which decide—based on their perceptions of what user interests and values are—what users’ online experience should be.4 This disparity in algorithmic awareness and understanding is fostering a new digital divide between individuals who are aware of and understand the impacts of algorithmic decision-making and those who are not.5

As algorithmic tools are increasingly used to curate and rank content on internet platforms, concerns around fairness, accountability, and transparency have grown. In particular, an increasing number of researchers have noted that users lack awareness of algorithmic decision-making practices. In addition, there is a significant lack of transparency from internet platforms regarding how these tools are developed and deployed, and how they shape the user experience. Further, researchers have outlined that users and content creators often lack meaningful controls over, and agency related to, algorithmic decision-making practices.

In addition, although these algorithmic curation and ranking tools remove the need for humans to make significant manual and individual decisions regarding millions of pieces of content, they do not remove the need for human editorial judgment in this process, nor do they reduce bias. The term “bias” in this context does not solely refer to inappropriate preferences based on protected categories like race or political affiliation. Rather, these tools compile insights into a broad range of weighted signals. Ranking algorithms then analyze the data to prioritize certain forms of content and certain voices over others. These algorithms also incorporate the judgments of the engineers who have developed them, particularly with regard to what information users are likely to find interesting and meaningful. In addition, algorithms can infer correlations in data that may reflect societal biases. Indeed, oftentimes these algorithms result from machine learning “black box” systems. This means that even though developers may know what the inputs and outputs of an algorithm are, they may not know exactly how the algorithm operates internally. Therefore, concerns regarding algorithmic bias and accountability have grown as these algorithmic decision-making practices have become more prevalent.

This report is the second in a series of four reports that will explore how automated tools are being used by major technology companies to shape the content we see and engage with online, and how internet platforms, policymakers, and researchers can promote greater fairness, accountability, and transparency around these algorithmic decision-making practices. This report focuses on the algorithmic curation and ranking of content in search engine results and in news feeds on internet platforms. It uses case studies on three search engines—Google, Bing, and DuckDuckGo—and on three internet platforms that feature news feeds—Facebook, Twitter, and Reddit—to highlight the different ways algorithmic tools can be deployed by technology companies to curate and rank content, and the challenges associated with these practices.

Editorial disclosure: This report discusses policies by Google, Microsoft, and Facebook, all of which are funders of work at New America but did not contribute funds directly to the research or writing of this report. New America is guided by the principles of full transparency, independence, and accessibility in all its activities and partnerships. New America does not engage in research or educational activities directed or influenced in any way by financial supporters. View our full list of donors at www.newamerica.org/our-funding.

Search Ranking

Search engines have emerged as an essential tool as the internet has expanded, supporting key aspects of free expression online through information access and information dissemination. Search engines enable users to more effectively sift through and access endless amounts of information, and they have empowered individuals, businesses, and publishers to disseminate information.

Although the process of conducting a search on a search engine seems straightforward, the ways algorithms curate and rank search results raise a number of concerns. These algorithms underpin the immense power of search engines, which are able to determine what information a user sees, the type of results they can access, and which publishers and pieces of information a user engages with first. In this way, search engines play a significant role in shaping the perspectives and mindsets of each of their users.6 Additionally, not all search engines operate in the same manner. Therefore, which search engine a person uses will also influence their viewpoints and opinions.7 As Wired’s Brian Barrett wrote, “The internet is a window on the world; a search engine warps and tints it.”8

The topic of algorithmic curation and ranking of search results has become especially prominent in the news recently, as conservative politicians in the United States have claimed that search engines, such as Google, and internet platforms, such as Facebook and Twitter, have instituted a liberal bias within their search results and news feed curation practices.9 However, there is little evidence that such bias is actually present.10

Technology companies that operate search engines can utilize artificial intelligence and machine learning in a number of ways.11 These include powering speech-to-text searches and enabling visual searches. This section of the report, however, will focus on how these algorithmic tools are used to curate and rank search results. It will use three case studies—Google, Bing, and DuckDuckGo—to explore how different companies have structured and implemented the practices of curating and ranking their search results, and what challenges they have faced in the process.

Case Study: Google

Google is the world’s largest search engine.12 It was founded in 1998 by Larry Page and Sergey Brin, and today operates under parent company Alphabet Inc. Over the past two decades, Google’s influence in the technology sector, and in society broadly, has grown significantly. As of July 2019, Google had 92.19 percent of the global search engine market share.13 It also currently ranks first for global internet engagement on Alexa rankings.14 Google’s search engine product, known as Search, is currently available in over 150 languages and over 190 countries.15

Brin and Page’s original vision of the Google search engine was primarily based on the tenets of citation analysis used by scholars, researchers, and scientists. In such research-oriented fields, the more a work is cited, the more it is generally considered legitimate and high-quality. Brin and Page saw value in this approach; by identifying web pages that other web pages frequently linked to, they would be able to identify similarly legitimate, or at least popular, online sources. From this sprang Google’s original search engine ranking algorithm, known as PageRank.16 However, as outlined by Safiya Noble, this approach comes with a range of problems. For example, when citing work in a research publication, all citations are given the same weight in the final bibliography, despite expected differences in how much the author relied on each work cited. Similarly, citations count equally regardless of whether the author mentioned a work and its contents to validate it, reject it, and so on. Brin and Page predicted some of these complications, and the ranking algorithm has evolved considerably since its original conception to account for further limitations.17
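To illustrate the citation-analysis intuition described above, the sketch below implements the basic power-iteration form of PageRank on a hypothetical three-page link graph. It is a minimal teaching sketch of the publicly described core idea, not Google's production system; the damping factor, iteration count, and toy URLs are assumptions for illustration.

```python
# A minimal sketch of the citation-analysis idea behind PageRank, assuming a
# toy link graph; the real ranking system uses many refinements not shown here.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# Pages that attract more (and better-ranked) inbound links score higher,
# much like frequently cited papers in citation analysis.
toy_graph = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com"],
}
print(pagerank(toy_graph))
```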

According to Google, its Search product aims to enhance a user’s search experience by providing them with personalized search results. These search results are customized based on personal user data collected by Google, such as browsing and purchase history, and are generated using a combination of manual and algorithmic tools. This personalized search feature also, however, serves as a major source of advertising revenue for the company.

Over the past few years, Google has come under significant criticism for its personalized Search feature. In particular, many researchers and internet activists have expressed concerns that by delivering search results that users are likely to click on and be interested in based on their prior searches and personal data, Google is creating a “filter bubble.”18

Recently, Google has stated that it is moving away from offering personalized search results.19 However, today users still receive personalized search results on the platform.

Underpinning Google’s search engine is a process called crawling. Crawling deploys software, known as web crawlers, to identify publicly available web pages. Web crawlers determine which websites to browse, how often to browse them, and how many pages should be browsed from each website. Typically, web crawlers select web pages to crawl based on previous crawls and sitemaps.20 Once the web crawlers have identified a set of web pages, they visit them and utilize links on these web pages to identify other web pages. During this process, the web crawlers specifically look to identify new websites, changes that have occurred to existing websites, and dead links. These web crawlers then bring data about these web pages back to Google’s servers. When a crawler identifies a web page, Google’s system renders the content of the page, like a browser does, and works to identify signals such as keywords and website freshness (the recency of a website’s content). These signals are continuously monitored using the Google Search index. The Google Search index contains hundreds of billions of web pages, and it includes an entry for every word seen on every web page that has ever been indexed. When a new web page is indexed, it is added to the entries for all the words it contains.21 When a user enters a query into Search, the Search algorithm identifies keywords in the query and matches them against web pages in the index. It then ranks the subsequent search results based on over 200 different signals.22 The search result ranking process is, for the most part, conducted using algorithms. According to Google, the company does not deploy human curation when ranking its search results.23
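As a rough illustration of the index structure described above, the sketch below builds a minimal inverted index that maps every word to the pages containing it and matches query keywords against those entries. It is a simplified example built on assumed toy data, not a description of Google's actual index or matching logic.

```python
# A minimal sketch of an inverted index: every word maps to the set of pages
# that contain it, and a query is matched against those entries.
from collections import defaultdict

def build_index(pages):
    """pages: dict mapping a URL to its (already crawled) text content."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return pages containing every keyword in the query."""
    results = None
    for word in query.lower().split():
        matches = index.get(word, set())
        results = matches if results is None else results & matches
    return results or set()

pages = {
    "example.com/a": "guide to knitting blankets",
    "example.com/b": "blanket and duvet reviews",
}
index = build_index(pages)
print(search(index, "blanket reviews"))  # {'example.com/b'}
```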

Some of the factors and signals that influence the Search ranking algorithm are:

1. The meaning of a user’s query: In order to provide a user with a relevant result for their query, Google first needs to identify the intent behind this query. To do this, the company has developed language models, including natural language processing models, which are adept at understanding which particular keywords should be looked up in the index. Additionally, Google has developed a synonym system, which enables search results to include results related to synonyms of keywords in the original query—for example, if the query included “blanket,” then the search results may also provide links to websites related to duvets. The introduction of this synonym system has enhanced Google search results in over 30 percent of searches across languages. Search algorithms also attempt to decipher what category of information a user is looking for, such as whether it is a specific or broad search, or a language-specific search. If a user searches for a trending keyword or topic, such as the results of the latest UFC fight card, Google’s freshness algorithm will interpret this as a signal that recent information may be more useful than older results.24

2. Relevance of web pages: After assessing the meaning of a user’s query, algorithms begin to evaluate the content of different web pages in order to understand whether a page contains information that is relevant to the initial query. One of the clearest indicators that a web page may be relevant is if the headings or body of the page contain the same keywords as the ones in the search query. In addition to keyword matching, Google asserts that it uses aggregated and anonymized interaction data in order to determine whether the search results are relevant to the initial query. This data informs signals that enable Google’s machine-learning systems to better determine relevance. These relevance signals help the Search algorithm determine whether a web page contains information that answers the initial search query, or whether it simply repeats the same question posed in the query. For example, if a user’s query was for “books,” the algorithm would determine whether a web page contains relevant content aside from the keyword “books,” such as pictures, videos, lists, reviews, and so on. According to Google, although these search systems are constructed to seek out quantifiable signals in order to determine relevance, they are not structured to assess subjective notions such as the political ideology of a page’s content.25

3. Quality of content: Google’s Search algorithms also aim to prioritize the most reliable and high-quality search results. In order to determine the reliability and quality of content, Google’s systems identify signals that can help assess expertise, authoritativeness, and trustworthiness on a given topic. They also search for web pages that many users appear to value for similar search queries. For example, if other reliable and prominent websites link to a web page, that is considered a good indicator that the content of that web page is reliable. This is known as PageRank. PageRank is a mathematical formula that can be used to judge the “value of a page” on the web by assessing the quantity and quality of other pages that link to it. PageRank was one of the fundamental components of the original Google Search algorithm, and it was inspired by the system scientists used to gauge the importance of scientific papers, which was to evaluate how many other scientific papers referenced or cited them.26 However, after Google deployed a public PageRank score for each web page, bad actors began working to game this system, which led to a large volume of link spamming.27 As a result, the public PageRank scoring was retired, although PageRank remains a component of the Search algorithm. Additionally, as other factors became increasingly important for the Search algorithm and for ranking, other signals were incorporated into the algorithm.

Google also uses aggregated feedback from its Search quality evaluation process to further refine its ability to assess information quality. Further, Google uses spam algorithms in order to determine the quality of a page and ensure that low-quality, harmful, or manipulative web pages are not ranked highly in search results.28

4. Usability of web pages: When ranking search results, Google Search assesses whether web pages are easy to use and ranks those that are deemed more user-friendly higher than those that are not. Some of the signals that inform whether a web page is usable include whether the website is formatted properly for multiple browsers, whether it has been formatted for various devices and sizes (such as desktops, tablets, and smartphones), and whether the web page can be loaded by users with slower internet connections.29 This can make the Google Search product more accessible to those using the service through different devices or in different regions. However, it may also render some otherwise high-quality sites inaccessible to these users if web page owners do not abide by Google’s user-friendly requirements, thus stifling information flows.

5. Context and settings: As previously mentioned, Google extracts insights from users’ personal data in order to inform and tailor search results. These data points include location, purchase history (as determined by crawling users’ Gmail inboxes for purchase receipts30), past Search history, and Search settings. According to Google, a user’s Search settings on factors such as preferred language or SafeSearch—a tool that enables users to filter out explicit or sensitive content—also enable Google to understand which results are likely to be the most useful for them. Google has also stated that Search results may be personalized based on a user’s Google account activity. For example, if a user searches for “events near me,” the search engine may personalize recommendations based on events it thinks the user may be interested in. According to the company, these inferences are made so that search results can match a user’s interests. The company asserts that they are not, however, designed to infer a user’s race, religion, political affiliations, or other sensitive characteristics.31

The weight applied to each factor depends on the nature of the query. For example, the freshness algorithm may play a more prominent role for queries related to current events.32 In addition, Google has detailed specifications for how pages that may impact the “future happiness, health, financial stability, or safety of users” are weighted.33 These web pages are known as “Your Money or Your Life” (YMYL) pages, and they include websites that let users make purchases or pay bills, offer financial, medical, or legal information, or produce news. Google has especially high page-quality rating standards for these pages, as low-quality content on such pages could have a significant negative impact on users.34 As a result, when Search algorithms detect that a query is related to a YMYL subject, they place more weight in the ranking system on factors such as authoritativeness, expertise, and trustworthiness.35
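The sketch below illustrates the kind of query-dependent weighting this paragraph describes: per-signal weights shift when a query looks trending or touches a YMYL topic. The signal names, weights, and linear scoring function are hypothetical assumptions for illustration, not Google's actual formula.

```python
# Illustrative sketch (not Google's formula) of query-dependent signal weights:
# freshness gets boosted for trending queries, quality/trust for YMYL queries.
BASE_WEIGHTS = {"relevance": 0.5, "quality": 0.3, "freshness": 0.1, "usability": 0.1}

def query_weights(is_trending=False, is_ymyl=False):
    weights = dict(BASE_WEIGHTS)
    if is_trending:
        weights["freshness"] += 0.2   # recent pages matter more
    if is_ymyl:
        weights["quality"] += 0.3     # expertise and trust matter more
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}  # re-normalize

def score(page_signals, weights):
    """Combine hypothetical per-page signal scores (0 to 1) into one ranking score."""
    return sum(weights[s] * page_signals.get(s, 0.0) for s in weights)

weights = query_weights(is_trending=True)
page = {"relevance": 0.8, "quality": 0.6, "freshness": 0.9, "usability": 0.7}
print(score(page, weights))
```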

As the largest search engine in the world, Google is responsible for curating and delivering a large amount of information to users. It therefore assumes some responsibility for the impact of its algorithms in shaping the worldviews of users. The platform, however, has come under significant criticism for not providing enough transparency and accountability around its search curation and ranking practices, particularly around how it personalizes search results.

According to many search engines, a personalized search experience can help users filter through the vast amount of information available on the internet and access results that are relevant and useful. Personalized search experiences have also helped platforms like Google achieve significant growth and boost revenue. Personalized search results were a key advantage of Google when the feature first launched, and they became an integral feature of Google Search. In 2009, the company even made personalized search the default option for all users who were logged in to their Google accounts. Also in 2009, the company stated it would deploy an anonymous cookie in order to provide personalized search results to users who were logged out or didn’t have a Google account. This cookie operated separately from a user’s Google Account and web history, which were only available to users who signed in.36 In a blog post, the company also stated that it used contextual signals, such as those that aim to reduce the ambiguity of queries, in order to rank search engine results, even for users who were logged out. In this case, this means that Google would use information from a user’s recent search to clarify their current search. The platform said that this would not result in significantly different search results.37 However, its deployment did raise a number of privacy concerns regarding how Google was collecting and using user information across its different Search modes.

Over the past few years, the platform has asserted that it has moved away from personalizing search results. In September 2018, for example, CNBC reported meeting with Google’s algorithm team and learning that because each search query requires a significant amount of context, the opportunities for personalization are quite limited.38 Additionally, the company has suggested that personalization does not significantly improve the quality of the Search experience, and as a result it has moved away from it.39 However, it is unclear whether these shifts are due to increased scrutiny of the practice of offering personalized search results or whether the company is purposefully changing the way its Search algorithm works.

Many critics believe Google continues to curate and personalize search results to a large extent.40 One of Search’s biggest critics is privacy-focused search engine DuckDuckGo. In June 2018, DuckDuckGo conducted a study in which 87 volunteers in the United States conducted searches in Google’s private browsing mode (known as “Incognito”) while logged out and on the regular Search platform. The volunteers searched for three politically charged topics: “gun control,” “immigration,” and “vaccinations.”41 The study revealed that most participants received unique results, some saw search results that others did not,42 and some received fewer domain results than others.43 Additionally, the ranking of these search results also varied.44 The researchers found more than double the variation in the search results when comparing Incognito searches to regular searches.45 There was also significant variation between news and video results.46 This suggested that even if users were conducting searches in Incognito mode (which aims to provide users with a private browsing option that does not save a user’s search history), and while logged out, they still received personalized search results. This is likely because websites can use IP addresses and browser fingerprinting in order to identify users even when they are searching in these modes.47 According to the researchers, had the results been truly non-personalized, then all users would have received the same search results.48

DuckDuckGo’s researchers worked to control for factors that could have influenced these results, such as location, time, and being logged into Google, by having the volunteers conduct searches while logged out, and by having them conduct the searches at the same time and on the same day. In addition, they controlled for potential variances in location, which might have resulted in, for example, local news stories appearing in the search results, by reviewing all links by hand and comparing them to the city and state of the volunteer who viewed them.49

Google responded to the research by calling it flawed,50 stating that DuckDuckGo’s attempts to control for time and location differences had been ineffective. It also claimed that the researchers assumed that any difference in the search results automatically suggested personalization.51 Further, Google highlighted that search results related to news and current events were likely to continuously change depending on daily occurrences. It also stated that personalization was performed on a small portion of overall queries, primarily related to location52 and to utilizing previous searches in order to decipher context for a current search.53 Google has also suggested that any variation in search results is likely due to factors such as a user’s location, the language the search is performed in, and the distribution of Search index updates throughout Google’s data centers.54 Despite these rebuttals, however, there are still significant concerns around Google’s efforts to personalize search results, and the lack of transparency and accountability around these practices.

In particular, internet activists, such as Eli Pariser, have asserted that personalized search results can create and maintain filter bubbles and promote certain biases or world views. In his book, The Filter Bubble: What the Internet Is Hiding from You, Pariser argues that by providing users with content a system predicts they will like, internet platforms are filtering out and therefore preventing users from accessing information that may challenge their perspectives or broaden their horizons. This places these users in a “filter bubble” and amplifies their confirmation bias. It can also result in users becoming ill-informed, developing perceptions that are skewed toward one perspective, and even developing a distaste for ideas that are unfamiliar or contrary to their own. This is particularly concerning in the context of political discourse and rising political polarization around the globe.55 This has become an increasingly prominent topic since the 2016 U.S. presidential election, in which numerous internet platforms, including Google, were accused of creating filter bubbles or echo chambers through their algorithmic content curation practices.

However, some researchers push back against the notion that an online filter bubble has enhanced polarization. For example, economists from Brown University and Stanford University studied the relationship between polarization and the use of online media in American adults between 1996 and 2012. They found that polarization has largely been driven by those Americans who spend the least amount of time online, such as those over the age of 75. According to the study, those belonging to younger demographics who use the internet more frequently demonstrated little difference in their level of polarization in 2012 compared to 1996, when online platforms were far less prevalent and influential.56

Additionally, some conservative lawmakers in the United States allege that Google has demonstrated bias in how it curates and ranks its search engine results. According to this critique, this bias prioritizes and favors liberal information sources, in particular ones that are critical of conservatives. In December 2018, Google CEO Sundar Pichai testified before the U.S. House Judiciary Committee on the topic of alleged bias against conservatives on the platform, denying that such bias exists.57 This conversation has been particularly prominent in the political sphere as internet platforms have ramped up their efforts to remove misinformation and disinformation on their services. In the process, a number of politically charged web pages and pieces of content have also been impacted, contributing to claims of conservative bias.58

However, as previously discussed, there is little evidence to support the notion that such political biases exist in Google’s search results. Google has stated that Search is designed to decipher the usefulness and relevance of a web page, not to promote the political and ideological viewpoints of the individuals who built or audited the system.59 To debunk some of these claims, the company has also stated that it does not utilize human curation when ranking search results, and instead relies exclusively on algorithms. According to Danny Sullivan, Google’s public liaison for Search, Google does not manually intervene on specific search results when addressing issues with ranking. This is because tweaking one search result or addressing one query, when the search engine receives trillions of queries, does not have a strong impact on the overall Search experience.60 Additionally, the platform has asserted that its systems are not designed to make subjective determinations about truthfulness on web pages. Rather, it uses a range of measurable signals—such as the PageRank signal, which is used to determine authoritativeness61—to assess how users and other web pages perceive the expertise, trustworthiness, and authority of a web page and its content.62 Google’s ranking algorithms then promote these web pages, particularly during searches where the original query could surface misleading information.63

However, just because Google does not deploy human curation during the search ranking process does not mean that no bias is present. Algorithms are not neutral and bias-free. The signals that an algorithm uses are designed to prioritize certain information or qualities over others in order to curate and rank content. This is a key part of how search engines and news feeds work today, and is often seen as integral to their operations. At the same time, the term “bias” in this context does not solely refer to inappropriate preferences based on protected categories like race or political affiliation. Although an algorithm may contain biases based on how it sorts and ranks content, it is difficult to know exactly what these biases are. This is particularly true with black box machine learning systems. Therefore, it is difficult to draw reliable conclusions, such as whether algorithms are biased against a certain political party. Additionally, these algorithms incorporate the judgments, preferences, and priorities of the engineers who developed them, particularly around what information users are likely to find interesting and meaningful. Furthermore, as outlined by Dr. Safiya Noble, an associate professor at the University of California, Los Angeles, search results can also reinforce and perpetuate societal biases. In her book, Algorithms of Oppression, Noble outlines how in the early 2010s Google’s search engine results related to women, in particular women of color, were overly sexualized and stereotyped. Noble also highlights a case in 2016, in which Google Images search results for “three black teenagers” delivered mugshots of African-American teenagers, whereas similar search results for “three white teenagers” delivered “wholesome and all-American” results. In this way, search engine algorithms can reinforce existing societal stereotypes, often disproportionately impacting already marginalized communities.64

Although Google has shared some information about the signals that contribute to its search algorithm, it has not provided a comprehensive overview of all the signals, how they interact with one another, and what impact they have on online expression as a whole. One reason many platforms fail to provide comprehensive transparency around these signals is that the signals make up a platform’s algorithmic “secret sauce,” which it wants to keep confidential in order to maintain a competitive edge. However, even if it is valid to maintain confidentiality for certain operational details, it is important to promote transparency and accountability to the greatest extent feasible, and companies’ claims about trade secrets should not outweigh the public interest.

Additionally, many website owners have expressed frustration over the fact that Google does not always announce when it is updating or making changes to its search algorithm. As a result, after a change, some content creators find that their web pages are no longer ranking as well. The ranking of websites in search results is of vital importance to website publishers, as it influences the success of their websites. Typically, a user only clicks on the top few search results. The remaining search results on the page receive far lower clickthrough rates. As a result, being able to rank high in search results, and understand how various ranking signals impact one’s website and associated information flows, commerce outlets, and so on, is important.65

Part of the reason Google may not make frequent announcements is because it regularly makes changes to its Search algorithms. In 2017 alone, Google performed over 200,000 experiments and subsequently instituted 2,400 changes to Search.66 In July 2019, the company announced that over the past year it had made 3,200 changes to Search systems. These included updates for specific features or elements as well as broad core updates.67 According to Google, when making changes to its Search systems, the platform identifies areas of improvement, develops a solution, and then tests that solution. After deciphering whether the solution is feasible and improves the Search experience, Google then implements it. These algorithmic changes apply to a broad range of similar searches.68 Thus far, Google has only provided public updates around some of the broad core changes it has instituted. The company has stated that it aims to provide site owners with prior notice of “significant, actionable changes to our Search algorithms,”69 but in many instances this has proven not to be enough.

Greater transparency would enable publishers of web pages to better understand how their content is curated and ranked, and help them ensure they can adequately distribute their content. Currently, a number of search engine optimization (SEO) organizations and communities have sprung up that speculate on algorithmic changes. However, without clear direction and information, the ability of these communities to understand how they can effectively exercise their free speech online is limited, and the ability of users to access content is therefore also limited. In addition, greater transparency would help Google disprove or debunk growing claims of political bias.

Although Google’s search curation and ranking process is conducted primarily using algorithms, humans still play a role in this process. According to Google, the platform does not remove or delist search results. Rather, it seeks to promote higher-quality content in the rankings over lower-quality search results. The company has asserted that in a few rare exceptions it intervenes manually in order to remove or delist content from search results. These cases include when the platform receives legal requests to remove or delist search results, when a web page violates Google’s webmaster guidelines, and when a webmaster of a page requests that their web page be removed or delisted.70 Although Google states that it does not frequently engage in these practices, government and legal efforts to remove content have increased globally.71

Google provides transparency and accountability around its search result curation practices through its transparency report, which is published twice a year. In this report, Google reports on government requests to remove content across all its products, content delistings due to copyright, and requests to delist content under European privacy law (known as the “right to be forgotten”). Its data on government requests to remove content can be broken down by product, enabling a greater understanding of how such requests impact the Search product in particular. The report includes metrics such as removal requests by the numbers, items specified by the requests, and a percentage breakdown of which products were affected. It also includes reasons for requests.72 This is a best practice that other platforms should adopt, as it provides transparency around how external parties are influencing search results, and how Google is responding. Google has also asserted that, where possible, it aims to inform website owners about requests for removal through its Webmaster Console. This is a vital component of accountability as well.73

Additionally, some research indicates that Google has intervened to manually alter search results when the results sparked controversy. For example, in December 2016 British investigative journalist Carole Cadwalladr wrote about how the top search result for “did the Holocaust actually happen?” in Google Search was a white nationalist web page that denied the Holocaust had ever happened. Cadwalladr’s finding sparked outrage and eventually the results changed. Although Google has stated that it prefers “to take a scalable algorithmic approach to fix problems” rather than “fix the results of an individual query by hand,” many suspect that the company took corrective action nonetheless. However, others have suggested that the online outrage sparked by Cadwalladr’s article also contributed to this change in the results, as it drove traffic to certain web pages on this topic, thus impacting the search results and the ranking of these results.74

Another way that humans play a role in Google’s search curation and ranking process is through the Search Quality Rating process. In order to ensure its search algorithms promote relevant and high-quality content, Google developed a rigorous testing process that involves live tests and review by thousands of trained external search quality raters around the world. This process is deployed every time Google considers implementing a change or update to its search algorithm. In order to roll out a change, Google must be able to determine that the change provides a net positive. This means that a significant number of search results will be made more helpful without subsequently creating major losses in other areas. Making such changes to organic search results can take a large amount of time.75

Search quality raters are external individuals based around the globe who help evaluate whether a website provides users with the content they were looking for. They also help evaluate the quality of search results based on the expertise, authoritativeness, and trustworthiness of the content. Each search quality rater represents a specific language and geographic expertise or perspective, in order to ensure Google’s Search product is useful around the world. Although these individuals’ ratings do not directly impact the ranking of any web page, they enable Google to benchmark the quality of its results and identify areas for improvement.76 This informs Google’s search algorithms, which aim to prioritize high-quality content and web pages.77

In order to ensure that search quality raters are using a consistent approach, Google provides them with Search Quality Rater Guidelines, which outline Google’s goals for its ranking systems and include examples of appropriate ratings. According to Google, in order to ensure consistency in the rating program globally, all search quality raters are required to pass a comprehensive exam and are continuously audited. Evaluators also assess each improvement to Search that is rolled out through side-by-side experiments, in which evaluators see two different sets of search results, one with the change and one without. Evaluators must then identify which experience is of greater relevance and quality. This feedback is used to improve Search and launch efforts.78 These search quality rater guidelines are publicly available, therefore providing a degree of transparency around these curation and rating efforts. However, not as much information is known about who these raters are and what perspectives, regions, and cultures they represent. Knowing this would enable researchers and users to get a better sense of how search results are being curated and rated.
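As a hypothetical illustration of how side-by-side feedback might be tallied into a launch decision, the sketch below counts rater preferences for an experimental ranking over the current one and applies an assumed margin. The vote format and threshold are illustrative assumptions, not Google's actual launch criteria.

```python
# Illustrative sketch of a side-by-side evaluation: raters see results from the
# current system ("control") and the proposed change ("experiment") and pick
# the more relevant set; the change ships only if it is a clear net positive.
def net_positive(votes, margin=0.05):
    """votes: list of 'experiment' or 'control' preferences from raters."""
    experiment_share = votes.count("experiment") / len(votes)
    control_share = votes.count("control") / len(votes)
    return (experiment_share - control_share) > margin

ratings = ["experiment", "experiment", "control", "experiment", "control"]
print(net_positive(ratings))  # True: 60% vs. 40% exceeds the assumed 5-point margin
```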

Given that Google’s search engine is a vital way for website publishers to disseminate their information and gain traction, many publishers have invested significant time and resources into ensuring that they rank well in Google’s search results, a practice known as SEO. Google’s publicly available webmaster guidelines outline how publishers can ensure that Google finds, indexes, and ranks their website. They also provide guidelines on topics such as quality, as well as rules around prohibited or illicit activity such as spam, malware, and deceptive websites.79

When a website violates Google’s webmaster guidelines, it can be penalized in one of three ways: Google can neutralize the impact of the spam, demote the website in search rankings, or remove the website from search results completely.80 According to Danny Sullivan, Google’s algorithms can detect the majority of spam and automatically prevent the ranking system from promoting such content by demoting or removing it.81 The remainder of spam results are typically addressed manually by a spam removal team. Team members review the pages in question, typically based on user feedback, and flag them for penalty if they have been found to violate the webmaster guidelines.82 Manual actions can be used to penalize an entire website, a subdomain, sections of a website, or specific pages. Manual action can also demote websites in search rankings and delist them.83 If a web page owner feels that they have been incorrectly or unfairly penalized, they can submit a reconsideration request.84 However, processing and responding to these requests often takes a significant amount of time, and this can therefore undermine the operations and success of a website for an extensive period. In order to provide greater transparency and accountability around this process, Google should enable appeals in a more timely manner.

Finally, in order to provide greater transparency and accountability around its search curation and ranking practices, Google needs to provide its users with greater controls around how their data is used and how their search experience is tailored.

In 2009, Google made personalized search the default for all users, including users who are not logged into a Google account.85 Users who were signed in, however, could access a tab that outlined how Google had customized their search results, and how they could turn this customization off.86 Today, a similar tab exists that explains to users how activity data, location data, and data from other Google products and services make Search work. The tab also lets users delete their search activity; choose whether their activity is saved on Google sites, apps, and services; and choose whether they would like personalized advertisements. Logged-in users therefore do have a range of controls over algorithmic curation available that enable them to disable the personalization of search results based on their account activity to a certain extent. However, as the DuckDuckGo study indicated, search results are often still personalized, even if a user is not logged in or is using Incognito mode.87 All users, regardless of whether or not they are logged in to a Google account or browsing in Incognito mode, need to be able to access controls that enable them to opt out of algorithmic content curation and ranking during the search experience. They also need to have strong privacy controls over how their data is tracked, collected, and used. Additionally, similar controls need to be afforded to users who use Search but do not have a Google account, as they do not have access to the suite of settings that Search users with accounts do.

In terms of controls, Google also offers website owners a series of controls over how their website appears in Search results. It does this through the Webmaster Tools feature, which lets website owners provide granular instructions on how Google crawls and processes pages on their website. Website owners can also request a recrawl or opt out of crawling altogether.88 This enables website owners to control how their content is processed to an extent as well. However, the algorithm ultimately determines how well a website ranks and performs for each user.

Case Study: Bing

Bing is an internet search engine that is owned and operated by Microsoft. It launched in June 200989 and enables a variety of search services including web, video, image, and map search.90 When Bing launched, Microsoft sought to position it as more than a simple search service. Rather, the company claimed to provide a product that enabled consumers to rapidly acquire more relevant and informed insights from the web, and to use these insights effectively. Its marketing described this ideal search engine as a “Decision Engine.”91 Initially, Bing focused on four verticals: making a purchase decision, planning a trip, researching a health condition, and finding a local business.92 As with Google, by providing users with more relevant and informed search results, Microsoft also hoped to boost its growth and revenue.

Although Bing accounts for a small percentage of Microsoft’s overall revenue,93 the search engine has grown to be the second largest in the world in terms of market share,94 boasting 1.3 billion unique monthly global visitors95 and ranking 31st for global internet engagement on Alexa rankings.96 Although Google still dominates the search engine industry in terms of market share,97 Bing is often considered the most comparable alternative available to users.98

Bing has its own web crawler, known as Bingbot, which uses an algorithm to determine which websites to crawl, how often to crawl, and how many web pages to gather from each website. The algorithm chooses web pages to crawl by prioritizing relevant known URLs that are not indexed yet, and URLs that are already indexed but that need to be revalidated to check for changes or dead links. Bingbot also seeks to identify new web pages that have not been crawled or indexed yet.99

Like Google, Bing uses a combination of algorithmic signals when ranking search engine results, and it also uses human editors.100 However, Microsoft has not recently made any disclosures about which signals it uses to rank search engine results. The latest major disclosure it made was in 2014, via a blog post on the role of content quality in Bing search results. The blog post outlined that the relevance of a result is a significant consideration for the Bing ranking algorithm. The relevance of a result is a function of three things: topical relevance to a user’s query (does the result sufficiently address the query?), content quality, and context (is the query related to a recent topic? where is the user located? etc.).101 Content quality is based on three primary pillars: authority (can the content be trusted?), utility (is the content useful and detailed?), and presentation (is the content well-formatted, accessible, and easy to find?). Authority is determined based on a range of factors including signals from social networking platforms, cited sources, name recognition, and information about the author. In order to assess the utility of a website, Bing’s models aim to predict whether the page’s content provides adequate supporting information, whether it is detailed enough for the intended user, and whether it includes supporting content such as videos, graphs, etc. The models also consider the level of expertise required to produce the content on the web page, with a preference for content that is unique and does not reproduce existing materials.102
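The sketch below mirrors the structure Microsoft's 2014 blog post describes: relevance as a function of topical relevance, content quality, and context, with quality built from authority, utility, and presentation. The specific weights and the simple linear combination are illustrative assumptions, not Bing's actual formula.

```python
# Hedged sketch of the relevance structure described above; weights are assumed.
def content_quality(authority, utility, presentation):
    """Quality pillar scores (0 to 1) averaged into one quality score."""
    return (authority + utility + presentation) / 3.0

def result_relevance(topical_relevance, quality, context_match,
                     w_topic=0.5, w_quality=0.3, w_context=0.2):
    """Combine the three relevance components into a single ranking score."""
    return (w_topic * topical_relevance
            + w_quality * quality
            + w_context * context_match)

quality = content_quality(authority=0.9, utility=0.7, presentation=0.8)
print(result_relevance(topical_relevance=0.85, quality=quality, context_match=0.6))
```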

Microsoft has also stated that both the signals and human editors account for live search activity and real-time news events when ranking news results.103 In addition, the search engine only uses metrics related to how many clicks a web page gets when it evaluates how a search ranking algorithm is working. Click metrics are not a primary signal that is considered when ranking search engine results.104

According to Microsoft, the Bing search engine was constructed to identify which results most satisfied users. Based on these insights, Microsoft develops guidelines and training datasets. These training datasets are evaluated by search quality raters (known as “judges” for the Bing search engine) who operate in Bing’s Human Relevance System project.105 These judges work to identify which search results are the most satisfying according to factors such as relevance and accuracy. They also use click metrics to evaluate whether users are satisfied with the search results they received.106 In addition, Microsoft pushes out models to subsets of users in order to observe which results most satisfy real users. It follows a similar process when implementing updates to its ranking algorithm.107

The Bing search engine also uses machine learning to scale the process of generalizing. Generalizing is when Bing judges108 manually rank search results using a set of guidelines, a process that is typically most accurate when it is performed by humans. However, because it is conducted manually, it cannot be done at scale. The use of machine learning in this instance therefore aims to provide users with search results comparable to those that judges would deliver, but at scale. This can only be achieved by generalizing the ranking algorithm as much as possible.109 Today, approximately 90 percent of Bing search results are ranked based on machine learning.110
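To illustrate the general idea of scaling human judgments with machine learning, the sketch below fits a model to a handful of judge-labeled examples and then ranks new, unjudged results the way the judges would likely have ranked them. The features, labels, and model choice are assumptions for illustration, not Bing's actual pipeline.

```python
# Illustrative learning-to-rank sketch: learn from judge labels, then rank
# unlabeled results by the model's predicted probability of being satisfying.
from sklearn.linear_model import LogisticRegression

# Each row: hypothetical per-result features [keyword_match, authority, freshness]
judge_labeled_features = [
    [0.9, 0.8, 0.2],
    [0.4, 0.3, 0.9],
    [0.7, 0.9, 0.5],
    [0.2, 0.1, 0.4],
]
judge_labels = [1, 0, 1, 0]  # 1 = judged satisfying, 0 = not

model = LogisticRegression().fit(judge_labeled_features, judge_labels)

# Rank new, unjudged results highest-probability first.
new_results = {"page_a": [0.8, 0.6, 0.3], "page_b": [0.3, 0.2, 0.8]}
ranked = sorted(new_results,
                key=lambda p: model.predict_proba([new_results[p]])[0][1],
                reverse=True)
print(ranked)
```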

In addition, like other search engines’ ranking algorithms, Bing’s search ranking algorithms are continuously updated to attempt to provide users with a better experience.111 These updates aim to refine and improve search results and remove spam from search indexes to protect users from negative or manipulative search results. The search engine uses algorithms to demote and penalize websites that violate Microsoft’s guidelines.112 In doing so, it observes how users react to search results, how the search engine assesses the search results, and what the actual ranking algorithm returns. This process cannot be perfect, and the search engine algorithm therefore has to be continuously updated.113

Bing is the second largest search engine in terms of market share, and it is therefore responsible for curating a significant amount of content for a large user base. Despite this, it demonstrates a lack of transparency around its search result curation and ranking process.

First, Microsoft does not publicly disclose substantial information around the signals that define how search engine results are curated and ranked. This makes it difficult for users and website owners to know how the search experience is developed and personalized, and which qualities, results, and voices are prioritized over others. In addition, Microsoft does not regularly share explanations of why its search algorithm is being updated. Given that these changes impact how and whether publishers’ content is viewed by users, and given that these changes alter the scope of content users engage with, this is concerning.114 Microsoft does offer website owners some information on how to use the Bing search engine via its publicly available Webmaster Guidelines. These guidelines outline how providers can work to ensure their content is found and indexed within Bing.115 But this resource does not offer live and up-to-date information on recent changes.116

Furthermore, although Microsoft deploys judges to help improve its search experience, it does not publicly share the guidelines these judges use.117 Some organizations, such as the online blog Search Engine Land, appear to have been able to obtain copies of these guidelines and have written about them, but Microsoft itself has not publicly disclosed information around these efforts. Microsoft also does not share any information about who these judges are, and what perspectives, regions, and backgrounds they represent. This makes it difficult to understand how search results are benchmarked and curated, and which voices and perspectives are being considered when producing the search experience. In addition, according to a spokesperson, Microsoft deploys both algorithmic tools and human editors during the search curation and ranking process. Microsoft does not, however, provide any further information on the role of these human editors, how they differ from or are similar to judges, and what role they play. The presence of human editing in this process creates a very real opportunity to instill bias in the search ranking process. Although, as previously noted, the term “bias” in this context does not solely refer to inappropriate preferences based on protected categories like race or political affiliation, this still raises concerns. Algorithmic biases, which can originate from their creators and from biased training data, are also a significant concern. However, given that Microsoft shares limited information about its algorithmic curation and ranking process, it is difficult to assess what these biases are and how they impact the search experience.

Like Google, Microsoft lets users control their search experience on Bing to an extent. When a Bing user is logged in to their Microsoft account, they can view and clear their browsing, search, location, and other relevant activity history from the account. They can also manage and control some of the data that Microsoft collects. However, these controls are not available to users who are searching Bing in a private browsing mode, or who are searching while not logged in.

Microsoft also demonstrates some concerning practices when it comes to accountability with the Bing search engine. According to Frédéric Dubut, the head of Bing's spam team, Microsoft aims to assess intent when deciding whether to penalize a website for violating its guidelines (such as by spamming).118 However, how the company assesses intent is unclear. As previously highlighted, a web page can be penalized in a number of ways. These include neutralizing the impact of spam or negative intent, demoting a website in search rankings, or removing the website from search results.119 A publisher can submit a reconsideration request if they believe their website has been unfairly penalized. This is valuable, as it provides an appeals mechanism for users of Bing's search engine. However, this process has been described as lengthy, raising the risk that an error on the part of Microsoft can seriously damage the success of a website.120 In order to provide greater accountability around its search result ranking procedure, Microsoft should improve this procedure so that it generates resolutions in a timely manner.

Microsoft does, however, also demonstrate some positive practices. For example, it lets website owners request recrawls of their sites.121 Additionally, like Google, Microsoft receives various legal, copyright, and private party requests to remove and delist websites from Bing.122 The company issues an annual transparency report regarding content removal requests, which outlines the scope and scale of such requests. The report provides data on government requests for content removal, copyright removal requests, "Right to be forgotten" requests, and non-consensual pornography ("revenge porn") removal requests. The report, however, does not break down these data points by Microsoft product, and as a result it is difficult to ascertain how often search results on Bing are impacted by these requests.123

Case Study: DuckDuckGo

DuckDuckGo is an internet search engine that launched in 2008, largely based on Free and Open Source Software (FOSS).124 Typically, search engines aim to distinguish themselves based on the comprehensiveness and accuracy of their search index, and the relevance of their search results.125 DuckDuckGo seeks to further distinguish itself by providing users with strong privacy protections that also enable them to evade the so-called filter bubble created by the personalization of search results. According to DuckDuckGo, the platform does not profile users and delivers all users the same search results, regardless of their past search history. The platform also asserts that it prioritizes providing users with the highest-quality search results, rather than the largest number of results. Today, DuckDuckGo is the sixth largest search engine in the world by market share.126 In January 2019, DuckDuckGo reached a new traffic record of over 1 billion monthly searches.127 It ranks 186th for global internet engagement on Alexa rankings.128 DuckDuckGo uses its own web crawler, known as DuckDuckBot, and approximately 400 other sources in order to generate and curate its search results. These sources include other search engines, such as Bing, Yahoo!, and Yandex, and websites, such as Wikipedia.129 As concerns around consumer privacy have grown—especially following the 2013 Snowden disclosures and more recent data-sharing controversies like the Cambridge Analytica scandal—DuckDuckGo has seen a significant increase in its user base and website traffic.130

According to DuckDuckGo, the platform enforces encrypted HTTPS connections whenever websites provide them. When a user connects to an HTTPS-secured website, the site's security certificate is evaluated to authenticate that it was issued by a legitimate authority. This helps secure sensitive information sent over an HTTPS connection from electronic eavesdropping.131 Additionally, it can prevent the information from being modified while in transit. DuckDuckGo also assigns each page a user visits a score that assesses to what extent that website is trying to mine the user's data. In order to maintain user anonymity online, DuckDuckGo asserts that it blocks tracking cookies, which can be used to identify a user and their devices. It also scans and scores the privacy policies of different websites that a user visits. On the DuckDuckGo search engine, a user has the ability to clear their tabs and data automatically, at the end of a session or after a preset period of inactivity.132 In addition, although the company still provides advertisements, these are "contextual" ads that are based only on the content of a website (such as a current search query) rather than on a user's behavioral profile, including their prior search history.133 Further, DuckDuckGo has stated that it does not store personal user information; it does, however, maintain a log of all search terms that have been used on its service.134

Despite the differences in DuckDuckGo's practices, some studies have indicated that DuckDuckGo is able to return results of the same quality as Google's.135 However, this generally holds for broad search topics rather than niche ones.

Although DuckDuckGo seeks to provide users with a positive search experience that comes with strong privacy protections, the company does not provide a significant amount of transparency and accountability around its search curation and ranking practices. According to a research study DuckDuckGo conducted on Google's ranking practices, a neutral search engine that is truly delivering non-personalized search results should be able to deliver the same results to all users, regardless of what browsing mode they are in. However, just because all users on the DuckDuckGo platform see the same results, it does not mean that these results are not curated and ranked using automated tools. The company does not share which signals it uses to perform this ranking. On the DuckDuckGo website, the company states "ranking is a bit opaque and difficult to discern/communicate on an individual query basis because of all the various factors involved (and which change frequently). Nevertheless, the best way to get good rankings (in nearly all search engines) is to get links from high-quality sites."136 It is therefore difficult to understand which factors DuckDuckGo's search curation and ranking processes prioritize, and how these judgments impact users' search experiences on the platform. By providing greater transparency, the platform could enhance its value proposition and demonstrate to users why it believes it is a better search engine choice.

One way the platform aims to deliver high-quality search results is by removing search results associated with content mill companies. Content mill companies are websites that publish numerous daily articles, often produced by freelance writers (for example, eHow). These forms of content are considered low-quality, but they are written so that they rank highly in Google's search index. DuckDuckGo, however, removes them.137 The search engine has also begun experimenting with algorithms to remove spam links and other forms of low-quality content. However, DuckDuckGo provides little transparency around the scope and scale of this process.

According to DuckDuckGo, the platform’s search engine supports user privacy, provides users greater protections around how their data is used, and aims to deliver neutral search results that do not exhibit bias and that prevent the creation of flter bubbles. However, the company does not provide adequate transparency and accountability around its ranking process, making it difcult for users and website owners to understand how expression is being controlled. The growth of the platform suggests there is a market for a service whose value proposition is built on protecting user privacy.138 This value proposition should be extended to include transparency and accountability.

News Feed Ranking

Search engines are not the only internet platforms that have adopted algorithmic curation and ranking practices. Many small and large online platforms, including social networking and review-based services, have also introduced these tools to curate and present content that is relevant to users' interests and needs. These news feeds have opened up new avenues for users, businesses, brands, and content creators to create and disseminate information at great scale. They have therefore also helped to promote free expression online and boost the revenue of internet platforms. However, these methods of algorithmic curation and ranking also raise a number of concerns, particularly regarding algorithmic awareness and transparency, the creation of filter bubbles, and algorithmic bias. Like search engines, news feeds can shape user perspectives by prioritizing certain forms of content and deprioritizing others.

This section of the report focuses on how internet platforms deploy algorithmic curation and ranking practices in order to shape and operate their news feeds. It will use three platforms—Facebook, Twitter, and Reddit—as case studies of how such practices can vary and what concerns they surface.

Case Study: Facebook

Since its creation in 2004, social media company Facebook has expanded its services to include features such as messaging and “smart displays.” As of June 2019, Facebook has approximately 1.59 billion daily active users139 and ranks third in global internet engagement on Alexa rankings.140

Today, Facebook offers one of the clearest examples of algorithmic content curation and ranking in a news feed. The Facebook News Feed is composed of stories produced by a user's friends, Pages a user follows, Groups a user is part of, and suggested content such as stories and advertisements. Facebook launched the first iteration of its News Feed in 2006.141 The launch of the News Feed drew significant backlash and controversy from users, even sparking calls to boycott the platform. In particular, users were concerned that the News Feed was eroding their privacy by making more information about them available to their friends and others on the platform, which Facebook asserts was not the case.142

Despite controversy around this major change, the News Feed has emerged as one of the largest and most significant billboards and content hubs for users, brands, publishers, and influencers.143 It has also been a major driver of advertising, generating a significant amount of revenue for the platform.144 The News Feed was launched to drive further engagement—and thus revenue—on the platform; Facebook asserts it was also designed to present users with content that is relevant and meaningful to them. Prior to the launch of the News Feed, users on Facebook and comparable platforms like MySpace and Friendster had to seek out content posted by their friends or Pages by visiting individual pages. The News Feed brought this information together and curated posts based on predictions of what content users were interested in and would engage with. It ranked these posts so that users could view the content that was deemed the most relevant to them first. According to Facebook CEO Mark Zuckerberg, the average Facebook user has approximately 1,500 posts that could appear on their News Feed every day. However, users spend a limited amount of time on their News Feed, and as a result they are likely to only read and potentially engage with 100 of these posts. Facebook states that the News Feed curation process seeks to ensure that these 100 posts are the most relevant and meaningful to a user.145

However, in its efforts to provide users with more meaningful interactions, Facebook is also aiming to increase the time users spend on the platform. This in turn drives increased revenue through avenues such as advertising, furthering the company's bottom line. Additionally, as Facebook has come under increased scrutiny since the 2016 U.S. presidential elections and the Cambridge Analytica scandal, it has invested significant resources towards convincing regulators that it should be able to continue to self-regulate, rather than addressing the obvious threats its business model poses to user privacy and, more fundamentally, to democracy. Facebook's assertion that the platform now prioritizes meaningful interactions is one such method of convincing regulators that Facebook is a safe and well-meaning platform.

An earlier iteration of the Facebook News Feed deployed an algorithm known as EdgeRank. EdgeRank was used to determine which stories should appear in a user's News Feed based on three signals: affinity score, edge weight, and time decay.146 Posts that had the highest EdgeRank score would appear at the top of a user's News Feed. Because each user had a different affinity score, each user also had a different EdgeRank score. These scores were not public. Around 2011, as the News Feed algorithm evolved, EdgeRank was retired and the signals it relied on were incorporated into newer versions of the News Feed algorithmic system.147
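As publicly described, EdgeRank summed the product of those three signals across every "edge" (a like, comment, or other interaction) attached to a story. The sketch below is a minimal, illustrative rendering of that formula in Python; the field names, weights, and example values are assumptions for demonstration and do not reflect Facebook's actual implementation.

from dataclasses import dataclass

@dataclass
class Edge:
    """One interaction (like, comment, share) attached to a story."""
    affinity: float    # how close the viewer is to the user who created the edge
    weight: float      # how valuable this edge type is (e.g., a comment counts more than a like)
    time_decay: float  # shrinks toward zero as the edge ages

def edgerank(edges):
    # A story's score is the sum of affinity * weight * time decay across its edges.
    return sum(e.affinity * e.weight * e.time_decay for e in edges)

# Example: a fresh comment from a close friend outweighs an old like from a distant contact.
story = [Edge(affinity=0.2, weight=1.0, time_decay=0.3),
         Edge(affinity=0.9, weight=4.0, time_decay=0.8)]
print(edgerank(story))  # stories would then be sorted by this score, highest first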

During this time, the Facebook News Feed garnered a reputation for looking and operating like a tabloid, since it heavily prioritized popular posts and advertisements. Additionally, the platform came under heavy criticism for promoting addictive practices that sought to maximize the amount of time users spent on the platform. This was because, at the time, the amount of time users spent on a platform was treated as an indicator of the platform's popularity and success.148

In January 2018, however, Zuckerberg announced that the News Feed was going to be altered so that it prioritized "more meaningful social interactions" with friends and family over content produced by businesses and brands.149 Facebook stated that this shift would place less emphasis on posts that are popular, and more emphasis on "authentic" posts that encourage and receive significant engagement from a user and their network.150 The new News Feed is reportedly based on three core principles: first, that Facebook users value meaningful and informative stories; second, that they value accurate and authentic content; and third, that they value principles that guide safe and respectful behavior.151

With this new News Feed, Facebook claimed that it hoped users would spend quality time on the platform, rather than more time.152 However, it could be inferred that if a user is engaging with more meaningful and relevant content on the platform, they would also spend more time on the platform.153 One year after the new News Feed algorithm was launched, a report by social media engagement tracking firm NewsWhip found that the platform had seen increased levels of engagement as well as greater amounts of content being posted and engaged with by friends and family.154

The Facebook News Feed algorithm goes through four stages in order to identify stories, rank them, and produce a tailored News Feed experience for each of its users.155

1. Inventory: The algorithm takes an inventory of what stories have been posted by a user’s friends and Pages a user follows. This is important to assess, as each News Feed is largely composed of content shared by a user’s connections.

2. Signals that inform ranking: The algorithm then evaluates each story using hundreds of thousands of signals. These signals include who posted a story and when it was posted, as well as more granular factors such as the time of day, and how fast a user’s internet connection is. This is particularly important for users with slower connections who can’t properly load certain forms of content.

3. Predictions: The News Feed algorithm utilizes machine learning in order to extract insights from a user's past activity. These insights are used to predict how likely a user is to engage with a post, which is a metric for whether the user finds a post meaningful.156 Some of the predictions the algorithm seeks to make include how likely a user is to comment on a story, how likely they are to spend time reading the story, and whether they would watch an entire video. It also makes some qualitative predictions such as how likely a user is to say that they found a story informative. While such predictions may be able to deliver relevant content to users in the short term, this approach presumes that a user's behavior and interests will remain constant over time. It therefore can also result in the creation of a filter bubble and prevent users from engaging with new content that matches their potentially expanding interests.

4. Relevancy score: The News Feed algorithm then uses all of the signals and insights at its disposal to calculate a relevancy score. These signals are used to calculate a range of probabilities, including the likelihood a user will click on a story, the likelihood a user will spend time on a story, the likelihood a user will engage with a story through likes, comments, and shares, the likelihood a user will find a story informative, the likelihood a story is click-bait (posts that aggressively seek out likes and engagement), and the likelihood that a story links to a low-quality website. These predictions are compiled into a relevancy score, which is an overall prediction of how meaningful a given story is for a user. Facebook calculates a relevancy score for every story from all of a user's connections every time a user opens their News Feed. (An illustrative sketch of how such predictions might be combined into a single score follows this list.)
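The sketch below illustrates the general shape of this final step: each predicted probability is weighted and summed into one score per story. The prediction names and weights here are illustrative assumptions only; Facebook has not published its actual model or weighting.

# Purely illustrative: prediction names and weights are assumptions, not Facebook's model.
def relevancy_score(predictions, weights):
    """Combine per-story probability estimates into a single ranking score."""
    return sum(weights[name] * prob for name, prob in predictions.items())

predictions = {
    "comment": 0.10,      # P(user comments on the story)
    "time_spent": 0.55,   # P(user spends meaningful time reading it)
    "informative": 0.30,  # P(user would say the story is informative)
    "clickbait": 0.05,    # P(story is click-bait), penalized via a negative weight
}
weights = {"comment": 3.0, "time_spent": 1.5, "informative": 2.0, "clickbait": -4.0}

print(relevancy_score(predictions, weights))  # stories are then ranked by this score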

Facebook’s News Feed algorithm is infuenced by hundreds of thousands of signals that work to identify and rank content for each user. According to Facebook, these signals seek to prioritize content that refects the three News Feed pillars, and they each represent a distinct data point that Facebook’s News Feed algorithm considers and processes when ranking content. These signals can be explicit, such as likes, or implicit, such as the time a user spends on a page before returning to the News Feed.157 A study on Facebook’s News Feed Algorithm, based on information refected in Facebook’s “News Feed FYI” blog, sorted some publicly disclosed signals into six categories:158

1. Content signals: Content signals are factors that demonstrate how stories differ from one another. They include the format of a story (such as a link, video, or photo); the number of likes, comments, or reactions a story receives; and which friend or Page posted the story.

2. Source signals: Source signals are characteristics a user or Page demonstrates when they publish a post. These include the history of the Page and how often a Page has posted stories with click-bait headlines.

3. Audience signals: Audience signals are characteristics a user demonstrates when they consume a post, and they often reflect patterns in content consumption. These include how often a user uses the "hide" feature to remove content from their News Feed and how often a user watches videos, in part or in their entirety, rather than scrolling past them.

4. Action signals: Action signals represent a user's behavior when it comes to a specific story. These signals include whether a user likes, clicks on, or engages with a specific story, and the amount of time a user spends reading a story or watching a video. The News Feed algorithm prioritizes active interactions, such as commenting on and sharing a post, over passive interactions such as liking posts and click-throughs. This is based on the notion that active interactions require more effort and are therefore indicative of meaningful interactions.159 The algorithm also tends to favor posts with comments and replies to comments, as they indicate meaningful interactions via conversations.160

5. Relationship signals: Relationship signals are data points collected about the relationship between two users or Pages on the platform. These include how often two users engage with one another, and whether a user decides to unfollow a friend, Page, or Group.

6. Likelihood signals: Likelihood signals are the probabilities the Facebook News Feed algorithm calculates around how a user will interact with a post. They include the probability a user will like or comment on a story. These likelihood signals are compiled to determine how posts are ranked in the News Feed.

Facebook also provides temporary boosts to content known as “timely posts”. These are popular posts or news that are currently being discussed.161

According to Facebook, the platform also sought to prioritize "meaningful interactions" so that it could promote high-quality content in News Feeds and curb the spread of low-quality content such as spam, posts that are unverified, click-bait posts, and posts that seek to spread misinformation.162 However, as previously mentioned, such efforts also aim to increase the amount of time that users spend on the platform, and thus drive revenue through avenues such as advertising.

Most recently, in July 2019, Facebook announced it would downrank and reduce the spread of posts that make sensationalized health claims. It would also do this for Pages aiming to sell products or services based on misrepresented or false health-related claims.163 Additionally, in April 2019, Facebook deployed a new tactic, called Click-Gap, in order to reduce the amount of low-quality content that users see in their News Feed. A number of low-quality websites receive significant traffic from the Facebook platform. In order to tackle this, Facebook systems crawl and index the internet in order to identify such websites. They then downrank low-quality posts that link to these websites. This is based on the notion that such sites rely on platforms like Facebook to drive views, and by doing this Facebook can stifle their efforts. This approach is similar to the approach Google's PageRank algorithm used to rank results when it first launched. PageRank determined how high to rank a search result based on the number and quality of websites that linked to a given web page.164 Because visibility drives impact, posts that are viewed less have less of an impact.
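For reference, the core PageRank idea can be expressed compactly: each page distributes its score to the pages it links to, so a page ranks higher when many well-ranked pages point at it. The snippet below is a minimal power-iteration sketch of that idea, not Google's production algorithm, and the toy link graph is invented for illustration.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue
            share = damping * rank[page] / len(outgoing)  # pass rank along each outbound link
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# Toy link graph: page "c" is linked to by the most pages and ends up ranked highest.
print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}))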

Facebook also uses a range of user-driven metrics in order to assess the quality of a post. These include whether users hide certain posts and whether users report posts as spam. According to Facebook, in this system, posts that are shared and engaged with organically will rank higher on users' News Feeds.165 However, the same March 2019 NewsWhip report that found that the algorithmic changes to the News Feed had increased engagement also found that the changes had not succeeded in adequately tackling the spread of misinformation and low-quality content.166

Facebook has asserted that its current algorithmic curation and ranking model prioritizes meaningful interactions on the platform. However, this algorithmic system is also responsible for managing information flows and online speech, and presenting users with a certain experience based on the platform's understanding of a user's interests. This creates an opportunity to promote certain voices above others, and silence certain voices entirely. This may mean that the voices of dominant social and political groups will be amplified and the voices of disproportionately targeted and already marginalized groups will be silenced. These algorithmic practices are also directly responsible for influencing how users perceive their network and the world around them. As a result, greater transparency and accountability around Facebook's News Feed algorithmic curation and ranking practices are needed.

In March 2019, Facebook introduced a "Why am I seeing this post?" feature in the News Feed. This feature, found in the right-hand corner of each News Feed post, explains why the user is seeing a certain post (e.g., if it was posted by a friend or a Page the user follows, whether it is highly popular,167 etc.), how their past activity (such as whether they regularly watch videos or click on shared links)168 has informed the ranking of the posts in their News Feed, and what other factors typically influence the ranking of posts in the News Feed. This feature also provides users with access to controls that let them edit their News Feed and privacy preferences.169 The News Feed preference controls enable users to select whose posts they see first, unfollow users or groups in order to hide their posts, reconnect with users or groups in order to see their posts again, manage snooze settings on certain users or groups, and hide apps from the News Feed. In addition, through another News Feed control tab on the left-hand side of the News Feed page, users can opt to view posts in reverse-chronological order, rather than through the lens of the algorithmic curation and ranking filters. The default setting that the News Feed will always revert to, however, is the algorithmically curated and ranked mode, known as "Top Stories." In this way, Facebook enables users to control their News Feed experience to a degree. However, it does not give them the option to opt out of the algorithmically personalized experience entirely, and the algorithmically personalized News Feed is the default option.

Although Facebook has shared some information about the signals it uses to rank content in a user's News Feed, it has not provided a comprehensive overview of the range of signals used, and how these signals collectively work together in the News Feed algorithm to determine the ranking of posts. It also does not explain how different signals are weighted.170 Without greater transparency and accountability around how the platform is deploying these signals in its algorithmic curation and ranking practices, and around how these signals work together, users and publishers are unable to properly understand how their experience is being curated and how this can impact their worldview. They are also unable to understand exactly which characteristics of a post, user, or their network are prioritized during this curation and ranking process. This raises concerns that this algorithmic system can establish filter bubbles on the platform.

In addition, individual voices or communities that are suppressed by the algorithm are often left unable to understand why. This raises a number of concerns regarding algorithmic bias, and the extent to which algorithms reflect and exacerbate the judgments, priorities, and preferences of their creators and society at large. Given the limited set of user controls over the News Feed, impacted users are unable to effectively mitigate this situation. However, it is difficult to imagine a set of user controls that could effectively mitigate the issue of algorithmic bias. Given the black box nature of much algorithmic decision-making, developers and users may not be aware of any systematic biases in an algorithm.

Facebook does not currently offer an appeals process or channel for its News Feed curation and ranking efforts. Such a process could help remedy individual cases, even though it would not help to remedy systemic instances of bias.

Case Study: Twitter

Twitter is a microblogging and social network platform founded in 2006. It enables users to post messages known as "tweets" and has gained a reputation as a destination for live updates regarding news, politics, sports, and other current events. As of February 2019, the platform had 126 million daily users,171 and it ranks 20th for global internet engagement on Alexa rankings.172

Originally, Twitter’s news feed—known as a “timeline”—was not algorithmically curated. Rather, content on a user’s timeline was presented in reverse- chronological order. In 2015, Twitter introduced a feature known as “While you were away”173 (later rebranded as “In case you missed it”),174 which aimed to curate notable recent tweets that a user may have missed while they were not using the platform.175 In 2016, Twitter introduced algorithmic curation into its timeline. This was an extension of the “While you were away” feature,176 as it used the same algorithms. According to Twitter, it was designed to deliver users

newamerica.org/oti/reports/rising-through-ranks/ 33 with the most relevant and useful tweets, rather than the most recent ones.177 Twitter has asserted that both of these features were based on the notion that a tweet from a few hours ago may be more relevant and meaningful to a user than one that was posted fve minutes ago. When using the reverse-chronological curation format, a user would miss out on such content.178 As per the new timeline feature, algorithmically curated tweets appeared at the top of a user’s timeline. However, these algorithmically curated tweets are a small subset of the tweets that have been posted since a user last visited the platform.179 As a result, if a user continued to scroll through the timeline, they would eventually begin seeing tweets in a reverse-chronological format.180

Twitter’s decision to roll out an algorithmically curated timeline was met with some backlash. Many users were concerned that by introducing this feature, the platform was stifing the public square characteristics of the platform, as what content was relevant and meaningful would now be determined by an algorithm. Some critics of an algorithmically curated timeline have advocated for a reverse- chronological feed, as this is perceived as a neutral presentation of content. In a reverse-chronological timeline, hashtags would play a strong role in promoting and highlighting conversations and virality would be organic.181 In this sense, critics contend, Twitter could ofer a democratic public square on its platform.182 Although the timeline algorithm could be used to surface content that is broadly considered relevant (such as headline news), it cannot reliably surface unexpected and diverse content that is also relevant, like an organically run public square platform could, as it makes judgments based on past user behavior. 183 The introduction of algorithmic curation in the timeline also faced backlash as it made users feel as if they had less control over their experience on the platform, and raised concerns over the creation of flter bubbles.184

These concerns sparked the hashtag #RIPTwitter in early 2016.185 When the new timeline feature rolled out, users had the option to opt out in a limited sense. They could choose to not see Top Tweets at the top of their timeline, but they would still receive curated tweets in other sections of their timeline.186 In response to the outcry, in September 2018, Twitter enabled users to toggle between algorithmically-curated Top Tweets and non-curated, reverse-chronologically ordered Latest Tweets.187 Despite the public backlash, fewer than 2 percent of users opted out of algorithmic curation on the platform, which became the default option.188 However, the fact that only 2 percent of users opted out of algorithmic curation does not necessarily mean that users were not opposed to algorithmic curation. Rather, because algorithmic curation became the default, opting out required an extra step, and many users may not have wanted to engage in a more time-consuming process or known how to do so.

Despite the controversy, Twitter has insisted that its research indicates individuals have a more positive experience on the platform when engaging with Top Tweets first.189 In a test of the beta version of the algorithmically curated timeline, which was performed on over 100 users and brands, the platform found that individuals tweeted and retweeted more often than they did when using a non-algorithmically curated timeline.190 This, however, assumes that greater engagement is synonymous with a positive user experience, which is not necessarily true. Greater engagement does, however, drive greater revenue for the platform, which may be a reason it advocated strongly for the algorithmically curated version of the timeline.

The Twitter timeline can consist of numerous sections.191

Top Tweets: The Top Tweets on a user’s timeline are algorithmically curated and ranked using a range of signals. This often also includes tweets from accounts that a user does not follow but may be interested in.

Latest Tweets: If a user opts out of algorithmic curation, they will view a reverse-chronological feed of the latest tweets.

In case you missed it: If a user is visiting the Twitter app less frequently, they will see this algorithmically-curated selection of Top Tweets. A user typically only sees this feature in their timeline feed if they have not visited the platform for a number of hours or days. The tweets in this section are less recent, and do not appear in reverse-chronological order. Rather, they are organized based on their ranking scores. As a result, the tweet at the top of this section is the tweet that has the highest ranking score out of all possible tweets from every account a user follows since the last time they logged in.192

Happening now: This section occasionally appears at the top of a user's timeline and it highlights specific events or subjects of interest. This was originally introduced to focus on sports events and was later expanded to include breaking and personalized news.193

Trends for you: This algorithmically-curated section highlights popular trends and hashtags based on a user’s interests (as explained below). Users can also choose to have this content curated based on their location.

When a user opens Twitter, the platform collects and assesses every recent tweet from every account that a user follows and assigns each one a relevance score. This score aims to predict what content a user will find interesting.194 It is based on a range of factors, including the number of favorites and retweets a tweet has received, and how often the user has engaged with a particular account recently. Simultaneously, Twitter's algorithm considers a range of other signals, such as how long a user has been away from the platform, how many accounts a user follows, and how a user behaves and uses Twitter, in order to determine how the relevance scores will impact the content on the user's timeline.195 Content is then ranked based on a series of signals which assess how popular a tweet is and how accounts in a user's network are engaging with it.196 These signals include:197

Recency: How recently a tweet was posted.

Overall engagement: How many retweets, clicks, favorites, and impressions a tweet has garnered. This signal also considers how much time users have spent reading the tweet.198

Engagement relative to other tweets from the same user: How often users engage with the posting user through active engagements and impressions.

Rich media: The type of media that the tweet includes, such as images, videos, GIFs, and polls.

The types of media users typically engage with: If a user typically engages with a specific type of content, such as photos or videos, then they are more likely to see tweets that contain these media formats.

Account engagement and interactions: How often a user engages with a particular author or account, the strength of the user’s connection to this account, the origin of this relationship,199 and how much time a user spends reading tweets posted by this author, even if they do not engage with them.200

Signals such as account interactions, engagement, user interest, network activity,201 how long a user has been away from the site, how many followers an account has, and the account's location relative to users also play a role in how content is curated and ranked on the Twitter timeline.202 Today, deep learning is the central modeling component in timeline ranking.203
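The following sketch illustrates the general shape of this kind of scoring: each candidate tweet receives a relevance score computed from a handful of engagement and relationship features, and the highest-scoring tweets are surfaced first. The feature names and weights here are illustrative assumptions, not Twitter's actual model, which, as noted above, now relies on deep learning.

# Illustrative only: feature names and weights are assumptions, not Twitter's model.
def relevance(tweet, weights):
    return sum(weights[feature] * tweet.get(feature, 0.0) for feature in weights)

weights = {
    "recency": 1.0,          # newer tweets score higher
    "engagement": 2.0,       # favorites, retweets, clicks, impressions
    "author_affinity": 3.0,  # how often the user interacts with this account
    "media_match": 0.5,      # tweet contains media types the user tends to engage with
}

candidates = [
    {"id": 1, "recency": 0.9, "engagement": 0.1, "author_affinity": 0.8, "media_match": 0.0},
    {"id": 2, "recency": 0.4, "engagement": 0.9, "author_affinity": 0.2, "media_match": 1.0},
]

# The highest-scoring subset would appear as "Top Tweets" at the top of the timeline.
ranked = sorted(candidates, key=lambda t: relevance(t, weights), reverse=True)
print([t["id"] for t in ranked])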

Like Facebook, Twitter asserts that it has updated its ranking algorithm in order to improve the "health" of conversations on its platform. In 2018, this was done in order to combat instances of trolling, harassment, and abuse. The new algorithmic system uses behavioral signals in order to assess whether a Twitter account is adding to or detracting from conversations, based on how other accounts react to content. For example, if a user sends the same message to multiple users and they all block or mute the sender, this suggests that the sender is detracting from conversations. If the recipients reply to or "heart" the messages, however, this suggests that the sender is contributing positively to interactions. The algorithm also considers signals such as whether an account has a confirmed email address and whether an account appears to be leading a coordinated attack. Tweets that are identified as detracting from conversation will be deprioritized in the timeline and will therefore appear lower in search results or replies.204

As demonstrated by the backlash against Twitter's decision to implement an algorithmically curated and ranked timeline, users and publishers have expressed concerns over how such a system manages online expression and creates and reinforces certain perspectives based on what it thinks a user is interested in. Like Facebook, Twitter also fails to provide significant transparency and accountability around its algorithmic curation and ranking practices.

Twitter provides its users with a range of limited controls over their timeline experience. These include the ability to unfollow, mute, and block certain accounts. Users can also select the "show less often" feature, which provides Twitter with feedback on certain tweets so that it can better tailor the timeline experience in the future. In addition, users can opt in to or out of permitting Twitter to personalize their experience based on their "inferred identity" and places they have been. As previously mentioned, users also have the option of toggling between an algorithmically curated timeline and a reverse-chronological timeline.205

There is some public information around which signals Twitter uses to curate and rank content on the timeline. However, the company has not released a comprehensive overview of these signals, how they work together to curate and rank posts, and how they are weighted. Without greater transparency and accountability around how this algorithmic curation is taking place, users and publishers are unable to fully understand and control how their worldview is being shaped and what specific characteristics the Twitter timeline algorithm is designed to prioritize. This once again raises concerns regarding the creation and reinforcement of filter bubbles.

Furthermore, given that Twitter is often viewed as a digital public square, the platform's algorithmic curation and ranking practices raise concerns regarding which voices the algorithm determines are important and worth amplifying, and whether these determinations reflect the same values and judgments that humans would apply when assessing public discourse. A lack of transparency around the signals the algorithm uses makes evaluating how the timeline algorithm impacts public discourse even more difficult. Furthermore, a lack of transparency around how the timeline algorithm operates and is constructed also raises concerns around hidden biases in the algorithm and its signals. These biases prioritize certain types of interactions and content over others, and can reflect the unintentional biases of their creators or the training data with which they were created. Given the limited set of user controls over the timeline, and the fact that the platform does not offer an appeals process or channel related to its timeline curation and ranking practices, users who feel as if they have been silenced have no means of recourse.

Case Study: Reddit

Reddit is a social news aggregation and discussion website that was founded in 2005. The platform has approximately 330 million monthly active users worldwide206 and is ranked 16th for global internet engagement.207 Reddit enables users, who operate under pseudonyms, to create subpages, called subreddits, on specific interests or topics. In this way, the platform has become popular among particular interest- or activity-focused communities, such as gamers and sports fans. This represents a significant difference between Reddit on one hand, and Facebook and Twitter on the other. Unlike its counterparts, which emphasize bilateral or multilateral relationships between users who interact on a broad range of topics, Reddit emphasizes users' participation in thematic forums, or subreddits, which has tangible implications for its use of content-shaping algorithms.

Reddit deploys a series of algorithms in order to rank posts and comments on its home page feeds as well as on each individual subreddit. The code for these algorithms is open-source and available publicly online.208 This ranking system is largely influenced by the platform's user-driven voting system.209 All Reddit users who are logged in can vote on links and comments in order to indicate their meaningfulness and usefulness. In this system, an upvote indicates that a user finds content interesting and relevant, and a downvote suggests that the user finds the content uninteresting, off-topic, or otherwise not meaningful.210 Links and comments that receive a significant number of upvotes will appear higher on the website's front page or on the front page of a given subreddit. Each link and comment on the platform is assigned a number of points, known as a score, which loosely corresponds to the difference between the number of upvotes it has received and the number of downvotes it has received. The exact calculations of this figure are kept hidden, however, in order to prevent spammers and other actors with negative intentions from gaming the system.211 On the platform, comments are by default sorted using the "best" comments filter. As a result, comments with the highest number of upvotes are likely to be viewed more often.212 In addition, posts with a significant number of comments are also typically ranked higher than others. This suggests an element of democracy on the platform, as Reddit seeks to rank the content that users engage with—and therefore value the most—the highest.213

The score that a post or comment receives translates into “karma” points for the posting user. Karma is an informal user ranking on Reddit measuring how much users value a particular account’s contributions to the Reddit community.214 A user who frequently contributes high-ranking posts or comments will build a high karma score denoting their total net-positive impact on the site. This system is not foolproof, however. A user can easily gain karma points by reposting popular content across multiple subreddits and by posting content that aligns with the general mentality and values of a certain subreddit or the platform as a whole.215 Additionally, as a user gains more karma points, or as a post or comment gains more upvotes or downvotes, it can spark a bandwagon response in which other users vote in line with the general trend. In order to prevent this, some subreddits hide karma totals for certain periods. However, this is not a complete solution.216

Reddit deploys different algorithmic approaches when ranking posts and comments. When a user logs in, they can choose to view content on their homepage feed using a range of algorithmically-curated options. These curation options sort content into categories: best, hot, new, controversial, top, and rising.217 According to a 2015 blog post by Amir Salihefendic, the CEO of Doist, who has conducted significant research on Reddit's algorithmic ranking practices, posts on Reddit under the "hot" category are ranked using an algorithm known as the "hot ranking algorithm." This algorithm is impacted by signals including the following (a simplified sketch of the algorithm, based on the published code, follows the list below):218

1. Submission time: The time at which a post was submitted is a major factor influencing how a post ranks on Reddit. The hot ranking algorithm ranks new stories higher than older stories.

2. The logarithm scale: The hot ranking algorithm uses the logarithm function to weigh the earlier votes higher than later ones. This means that, generally, the first ten upvotes that a post receives will have the same weight as the next 100 upvotes. These 100 upvotes will in turn have the same weight as the next 1,000 upvotes, and so on. Therefore, a post that has 10 very recent upvotes and a post that has 50 older upvotes could rank similarly on the platform.

3. Downvotes: Reddit is one of the few platforms on the internet that deploys a downvote feature. Posts that receive a large number of both upvotes and downvotes, as well as posts that receive a large number of downvotes overall, will therefore rank lower on news feeds. This particularly impacts content that is controversial.
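The sketch below is a simplified Python rendering of the published hot ranking logic as described in Salihefendic's analysis; the exact constants in Reddit's production code may have changed since.

from datetime import datetime
from math import log10

def epoch_seconds(date):
    # Seconds since the Unix epoch, assuming a naive UTC datetime.
    return (date - datetime(1970, 1, 1)).total_seconds()

def hot(ups, downs, date):
    score = ups - downs
    order = log10(max(abs(score), 1))           # logarithm scale: the earliest votes count the most
    sign = 1 if score > 0 else -1 if score < 0 else 0
    seconds = epoch_seconds(date) - 1134028003  # offset from the reference epoch used in the published code
    return round(sign * order + seconds / 45000, 7)

# Newer submissions outrank older ones with similar vote totals, and heavily
# downvoted posts receive a negative vote component.
print(hot(ups=10, downs=2, date=datetime(2019, 10, 1)))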

Reddit’s comment ranking algorithm was theorized by Randall Munroe, an American cartoonist, engineer, and scientifc theorist. He argued that the hot ranking algorithm would not be suitably applicable for ranking comments on the platform, as it would preference comments that were posted more recently, rather than the comments that were considered the most meaningful. The solution to this was to deploy Wilson’s Score Interval, which uses a confdence sort to treat a vote count on a comment as a statistical sample of a hypothetical full vote by opinion, similar to an opinion poll. This system provides each comment with a provisional ranking, that it is 85 percent sure the comment will reach. The more votes that a comment receives, the closer its score gets to this 85 percent confdence estimate. This system helps ensure that if a comment has only one upvote and zero downvotes, it will retain a 100 percent upvote rate. However, because the system does not have enough data on this comment, it will be ranked lower. If a comment received ten upvotes and only one downvote, on the other hand, the system could accrue enough confdence to place this comment above something with 40 upvotes and 20 downvotes, as it ascertains that by the time this frst post gets 40 upvotes, it would have fewer than 20

newamerica.org/oti/reports/rising-through-ranks/ 39 downvotes. If the system is wrong, which it is 15 percent of the time, then it will work to get more data so that comments with less data are ranked lower. The confdence sort in this system is not impacted by submission time, but rather it is impacted by how many upvotes a comment receives compared to the total number of votes and the sample size. The more votes a comment gets, the more accurate its confdence score is.219 However, when subreddits have a large number of posts, it is likely that most people simply read the comments in the “best” section and vote on them. This prevents other comments from gaining traction and220 can create a preference toward these pieces of content.221 Users can also feel persuaded to vote for already popular content, due to a herd mentality.222 Additionally, users can create multiple accounts in an attempt to rig the voting system.223
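The confidence sort described above corresponds to the lower bound of the Wilson score interval. A minimal version is sketched below; the z value of roughly 1.44 matches the 85 percent confidence level described above, though the exact constant in Reddit's production code may differ.

from math import sqrt

def confidence(ups, downs, z=1.44):  # z of about 1.44 corresponds to ~85% confidence
    n = ups + downs
    if n == 0:
        return 0.0
    phat = ups / n  # observed fraction of upvotes
    # Lower bound of the Wilson score interval: comments with few votes are
    # pulled down until enough data accumulates to support their upvote rate.
    return ((phat + z * z / (2 * n)
             - z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n))
            / (1 + z * z / n))

# As described above, a 10-1 comment can outrank a 40-20 comment.
print(confidence(10, 1), confidence(40, 20))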

Reddit’s algorithms also come into play when curating content for /r/all, which is the home page that non-logged-in users see. When a user frst creates an account on Reddit, they are subscribed to a list of default subreddits that aim to highlight the range of communities, interests, and genres of content available on the platform.224 Once a user has an account, they can curate their own home page feeds by subscribing to subreddits of interest to them, and unsubscribing from default subreddits if they prefer. However, users who prefer to not create an account are unable to pick and choose which specifc subreddits they engage with. For these users, the Reddit homepage displays the /r/all page that contains algorithmically curated content from a range of subreddits on the platform in order to demonstrate the breadth of popular content available on the service.225 The algorithm used to sort this page tends to highlight material across subreddits that is new and has been upvoted a lot.

However, for its content to make it to the /r/all page, a subreddit often already has to have a large subscriber base capable of driving a large score. The default subreddits that users are automatically subscribed to are examples of these. Because these default subreddits are privileged, however, other, organically created and operated subreddits are less frequently highlighted. This can silence certain voices and render certain communities and interest groups invisible to non-logged-in users of the service. However, subreddit moderators can voluntarily opt out of this curation as well. This is often done if moderators feel that their content is controversial or not fit for public consumption (also known as "not safe for work" or NSFW).226 In this way, the opt-out feature can act as a privacy mechanism.

Reddit’s algorithmic curation and ranking systems seek to prioritize and deliver relevant and meaningful content to its users. Like on Facebook and Twitter, this raises a host of concerns regarding which voices are prioritized and how these voices are identifed. When it comes to providing transparency and accountability around its algorithmic curation and ranking practices, Reddit

newamerica.org/oti/reports/rising-through-ranks/ 40 ofers some novel approaches, but also fails to adopt some existing practices that are gradually becoming more common across the industry.

Reddit’s approach to transparency is novel, in that it publishes the code for its ranking algorithms publicly online in an open-source format. This enables users, researchers, and publishers to better understand how content is tailored and ranked on the platform, what characteristics Reddit’s algorithms preference, and how this may impact a user’s worldview. This is one way of potentially revealing the existence of a flter bubble, as well. However, in order to efectively use this resource and extract valuable insights from it, an individual would have to have a relatively extensive technical background. Therefore, although the platform provides some valuable information on its curation and ranking practices, there are barriers to accessing and understanding it. Reddit also does not have a company-issued page explaining to its users how the algorithmic curation and ranking system works. As a result, most public information about Reddit’s ranking system and the signals and processes it uses are based on research or speculation, rather than company-verifed information. Therefore, Reddit should publish information in language that is accessible to individuals who lack a technical background, as well as the general public. This will provide greater transparency and accountability around how Reddit curates and ranks content across its home page and subreddits.

Because content on Reddit is curated and ranked primarily on the basis of user votes, users do not have as many additional controls over their news feed experiences on the platform. Aside from voting on content, Reddit users can hide posts on the front page news feed and in subreddits. They can also sort content using a range of filters, including the "new" category, which filters content in reverse-chronological order. Aside from this, however, users do not have any significant further controls, as the assumption is that user votes are representative of what users find interesting and meaningful, and the Reddit algorithm curates and ranks based on this. Like Facebook and Twitter, Reddit does not offer an appeals process or channel for its news feed curation and ranking efforts.

Promoting Fairness, Accountability, and Transparency Around Algorithmic Curation and Ranking Practices

As demonstrated in this report, the deployment of algorithmic curation and ranking practices by search engines and internet platforms has established their roles as gatekeepers of online information flows and online expression. Over the past few years, these practices have increasingly been dictated by the business models and services of these companies. Despite the growing prevalence of such algorithmic curation, internet platforms have demonstrated a fundamental lack of transparency and accountability around how such practices are implemented, and how these practices impact users, their worldviews, and publishers. Going forward, search engines and internet platforms, policymakers, and researchers should consider the following set of recommendations in order to promote greater fairness, accountability, and transparency around algorithmic decision-making in this space.

In particular, search engines and internet platforms that deploy algorithmic tools to curate and rank content in their search results or news feeds need to:

• Make a concerted and explicit effort to raise awareness around these practices, and provide adequate transparency around how they impact users' experiences and free expression online. For the most part, there is little transparency around the algorithms that these platforms use, particularly when it comes to their operational mechanisms and how they translate inputs into outputs. Going forward, internet platforms should provide more effective and meaningful transparency around these algorithms' decision-making practices.

◦ Search engines and internet platforms should disclose and explain the various stages and procedures involved in collecting, curating, and ranking content in search engine results or in news feeds. This information should be updated whenever the platform decides to alter or refine its curation and ranking practices. This information should be available publicly and should be presented in a format that is easy to understand for members of the general public. This information should also be housed in a centralized and easy-to-access area, such as on a page dedicated to the topic on the platform's website.

◦ Search engines that employ a search quality rating process should publicly disclose what this procedure entails, including how any human raters are trained and evaluated, and how the search engine ensures that these individuals represent a diverse array of perspectives. In addition, if a search engine has developed guidelines for these raters, they should be publicly available, as this would enable users, researchers, and publishers to gain a better understanding of what values a search engine emphasizes in its curation and ranking process.

◦ Where possible, platforms that engage in algorithmic content curation and ranking should provide additional resources that can help users, researchers, and publishers understand how a platform assesses and ranks content. A good example of this is the open-source code that Reddit provides for its news feed. However, because these resources often require specific technical expertise to understand and effectively use, they should not replace corporate public disclosures and explanations on algorithmic curation and ranking practices. Rather, they should supplement them.

◦ Given that search engines are increasingly delisting search results due to legal requests, violations of their search engine guidelines, and requests under frameworks such as the "right to be forgotten," greater transparency and accountability needs to be provided around these procedures and how they impact free expression. One method of doing this is by highlighting the scope and scale of these requests in a corporate transparency report. Search engines should also provide impacted website owners with notice of these removals and offer them the opportunity to appeal these decisions in a timely manner.

• Provide greater transparency to identify the different implicit and explicit preferences that are built into curation and ranking systems. Generally, this is difficult to do with algorithmic systems, as many of these biases are hidden and implicit. This is likely to be the case for such curation and ranking systems as well. However, the fact that these systems are based on hundreds of signals means that some of these preferences are also explicit.

◦ Search engines and internet platforms should publicly disclose a comprehensive list of the signals their curation and ranking systems are based on. This information should also include how these signals interact with and build off one another, and how they are weighted. If it is not possible to do this at a granular level (due to concerns over trade secrets and competition, for example), then these platforms should at least provide a comprehensive list of categories encompassing the different types of characteristics and signals that these systems consider, how these categories interact with one another, and how they are weighted. This will generate a greater understanding of which qualities a platform emphasizes and values most, and as a result which voices are amplified and which are silenced. This list of signals or categories of signals should be available in a single, public, central location that is easily accessible, and it should be presented in a format that a general audience can easily comprehend.

◦ When a search engine or internet platform updates the signals its ranking systems are based on, the change can have a significant impact on the presentation and delivery of user and publisher content. As a result, platforms should strive to explain as many of these changes as possible. If a platform opts to publish categories of signals rather than a comprehensive list, these announcements should explain how the changes affect the relevant category of signals. Platforms currently make such announcements around changes that aim, for example, to reduce the spread of misinformation or curb abuse. These announcements often outline the company's thinking and intentions behind the algorithmic change, and they can be valuable for shining a light on the priorities of the engineers who implemented it, because they underscore the values and assumptions the company brought to a content issue. (A simplified sketch of how weighted signal categories might combine into a ranking score follows this recommendation.)
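As a purely hypothetical illustration of the kind of disclosure recommended above, the sketch below combines a handful of invented signal categories (relevance, freshness, source quality, and personalization) into a single ranking score using published weights. The category names, weights, and scoring function are assumptions made for explanatory purposes; they do not describe any particular platform's system, which would involve hundreds of signals and far more complex interactions.

```python
from dataclasses import dataclass

# Hypothetical, publicly disclosed signal categories and their weights.
CATEGORY_WEIGHTS = {
    "relevance": 0.45,        # how well the item matches the query or user interest
    "freshness": 0.20,        # how recently the item was published or updated
    "source_quality": 0.25,   # assessed authoritativeness of the publisher
    "personalization": 0.10,  # fit with the individual user's history and settings
}

@dataclass
class Item:
    title: str
    signals: dict  # category name -> normalized score in [0, 1]

def rank_score(item: Item) -> float:
    """Weighted sum of normalized category scores; higher ranks first."""
    return sum(CATEGORY_WEIGHTS[c] * item.signals.get(c, 0.0)
               for c in CATEGORY_WEIGHTS)

items = [
    Item("Recent local news story", {"relevance": 0.7, "freshness": 0.9,
                                     "source_quality": 0.6, "personalization": 0.4}),
    Item("Older in-depth report",   {"relevance": 0.8, "freshness": 0.2,
                                     "source_quality": 0.9, "personalization": 0.5}),
]

# Publishing the categories and weights lets outsiders see, for instance,
# that relevance counts for more than twice as much as freshness here.
for item in sorted(items, key=rank_score, reverse=True):
    print(f"{rank_score(item):.2f}  {item.title}")
```

Even at this level of abstraction, disclosing the categories and their relative weights would let users, researchers, and publishers reason about which qualities a platform privileges and how an announced change shifts that balance.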

• Provide users with a robust set of controls that enable them to tailor their own search and news feed experiences. In particular, users should be able to:

◦ Provide feedback on search results and posts in their news feeds. This should include the ability to hide, block, or filter out certain posts. These controls let users manage their own experiences and also provide the algorithms with valuable feedback on what content a user deems relevant and meaningful.

◦ Determine whether and to what extent these systems are permitted to collect and use their personal data. Data such as a user's location history, purchase history, or past browsing activity can be used to tailor a user's online experience and inform targeted advertising. As a result, users should be able to control which of these data points are collected, particularly in instances where the data points are used for secondary purposes and are not integral for the company to provide the service. In addition, users should have strong controls related to data retention practices. This should include the ability for a user to clear their search history and delete the data a platform has collected on them.

◦ Opt in to having their personal data used to refine and develop AI and machine-learning models. Currently, users are automatically opted in to having their data included in the datasets that train algorithmic models, with no method for opting out. This raises significant privacy concerns, especially because these datasets can include highly personal information, such as demographic characteristics, that can be used to target users in privacy-intrusive ways. When highly personal data is collected and used for potentially invasive purposes, such as facial recognition, users should always have to opt in to having their data used to train AI and machine-learning models, because biometric information cannot be altered or changed the way a credit card number can. Companies should also develop robust privacy policies around user data that is collected and used to train AI models.

◦ Opt in to receiving algorithmically curated and ranked content. Ideally, users should only receive algorithmically curated and ranked content after affirmatively opting in; it should not be the default setting. However, if companies maintain that users should receive this content by default, users should be able to opt out completely and easily if they do not want such content in their search results or news feeds. On search engines, this should be possible regardless of whether a user is logged in, logged out, or browsing in a private or anonymous mode. (A minimal sketch of what honoring such controls could look like follows this recommendation.)
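The following is a minimal, hypothetical Python sketch of how the user controls recommended above might be honored when a feed is assembled: algorithmic ranking is applied only if the user has opted in (falling back to reverse-chronological order otherwise), and posts the user has hidden or authors they have blocked are filtered out first. The preference fields, scoring input, and data structures are invented for illustration and do not reflect any platform's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Set

@dataclass
class Post:
    post_id: str
    author: str
    created_at: datetime
    predicted_relevance: float  # output of some ranking model, in [0, 1]

@dataclass
class UserPreferences:
    algorithmic_ranking_opt_in: bool = False  # off by default, per the recommendation
    hidden_posts: Set[str] = field(default_factory=set)
    blocked_authors: Set[str] = field(default_factory=set)

def build_feed(posts: List[Post], prefs: UserPreferences) -> List[Post]:
    """Apply the user's controls before any ranking decision is made."""
    visible = [p for p in posts
               if p.post_id not in prefs.hidden_posts
               and p.author not in prefs.blocked_authors]

    if prefs.algorithmic_ranking_opt_in:
        # Only users who affirmatively opted in get model-ranked content.
        key: Callable[[Post], float] = lambda p: p.predicted_relevance
    else:
        # Default: plain reverse-chronological ordering.
        key = lambda p: p.created_at.timestamp()

    return sorted(visible, key=key, reverse=True)
```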

• Enable publishers to understand and exercise some control over how their content is collected, curated, and ranked.

◦ Both search engines and internet platforms should offer website publishers and content creators clear and detailed guidelines on how they can operate fairly and successfully on a given platform. For search engines, this means publicly sharing a detailed set of webmaster guidelines. For internet platforms with news feeds, this means publicly disclosing relevant rules and guidelines for content creators.

◦ Search engines should also let website publishers determine how their website is crawled. One way of doing this is through a sitemap. In addition, website publishers should be able to request a recrawl if they feel their website was not adequately represented in a search engine's index. Furthermore, website publishers should be able to opt out of crawling altogether if they so desire. (A brief sketch of how standard crawl controls work appears at the end of this recommendation.)

◦ Website publishers who feel that their websites have been unfairly penalized in search engine results should be able to appeal these decisions and request reconsideration. This appeals process should be timely and should provide the website publisher with adequate notice of the outcome. Search engines that offer this appeals process should also educate website publishers so they know it exists and know how to use it.

◦ Content creators who feel that their content has been unfairly penalized in news feeds should also be able to appeal these decisions. This appeals process should be timely, and internet platforms that offer it should publicize information about the process and make it easy to understand.
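As context for the crawling controls recommended above, publishers already signal their preferences to crawlers through two widely used, voluntary conventions: a robots.txt file that tells crawlers which paths they may fetch, and a sitemap that lists the pages a publisher wants indexed. The short Python sketch below uses the standard library's robots.txt parser to check whether a given crawler is allowed to fetch particular pages; the site URL and crawler name are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Placeholder site and crawler name, for illustration only.
SITE = "https://www.example.com"
CRAWLER_USER_AGENT = "ExampleSearchBot"

# robots.txt is the standard, voluntary mechanism a publisher uses to
# tell crawlers which parts of a site they may or may not fetch.
parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()

for path in ["/", "/private/reports/", "/news/latest"]:
    allowed = parser.can_fetch(CRAWLER_USER_AGENT, f"{SITE}{path}")
    print(f"{path}: {'crawl allowed' if allowed else 'crawl disallowed'}")

# A sitemap (often referenced from robots.txt via a "Sitemap:" line)
# complements this by listing the URLs the publisher wants indexed and
# when they were last updated.
```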

• Internet platforms, policymakers, and researchers should collaborate on, promote, and fund further research on the impacts of algorithmic curation and content ranking. This is particularly important amid growing concerns around the impact of these algorithmic decision-making practices on democratic values, political polarization, and freedom of expression.

• Internet platforms, researchers, and civil society organizations should collaborate to develop a set of industry-wide best practices for transparency and accountability around algorithmic curation and ranking. These best practices should explicitly prioritize the public interest above corporate business models and concerns about trade secrets. This will help ensure that users are adequately educated about and aware of these practices, have a range of meaningful controls at their disposal, and know how to use them. It will also promote greater accountability around these algorithmic decision-making practices and reduce the need for knee-jerk legislation or regulation.


This report carries a Creative Commons Attribution 4.0 International license, which permits re-use of New America content when proper attribution is provided. This means you are free to share and adapt New America’s work, or include our content in derivative works, under the following conditions:

• Attribution. You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

For the full legal code of this Creative Commons license, please visit creativecommons.org.

If you have any questions about citing or reusing New America content, please visit www.newamerica.org.

All photos in this report are supplied by, and licensed to, shutterstock.com unless otherwise stated. Photos from federal government sources are used under section 105 of the Copyright Act.
