October 2019

Rising Through the Ranks

How Algorithms Rank and Curate Content in Search Results and on News Feeds

Spandana Singh

Last edited on October 21, 2019 at 9:47 a.m. EDT

Acknowledgments

In addition to the many stakeholders across civil society and the industry who have taken the time to talk to us about search and news feed ranking over the past few months, we would particularly like to thank Dr. Nathalie Maréchal from Ranking Digital Rights for her help in drafting this report. We would also like to thank Craig Newmark Philanthropies for its generous support of our work in this area.

About the Author(s)

Spandana Singh is a policy program associate in New America's Open Technology Institute.

About New America

We are dedicated to renewing America by continuing the quest to realize our nation’s highest ideals, honestly confronting the challenges caused by rapid technological and social change, and seizing the opportunities those changes create.

About Open Technology Institute

OTI works at the intersection of technology and policy to ensure that every community has equitable access to digital technology and its benefits. We promote universal access to communications technologies that are both open and secure, using a multidisciplinary approach that brings together advocates, researchers, organizers, and innovators.

Contents

Introduction

Search Ranking

Case Study: Google

Case Study: Bing

Case Study: DuckDuckGo

News Feed Ranking

Case Study: Facebook

Case Study: Twitter

Case Study: Reddit

Promoting Fairness, Accountability, and Transparency Around Algorithmic Curation and Ranking Practices

Introduction

Since the early 2000s, the amount of information on the internet has grown tremendously. Whether it be news outlets, social media, or e-commerce platforms, the online ecosystem has become a go-to destination for users seeking a variety of information and experiences. At the same time, users have faced a fundamental challenge in identifying credible sources and understanding which of them to use. In order to help users access high-quality, relevant, and accurate information and content, a number of internet platforms rely on proprietary algorithmic tools to curate and rank content for users.

This report focuses on search engines and platforms that offer news feeds, both of which deploy algorithms to identify and curate content for users. Many of these platforms use hundreds of signals to inform these ranking algorithms and deliver users personalized search and news feed experiences.

These algorithms control both the inputs and outputs of the information environment. They evaluate and process incoming information to identify what content is most relevant for users. They then determine which of these outputs a user should see and rank the outputs in a hierarchical manner. In this way, these platforms act as gatekeepers of online speech by exercising significant editorial judgment over information flows.1

Most internet platforms have heralded the introduction of personalized search results and news feeds as a positive—and now integral—feature of their services. Many platforms assert that personalization enables users to access and engage with content that is more relevant and meaningful to them. Personalization features also enable platforms to achieve significant growth and boost revenue through avenues such as targeted advertising.

However, there is a fundamental lack of transparency around how algorithmic decision-making around curation and ranking takes place. This is concerning because these practices can have a variety of negative consequences. In fact, many users are often unaware that such algorithms are being used by platforms to shape their online experiences.2 Rather, these users believe that the subjective frame presented by curation and ranking algorithms is representative of reality. As a result, users have grown accustomed to outsourcing judgment, autonomy,3 and decision-making to internet platforms and their opaque algorithms, which decide—based on their perceptions of what user interests and values are—what users’ online experience should be.4 This disparity in algorithmic awareness and understanding is fostering a new digital divide between individuals who are aware of and understand the impacts of algorithmic decision-making and those who are not.5

As algorithmic tools are increasingly used to curate and rank content on internet platforms, concerns around fairness, accountability, and transparency have grown. In particular, an increasing number of researchers have noted that users lack awareness of algorithmic decision-making practices. In addition, there is a significant lack of transparency from internet platforms regarding how these tools are developed and deployed, and how they shape the user experience. Further, researchers have outlined that users and content creators often lack meaningful controls over, and agency related to, algorithmic decision-making practices.

In addition, although these algorithmic curation and ranking tools remove the need for humans to make significant manual and individual decisions regarding millions of pieces of content, they do not remove the need for human editorial judgment in this process, nor do they reduce bias. The term “bias” in this context does not solely refer to inappropriate preferences based on protected categories like race or political affiliation. Rather, these tools compile insights into a broad range of weighted signals. Ranking algorithms then analyze the data to prioritize certain forms of content and certain voices over others. These algorithms also incorporate the judgments of the engineers who have developed them, particularly with regard to what information users are likely to find interesting and meaningful. In addition, algorithms can infer correlations in data that may reflect societal biases. Indeed, oftentimes these algorithms result from machine learning “black box” systems. This means that even though developers may know what the inputs and outputs of an algorithm are, they may not know exactly how the algorithm operates internally. Therefore, concerns regarding algorithmic bias and accountability have grown as these algorithmic decision-making practices have become more prevalent.

This report is the second in a series of four reports that will explore how automated tools are being used by major technology companies to shape the content we see and engage with online, and how internet platforms, policymakers, and researchers can promote greater fairness, accountability, and transparency around these algorithmic decision-making practices. This report focuses on the algorithmic curation and ranking of content in search engine results and in news feeds on internet platforms. It uses case studies on three search engines—Google, Bing, and DuckDuckGo—and on three internet platforms that feature news feeds—Facebook, Twitter, and Reddit—to highlight the different ways algorithmic tools can be deployed by technology companies to curate and rank content, and the challenges associated with these practices.

Editorial disclosure: This report discusses policies by Google, Microsoft, and Facebook, all of which are funders of work at New America but did not contribute funds directly to the research or writing of this report. New America is guided by the principles of full transparency, independence, and accessibility in all its activities and partnerships. New America does not engage in research or educational activities directed or influenced in any way by financial supporters. View our full list of donors at www.newamerica.org/our-funding.

Search Ranking

Search engines have emerged as an essential tool as the internet has expanded, supporting key aspects of free expression online through information access and information dissemination. Search engines enable users to more effectively sift through and access endless amounts of information, and they have empowered individuals, businesses, and publishers to disseminate information.

Although the process of conducting a search on a search engine seems straightforward, the ways algorithms curate and rank search results raise a number of concerns. These algorithms underpin the immense power of search engines, which are able to determine what information a user sees, the type of results they can access, and which publishers and pieces of information a user engages with first. In this way, search engines play a significant role in shaping the perspectives and mindsets of each of their users.6 Additionally, not all search engines operate in the same manner. Therefore, which search engine a person uses will also influence their viewpoints and opinions.7 As Wired’s Brian Barrett wrote, “The internet is a window on the world; a search engine warps and tints it.”8

The topic of algorithmic curation and ranking of search results has become especially prominent in the news recently, as conservative politicians in the United States have claimed that search engines, such as Google, and internet platforms, such as Facebook and Twitter, have instituted a liberal bias within their search results and news feed curation practices.9 However, there is little evidence that such bias is actually present.10

Technology companies that operate search engines can utilize artificial intelligence and machine learning in a number of ways.11 These include powering speech-to-text searches and enabling visual searches. This section of the report, however, will focus on how these algorithmic tools are used to curate and rank search results. It will use three case studies—Google, Bing, and DuckDuckGo—to explore how different companies have structured and implemented the practices of curating and ranking their search results, and what challenges they have faced in the process.

Case Study: Google

Google is the world’s largest search engine.12 It was founded in 1998 by Larry Page and Sergey Brin, and today operates under parent company Alphabet Inc. Over the past two decades, Google’s influence in the technology sector, and in society broadly, has grown significantly. As of July 2019, Google had 92.19 percent of the global search engine market share.13 It also currently ranks first for global internet engagement on Alexa rankings.14 Google’s search engine product, known as Search, is currently available in over 150 languages and over 190 countries.15

Brin and Page’s original vision of the Google search engine was primarily based on the tenets of citation analysis used by scholars, researchers, and scientists. In such research-oriented fields, the more a work is cited, the more it is generally considered legitimate and high-quality. Brin and Page saw value in this approach; by identifying web pages that other web pages frequently linked to, they would be able to identify similarly legitimate, or at least popular, online sources. From this sprang Google’s original search engine ranking algorithm, known as PageRank.16 However, as outlined by Safiya Noble, this approach comes with a range of problems. For example, when citing work in a research publication, all citations are given the same weight in the final bibliography, despite expected differences in how much the author relied on each work cited. Similarly, citations count equally regardless of whether the author mentioned a work and its contents to validate it, reject it, and so on. Brin and Page predicted some of these complications, and the ranking algorithm has evolved considerably since its original conception to account for further limitations.17
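To illustrate the citation-analysis intuition described above, the sketch below implements the basic power-iteration form of PageRank on a hypothetical three-page link graph. It is a minimal teaching sketch of the publicly described core idea, not Google's production system; the damping factor, iteration count, and toy URLs are assumptions for illustration.

```python
# A minimal sketch of the citation-analysis idea behind PageRank, assuming a
# toy link graph; the real ranking system uses many refinements not shown here.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# Pages that attract more (and better-ranked) inbound links score higher,
# much like frequently cited papers in citation analysis.
toy_graph = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com"],
}
print(pagerank(toy_graph))
```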

According to Google, its Search product aims to enhance a user’s search experience by providing them with personalized search results. These search results are customized based on personal user data collected by Google, such as browsing and purchase history, and are generated using a combination of manual and algorithmic tools. This personalized search feature also, however, serves as a major source of advertising revenue for the company.

Over the past few years, Google has come under significant criticism for its personalized Search feature. In particular, many researchers and internet activists have expressed concerns that by delivering search results that users are likely to click on and be interested in based on their prior searches and personal data, Google is creating a “filter bubble.”18

Recently, Google has stated that it is moving away from offering personalized search results.19 However, today users still receive personalized search results on the platform.

Underpinning Google’s search engine is a process called crawling. Crawling deploys software, known as web crawlers, to identify publicly available web pages. Web crawlers determine which websites to browse, how often to browse them, and how many pages should be browsed from each website. Typically, web crawlers select web pages to crawl based on previous crawls and sitemaps.20 Once the web crawlers have identified a set of web pages, they visit them and utilize links on these web pages to identify other web pages. During this process, the web crawlers specifically look to identify new websites, changes that have occurred to existing websites, and dead links. These web crawlers then bring data about these web pages back to Google’s servers. When a crawler identifies a web page, Google’s system renders the content of the page, like a browser does, and works to identify signals such as keywords and website freshness (the recency of a website’s content). These signals are continuously monitored using the Google Search index. The Google Search index contains hundreds of billions of web pages, and it includes an entry for every word seen on every web page that has ever been indexed. When a new web page is indexed, it is added to the entries for all the words it contains.21 When a user enters a query into Search, the Search algorithm identifies keywords in the query and matches them against web pages in the index. It then ranks the subsequent search results based on over 200 different signals.22 The search result ranking process is, for the most part, conducted using algorithms. According to Google, the company does not deploy human curation when ranking its search results.23
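As a rough illustration of the index structure described above, the sketch below builds a minimal inverted index that maps every word to the pages containing it and matches query keywords against those entries. It is a simplified example built on assumed toy data, not a description of Google's actual index or matching logic.

```python
# A minimal sketch of an inverted index: every word maps to the set of pages
# that contain it, and a query is matched against those entries.
from collections import defaultdict

def build_index(pages):
    """pages: dict mapping a URL to its (already crawled) text content."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return pages containing every keyword in the query."""
    results = None
    for word in query.lower().split():
        matches = index.get(word, set())
        results = matches if results is None else results & matches
    return results or set()

pages = {
    "example.com/a": "guide to knitting blankets",
    "example.com/b": "blanket and duvet reviews",
}
index = build_index(pages)
print(search(index, "blanket reviews"))  # {'example.com/b'}
```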

Some of the factors and signals that influence the Search ranking algorithm are:

1. The meaning of a user’s query: In order to provide a user with a relevant result for their query, Google first needs to identify the intent behind this query. To do this, the company has developed language models, including natural language processing models, which are adept at understanding which particular keywords should be looked up in the index. Additionally, Google has developed a synonym system, which enables search results to include results related to synonyms of keywords in the original query—for example, if the query included “blanket,” then the search results may also provide links to websites related to duvets. The introduction of this synonym system has enhanced Google search results in over 30 percent of searches across languages. Search algorithms also attempt to decipher what category of information a user is looking for, such as whether it is a specific or broad search, or a language-specific search. If a user searches for a trending keyword or topic, such as the results of the latest UFC fight card, Google’s freshness algorithm will interpret this as a signal that recent information may be more useful than older results.24

2. Relevance of web pages: After assessing the meaning of a user’s query, algorithms begin to evaluate the content of different web pages in order to understand whether a page contains information that is relevant to the initial query. One of the clearest indicators that a web page may be relevant is if the headings or body of the page contain the same keywords as the ones in the search query. In addition to keyword matching, Google asserts that it uses aggregated and anonymized interaction data in order to determine whether the search results are relevant to the initial query. This data informs signals that enable Google’s machine-learning systems to better determine relevance. These relevance signals help the Search algorithm determine whether a web page contains information that answers the initial search query, or whether it simply repeats the same question posed in the query. For example, if a user’s query was for “books,” the algorithm would determine whether a web page contains relevant content aside from the keyword “books,” such as pictures, videos, lists, reviews, and so on. According to Google, although these search systems are constructed to seek out quantifiable signals in order to determine relevance, they are not structured to assess subjective notions such as the political ideology of a page’s content.25

3. Quality of content: Google’s Search algorithms also aim to prioritize the most reliable and high-quality search results. In order to determine the reliability and quality of content, Google’s systems identify signals that can help assess expertise, authoritativeness, and trustworthiness on a given topic. They also search for web pages that many users appear to value for similar search queries. For example, if other reliable and prominent websites link to a web page, that is considered a good indicator that the content of that web page is reliable. This is known as PageRank. PageRank is a mathematical formula that can be used to judge the “value of a page” on the web by assessing the quantity and quality of other pages that link to it. PageRank was one of the fundamental components of the original Google Search algorithm, and it was inspired by the system scientists used to gauge the importance of scientific papers, which was to evaluate how many other scientific papers referenced or cited them.26 However, after Google deployed a public PageRank score for each web page, bad actors began working to game this system, which led to a large volume of link spamming.27 As a result, the public PageRank scoring was retired, although PageRank remains a component of the Search algorithm. Additionally, as other factors became increasingly important for the Search algorithm and for ranking, other signals were incorporated into the algorithm.

Google also uses aggregated feedback from its Search quality evaluation process to further refine its ability to assess information quality. Further, Google uses spam algorithms in order to determine the quality of a page and ensure that low-quality, harmful, or manipulative web pages are not ranked highly in search results.28

4. Usability of web pages: When ranking search results, Google Search assesses whether web pages are easy to use and ranks those that are deemed more user-friendly higher than those that are not. Some of the signals that inform whether a web page is usable include whether the website is formatted properly for multiple browsers, whether it has been formatted for various devices and sizes (such as desktops, tablets, and smartphones), and whether the web page can be loaded by users with slower internet connections.29 This can make the Google Search product more accessible to those using the service through different devices or in different regions. However, it may also render some otherwise high-quality sites inaccessible to these users if web page owners do not abide by Google’s user-friendly requirements, thus stifling information flows.

5. Context and settings: As previously mentioned, Google extracts insights from users’ personal data in order to inform and tailor search results. These data points include location, purchase history (as determined by crawling users’ Gmail inboxes for purchase receipts30), past Search history, and Search settings. According to Google, a user’s Search settings on factors such as preferred language or SafeSearch—a tool that enables users to filter out explicit or sensitive content—also enable Google to understand which results are likely to be the most useful for them. Google has also stated that Search results may be personalized based on a user’s Google account activity. For example, if a user searches for “events near me,” the search engine may personalize recommendations based on events it thinks the user may be interested in. According to the company, these inferences are made so that search results can match a user’s interests. The company asserts that they are not, however, designed to infer a user’s race, religion, political affiliations, or other sensitive characteristics.31

The weight applied to each factor depends on the nature of the query. For example, the freshness algorithm may play a more prominent role for queries related to current events.32 In addition, Google has detailed specifications for how pages that may impact the “future happiness, health, financial stability, or safety of users” are weighted.33 These web pages are known as “Your Money or Your Life” (YMYL) pages, and they include websites that let users make purchases or pay bills, offer financial, medical, or legal information, or produce news. Google has especially high page-quality rating standards for these pages, as low-quality content on such pages could have a significant negative impact on users.34 As a result, when Search algorithms detect that a query is related to a YMYL subject, they place more weight in the ranking system on factors such as authoritativeness, expertise, and trustworthiness.35
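The sketch below illustrates the kind of query-dependent weighting this paragraph describes: per-signal weights shift when a query looks trending or touches a YMYL topic. The signal names, weights, and linear scoring function are hypothetical assumptions for illustration, not Google's actual formula.

```python
# Illustrative sketch (not Google's formula) of query-dependent signal weights:
# freshness gets boosted for trending queries, quality/trust for YMYL queries.
BASE_WEIGHTS = {"relevance": 0.5, "quality": 0.3, "freshness": 0.1, "usability": 0.1}

def query_weights(is_trending=False, is_ymyl=False):
    weights = dict(BASE_WEIGHTS)
    if is_trending:
        weights["freshness"] += 0.2   # recent pages matter more
    if is_ymyl:
        weights["quality"] += 0.3     # expertise and trust matter more
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}  # re-normalize

def score(page_signals, weights):
    """Combine hypothetical per-page signal scores (0 to 1) into one ranking score."""
    return sum(weights[s] * page_signals.get(s, 0.0) for s in weights)

weights = query_weights(is_trending=True)
page = {"relevance": 0.8, "quality": 0.6, "freshness": 0.9, "usability": 0.7}
print(score(page, weights))
```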

As the largest search engine in the world, Google is responsible for curating and delivering a large amount of information to users. It therefore assumes some responsibility for the impact of its algorithms in shaping the worldviews of users. The platform, however, has come under significant criticism for not providing enough transparency and accountability around its search curation and ranking practices, particularly around how it personalizes search results.

According to many search engines, a personalized search experience can help users filter through the vast amount of information available on the internet and access results that are relevant and useful. Personalized search experiences have also helped platforms like Google achieve significant growth and boost revenue. Personalized search results were a key advantage of Google when the feature first launched, and they became an integral feature of Google Search. In 2009, the company even made personalized search the default option for all users who were logged in to their Google accounts. Also in 2009, the company stated it would deploy an anonymous cookie in order to provide personalized search results to users who were logged out or didn’t have a Google account. This cookie operated separately from a user’s Google Account and web history, which were only available to users who signed in.36 In a blog post, the company also stated that it used contextual signals, such as those that aim to reduce the ambiguity of queries, in order to rank search engine results, even for users who were logged out. In this case, this means that Google would use information from a user’s recent search to clarify their current search. The platform said that this would not result in significantly different search results.37 However, its deployment did raise a number of privacy concerns regarding how Google was collecting and using user information across its different Search modes.

Over the past few years, the platform has asserted that it has moved away from personalizing search results. In September 2018, for example, CNBC reported meeting with Google’s algorithm team and learning that because each search query requires a significant amount of context, the opportunities for personalization are quite limited.38 Additionally, the company has suggested that personalization does not significantly improve the quality of the Search experience, and as a result it has moved away from it.39 However, it is unclear whether these shifts are due to increased scrutiny of the practice of offering personalized search results or whether the company is purposefully changing the way its Search algorithm works.

Many critics believe Google continues to curate and personalize search results to a large extent.40 One of Search’s biggest critics is privacy-focused search engine DuckDuckGo. In June 2018, DuckDuckGo conducted a study in which 87 volunteers in the United States conducted searches in Google’s private browsing mode (known as “Incognito”) while logged out and on the regular Search platform. The volunteers searched for three politically charged topics: “gun control,” “immigration,” and “vaccinations.”41 The study revealed that most participants received unique results, some saw search results that others did not,42 and some received fewer domain results than others.43 Additionally, the ranking of these search results also varied.44 The researchers found more than double the variation in the search results when comparing Incognito searches to regular searches.45 There was also significant variation between news and video results.46 This suggested that even if users were conducting searches in Incognito mode (which aims to provide users with a private browsing option that does not save a user’s search history), and while logged out, they still received personalized search results. This is likely because websites can use IP addresses and browser fingerprinting in order to identify users even when they are searching in these modes.47 According to the researchers, had the results been truly non-personalized, then all users would have received the same search results.48

DuckDuckGo’s researchers worked to control for factors that could have influenced these results, such as location, time, and being logged into Google, by having the volunteers conduct searches while logged out, and by having them conduct the searches at the same time and on the same day. In addition, they controlled for potential variances in location, which might have resulted in, for example, local news stories appearing in the search results, by reviewing all links by hand and comparing them to the city and state of the volunteer who viewed them.49

Google responded to the research by calling it flawed,50 stating that DuckDuckGo’s attempts to control for time and location differences had been ineffective. It also claimed that the researchers assumed that any difference in the search results automatically suggested personalization.51 Further, Google highlighted that search results related to news and current events were likely to continuously change depending on daily occurrences. It also stated that personalization was performed on a small portion of overall queries, primarily related to location52 and to utilizing previous searches in order to decipher context for a current search.53 Google has also suggested that any variation in search results is likely due to factors such as a user’s location, the language the search is performed in, and the distribution of Search index updates throughout Google’s data centers.54 Despite these rebuttals, however, there are still significant concerns around Google’s efforts to personalize search results, and the lack of transparency and accountability around these practices.

In particular, internet activists, such as Eli Pariser, have asserted that personalized search results can create and maintain filter bubbles and promote certain biases or world views. In his book, The Filter Bubble: What the Internet Is Hiding from You, Pariser argues that by providing users with content a system predicts they will like, internet platforms are filtering out and therefore preventing users from accessing information that may challenge their perspectives or broaden their horizons. This places these users in a “filter bubble” and amplifies their confirmation bias. It can also result in users becoming ill-informed, developing perceptions that are skewed toward one perspective, and even developing a distaste for ideas that are unfamiliar or contrary to their own. This is particularly concerning in the context of political discourse and rising political polarization around the globe.55 This has become an increasingly prominent topic since the 2016 U.S. presidential election, in which numerous internet platforms, including Google, were accused of creating filter bubbles or echo chambers through their algorithmic content curation practices.

However, some researchers push back against the notion that an online filter bubble has enhanced polarization. For example, economists from Brown University and Stanford University studied the relationship between polarization and the use of online media in American adults between 1996 and 2012. They found that polarization has largely been driven by those Americans who spend the least amount of time online, such as those over the age of 75. According to the study, those belonging to younger demographics who use the internet more frequently demonstrated little difference in their level of polarization in 2012 compared to 1996, when online platforms were far less prevalent and influential.56

Additionally, some conservative lawmakers in the United States allege that Google has demonstrated bias in how it curates and ranks its search engine results. According to this critique, this bias prioritizes and favors liberal information sources, in particular ones that are critical of conservatives. In December 2018, Google CEO Sundar Pichai testified before the U.S. House Judiciary Committee on the topic of alleged bias against conservatives on the platform, denying that such bias exists.57 This conversation has been particularly prominent in the political sphere as internet platforms have ramped up their efforts to remove misinformation and disinformation on their services. In the process, a number of politically charged web pages and pieces of content have also been impacted, contributing to claims of conservative bias.58

However, as previously discussed, there is little evidence to support the notion that such political biases exist in Google’s search results. Google has stated that Search is designed to decipher the usefulness and relevance of a web page, not to promote the political and ideological viewpoints of the individuals who built or audited the system.59 To debunk some of these claims, the company has also stated that it does not utilize human curation when ranking search results, and instead relies exclusively on algorithms. According to Danny Sullivan, Google’s public liaison for Search, Google does not manually intervene on specific search results when addressing issues with ranking. This is because tweaking one search result or addressing one query, when the search engine receives trillions of queries, does not have a strong impact on the overall Search experience.60 Additionally, the platform has asserted that its systems are not designed to make subjective determinations about truthfulness on web pages. Rather, it uses a range of measurable signals—such as the PageRank signal, which is used to determine authoritativeness61—to assess how users and other web pages perceive the expertise, trustworthiness, and authority of a web page and its content.62 Google’s ranking algorithms then promote these web pages, particularly during searches where the original query could surface misleading information.63

However, just because Google does not deploy human curation during the search ranking process does not mean that no bias is present. Algorithms are not neutral and bias-free. The signals that an algorithm uses are designed to prioritize certain information or qualities over others in order to curate and rank content. This is a key part of how search engines and news feeds work today, and is often seen as integral to their operations. At the same time, the term “bias” in this context does not solely refer to inappropriate preferences based on protected categories like race or political affiliation. Although an algorithm may contain biases based on how it sorts and ranks content, it is difficult to know exactly what these biases are. This is particularly true with black box machine learning systems. Therefore, it is difficult to draw reliable conclusions, such as whether algorithms are biased against a certain political party. Additionally, these algorithms incorporate the judgments, preferences, and priorities of the engineers who developed them, particularly around what information users are likely to find interesting and meaningful. Furthermore, as outlined by Dr. Safiya Noble, an associate professor at the University of California, Los Angeles, search results can also reinforce and perpetuate societal biases. In her book, Algorithms of Oppression, Noble outlines how in the early 2010s Google’s search engine results related to women, in particular women of color, were overly sexualized and stereotyped. Noble also highlights a case in 2016, in which Google Images search results for “three black teenagers” delivered mugshots of African-American teenagers, whereas similar search results for “three white teenagers” delivered “wholesome and all-American” results. In this way, search engine algorithms can reinforce existing societal stereotypes, often disproportionately impacting already marginalized communities.64

Although Google has shared some information about the signals that contribute to its search algorithm, it has not provided a comprehensive overview of all the signals, how they interact with one another, and what impact they have on online expression as a whole. One reason many platforms fail to provide comprehensive transparency around these signals is that the signals make up a platform’s algorithmic “secret sauce,” which it wants to keep confidential in order to maintain a competitive edge. However, even if it is valid to maintain confidentiality for certain operational details, it is important to promote transparency and accountability to the greatest extent feasible, and companies’ claims about trade secrets should not outweigh the public interest.

Additionally, many website owners have expressed frustration over the fact that Google does not always announce when it is updating or making changes to its search algorithm. As a result, after a change, some content creators find that their web pages are no longer ranking as well. The ranking of websites in search results is of vital importance to website publishers, as it influences the success of their websites. Typically, a user only clicks on the top few search results. The remaining search results on the page receive far lower clickthrough rates. As a result, being able to rank high in search results, and understand how various ranking signals impact one’s website and associated information flows, commerce outlets, and so on, is important.65

Part of the reason Google may not make frequent announcements is because it regularly makes changes to its Search algorithms. In 2017 alone, Google performed over 200,000 experiments and subsequently instituted 2,400 changes to Search.66 In July 2019, the company announced that over the past year it had made 3,200 changes to Search systems. These included updates for specific features or elements as well as broad core updates.67 According to Google, when making changes to its Search systems, the platform identifies areas of improvement, develops a solution, and then tests that solution. After deciphering whether the solution is feasible and improves the Search experience, Google then implements it. These algorithmic changes apply to a broad range of similar searches.68 Thus far, Google has only provided public updates around some of the broad core changes it has instituted. The company has stated that it aims to provide site owners with prior notice of “significant, actionable changes to our Search algorithms,”69 but in many instances this has proven not to be enough.

Greater transparency would enable publishers of web pages to better understand how their content is curated and ranked, and help them ensure they can adequately distribute their content. Currently, a number of search engine optimization (SEO) organizations and communities have sprung up that speculate on algorithmic changes. However, without clear direction and information, the ability of these communities to understand how they can effectively exercise their free speech online is limited, and the ability of users to access content is therefore also limited. In addition, greater transparency would help Google disprove or debunk growing claims of political bias.

Although Google’s search curation and ranking process is conducted primarily using algorithms, humans still play a role in this process. According to Google, the platform does not remove or delist search results. Rather, it seeks to promote higher-quality content in the rankings over lower-quality search results. The company has asserted that in a few rare exceptions it intervenes manually in order to remove or delist content from search results. These cases include when the platform receives legal requests to remove or delist search results, when a web page violates Google’s webmaster guidelines, and when a webmaster of a page requests that their web page be removed or delisted.70 Although Google states that it does not frequently engage in these practices, government and legal efforts to remove content have increased globally.71

Google provides transparency and accountability around its search result curation practices through its transparency report, which is published twice a year. In this report, Google reports on government requests to remove content across all its products, content delistings due to copyright, and requests to delist content under European privacy law (known as the “right to be forgotten”). Its data on government requests to remove content can be broken down by product, enabling a greater understanding of how such requests impact the Search product in particular. The report includes metrics such as removal requests by the numbers, items specified by the requests, and a percentage breakdown of which products were affected. It also includes reasons for requests.72 This is a best practice that other platforms should adopt, as it provides transparency around how external parties are influencing search results, and how Google is responding. Google has also asserted that, where possible, it aims to inform website owners about requests for removal through its Webmaster Console. This is a vital component of accountability as well.73

Additionally, some research indicates that Google has intervened to manually alter search results when the results sparked controversy. For example, in December 2016 British investigative journalist Carole Cadwalladr wrote about how the top search result for “did the Holocaust actually happen?” in Google Search was a white nationalist web page that denied the Holocaust had ever happened. Cadwalladr’s finding sparked outrage and eventually the results changed. Although Google has stated that it prefers “to take a scalable algorithmic approach to fix problems” rather than “fix the results of an individual query by hand,” many suspect that the company took corrective action nonetheless. However, others have suggested that the online outrage sparked by Cadwalladr’s article also contributed to this change in the results, as it drove traffic to certain web pages on this topic, thus impacting the search results and the ranking of these results.74

Another way that humans play a role in Google’s search curation and ranking process is through the Search Quality Rating process. In order to ensure its search algorithms promote relevant and high-quality content, Google developed a rigorous testing process that involves live tests and review by thousands of trained external search quality raters around the world. This process is deployed every time Google considers implementing a change or update to its search algorithm. In order to roll out a change, Google must be able to determine that the change provides a net positive. This means that a significant number of search results will be made more helpful without subsequently creating major losses in other areas. Making such changes to organic search results can take a large amount of time.75

Search quality raters are external individuals based around the globe who help evaluate whether a website provides users with the content they were looking for. They also help evaluate the quality of search results based on the expertise, authoritativeness, and trustworthiness of the content. Each search quality rater represents a specific language and geographic expertise or perspective, in order to ensure Google’s Search product is useful around the world. Although these individuals’ ratings do not directly impact the ranking of any web page, they enable Google to benchmark the quality of its results and identify areas for improvement.76 This informs Google’s search algorithms, which aim to prioritize high-quality content and web pages.77

In order to ensure that search quality raters are using a consistent approach, Google provides them with Search Quality Rater Guidelines, which outline Google’s goals for its ranking systems and include examples of appropriate ratings. According to Google, in order to ensure consistency in the rating program globally, all search quality raters are required to pass a comprehensive exam and are continuously audited. Evaluators also assess each improvement to Search that is rolled out through side-by-side experiments, in which evaluators see two different sets of search results, one with the change and one without. Evaluators must then identify which experience is of greater relevance and quality. This feedback is used to improve Search and launch efforts.78 These search quality rater guidelines are publicly available, therefore providing a degree of transparency around these curation and rating efforts. However, not as much information is known about who these raters are and what perspectives, regions, and cultures they represent. Knowing this would enable researchers and users to get a better sense of how search results are being curated and rated.
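As a hypothetical illustration of how side-by-side feedback might be tallied into a launch decision, the sketch below counts rater preferences for an experimental ranking over the current one and applies an assumed margin. The vote format and threshold are illustrative assumptions, not Google's actual launch criteria.

```python
# Illustrative sketch of a side-by-side evaluation: raters see results from the
# current system ("control") and the proposed change ("experiment") and pick
# the more relevant set; the change ships only if it is a clear net positive.
def net_positive(votes, margin=0.05):
    """votes: list of 'experiment' or 'control' preferences from raters."""
    experiment_share = votes.count("experiment") / len(votes)
    control_share = votes.count("control") / len(votes)
    return (experiment_share - control_share) > margin

ratings = ["experiment", "experiment", "control", "experiment", "control"]
print(net_positive(ratings))  # True: 60% vs. 40% exceeds the assumed 5-point margin
```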

Given that Google’s search engine is a vital way for website publishers to disseminate their information and gain traction, many publishers have invested significant time and resources into ensuring that they rank well in Google’s search results, a practice known as SEO. Google’s publicly available webmaster guidelines outline how publishers can ensure that Google finds, indexes, and ranks their website. They also provide guidelines on topics such as quality, as well as rules around prohibited or illicit activity such as spam, malware, and deceptive websites.79

When a website violates Google’s webmaster guidelines, it can be penalized in one of three ways: Google can neutralize the impact of the spam, demote the website in search rankings, or remove the website from search results completely.80 According to Danny Sullivan, Google’s algorithms can detect the majority of spam and automatically prevent the ranking system from promoting such content by demoting or removing it.81 The remainder of spam results are typically addressed manually by a spam removal team. Team members review the pages in question, typically based on user feedback, and flag them for penalty if they have been found to violate the webmaster guidelines.82 Manual actions can be used to penalize an entire website, a subdomain, sections of a website, or specific pages. Manual action can also demote websites in search rankings and delist them.83 If a web page owner feels that they have been incorrectly or unfairly penalized, they can submit a reconsideration request.84 However, processing and responding to these requests often takes a significant amount of time, and this can therefore undermine the operations and success of a website for an extensive period. In order to provide greater transparency and accountability around this process, Google should enable appeals in a more timely manner.

Finally, in order to provide greater transparency and accountability around its search curation and ranking practices, Google needs to provide its users with greater controls around how their data is used and how their search experience is tailored.

In 2009, Google made personalized search the default for all users, including users who are not logged into a Google account.85 Users who were signed in, however, could access a tab that outlined how Google had customized their search results, and how they could turn this customization off.86 Today, a similar tab exists that explains to users how activity data, location data, and data from other Google products and services make Search work. The tab also lets users delete their search activity; choose whether their activity is saved on Google sites, apps, and services; and choose whether they would like personalized advertisements. Logged-in users therefore do have a range of controls over algorithmic curation available that enable them to disable the personalization of search results based on their account activity to a certain extent. However, as the DuckDuckGo study indicated, search results are often still personalized, even if a user is not logged in or is using Incognito mode.87 All users, regardless of whether or not they are logged in to a Google account or browsing in Incognito mode, need to be able to access controls that enable them to opt out of algorithmic content curation and ranking during the search experience. They also need to have strong privacy controls over how their data is tracked, collected, and used. Additionally, similar controls need to be afforded to users who use Search but do not have a Google account, as they do not have access to the suite of settings that Search users with accounts do.

In terms of controls, Google also offers website owners a series of controls over how their website appears in Search results. It does this through the Webmaster Tools feature, which lets website owners provide granular instructions on how Google crawls and processes pages on their website. Website owners can also request a recrawl or opt out of crawling altogether.88 This enables website owners to control how their content is processed to an extent as well. However, the algorithm ultimately determines how well a website ranks and performs for each user.

Case Study: Bing

Bing is an internet search engine that is owned and operated by Microsoft. It launched in June 200989 and enables a variety of search services including web, video, image, and map search.90 When Bing launched, Microsoft sought to position it as more than a simple search service. Rather, the company claimed to provide a product that enabled consumers to rapidly acquire more relevant and informed insights from the web, and to use these insights effectively. Its marketing described this ideal search engine as a “Decision Engine.”91 Initially, Bing focused on four verticals: making a purchase decision, planning a trip, researching a health condition, and finding a local business.92 As with Google, by providing users with more relevant and informed search results, Microsoft also hoped to boost its growth and revenue.

Although Bing accounts for a small percentage of Microsoft’s overall revenue,93 the search engine has grown to be the second largest in the world in terms of market share,94 boasting 1.3 billion unique monthly global visitors95 and ranking 31st for global internet engagement on Alexa rankings.96 Although Google still dominates the search engine industry in terms of market share,97 Bing is often considered the most comparable alternative available to users.98

Bing has its own web crawler, known as Bingbot, which uses an algorithm to determine which websites to crawl, how often to crawl, and how many web pages to gather from each website. The algorithm chooses web pages to crawl by prioritizing relevant known URLs that are not indexed yet, and URLs that are already indexed but that need to be revalidated to check for changes or dead links. Bingbot also seeks to identify new web pages that have not been crawled or indexed yet.99

Like Google, Bing uses a combination of algorithmic signals when ranking search engine results, and it also uses human editors.100 However, Microsoft has not recently made any disclosures about which signals it uses to rank search engine results. The latest major disclosure it made was in 2014, via a blog post on the role of content quality in Bing search results. The blog post outlined that the relevance of a result is a significant consideration for the Bing ranking algorithm. The relevance of a result is a function of three things: topical relevance to a user’s query (does the result sufficiently address the query?), content quality, and context (is the query related to a recent topic? where is the user located? etc.).101 Content quality is based on three primary pillars: authority (can the content be trusted?), utility (is the content useful and detailed?), and presentation (is the content well-formatted, accessible, and easy to find?). Authority is determined based on a range of factors including signals from social networking platforms, cited sources, name recognition, and information about the author. In order to assess the utility of a website, Bing’s models aim to predict whether the page’s content provides adequate supporting information, whether it is detailed enough for the intended user, and whether it includes supporting content such as videos, graphs, etc. The models also consider the level of expertise required to produce the content on the web page, with a preference for content that is unique and does not reproduce existing materials.102
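The sketch below mirrors the structure Microsoft's 2014 blog post describes: relevance as a function of topical relevance, content quality, and context, with quality built from authority, utility, and presentation. The specific weights and the simple linear combination are illustrative assumptions, not Bing's actual formula.

```python
# Hedged sketch of the relevance structure described above; weights are assumed.
def content_quality(authority, utility, presentation):
    """Quality pillar scores (0 to 1) averaged into one quality score."""
    return (authority + utility + presentation) / 3.0

def result_relevance(topical_relevance, quality, context_match,
                     w_topic=0.5, w_quality=0.3, w_context=0.2):
    """Combine the three relevance components into a single ranking score."""
    return (w_topic * topical_relevance
            + w_quality * quality
            + w_context * context_match)

quality = content_quality(authority=0.9, utility=0.7, presentation=0.8)
print(result_relevance(topical_relevance=0.85, quality=quality, context_match=0.6))
```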

Microsoft has also stated that both the signals and human editors account for live search activity and real-time news events when ranking news results.103 In addition, the search engine only uses metrics related to how many clicks a web page gets when it evaluates how a search ranking algorithm is working. Click metrics are not a primary signal that is considered when ranking search engine results.104

According to Microsoft, the Bing search engine was constructed to identify which results most satisfied users. Based on these insights, Microsoft develops guidelines and training datasets. These training datasets are evaluated by search quality raters (known as “judges” for the Bing search engine) who operate in Bing’s Human Relevance System project.105 These judges work to identify which search results are the most satisfying according to factors such as relevance and accuracy. They also use click metrics to evaluate whether users are satisfied with the search results they received.106 In addition, Microsoft pushes out models to subsets of users in order to observe which results most satisfy real users. It follows a similar process when implementing updates to its ranking algorithm.107

The Bing search engine also uses machine learning to scale the process of generalizing. Generalizing is when Bing judges108 manually rank search results using a set of guidelines, a process that is typically most accurate when it is performed by humans. However, because it is conducted manually, it cannot be done at scale. The use of machine learning in this instance therefore aims to provide users with search results comparable to those that judges would deliver, but at scale. This can only be achieved by generalizing the ranking algorithm as much as possible.109 Today, approximately 90 percent of Bing search results are ranked based on machine learning.110
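To illustrate the general idea of scaling human judgments with machine learning, the sketch below fits a model to a handful of judge-labeled examples and then ranks new, unjudged results the way the judges would likely have ranked them. The features, labels, and model choice are assumptions for illustration, not Bing's actual pipeline.

```python
# Illustrative learning-to-rank sketch: learn from judge labels, then rank
# unlabeled results by the model's predicted probability of being satisfying.
from sklearn.linear_model import LogisticRegression

# Each row: hypothetical per-result features [keyword_match, authority, freshness]
judge_labeled_features = [
    [0.9, 0.8, 0.2],
    [0.4, 0.3, 0.9],
    [0.7, 0.9, 0.5],
    [0.2, 0.1, 0.4],
]
judge_labels = [1, 0, 1, 0]  # 1 = judged satisfying, 0 = not

model = LogisticRegression().fit(judge_labeled_features, judge_labels)

# Rank new, unjudged results highest-probability first.
new_results = {"page_a": [0.8, 0.6, 0.3], "page_b": [0.3, 0.2, 0.8]}
ranked = sorted(new_results,
                key=lambda p: model.predict_proba([new_results[p]])[0][1],
                reverse=True)
print(ranked)
```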

In addition, like other search engines’ ranking algorithms, Bing’s search ranking algorithms are continuously updated to attempt to provide users with a better experience.111 These updates aim to refine and improve search results and remove spam from search indexes to protect users from negative or manipulative search results. The search engine uses algorithms to demote and penalize websites that violate Microsoft’s guidelines.112 In doing so, it observes how users react to search results, how the search engine assesses the search results, and what the actual ranking algorithm returns. This process cannot be perfect, and the search engine algorithm therefore has to be continuously updated.113

Bing is the second largest search engine in terms of market share, and it is therefore responsible for curating a significant amount of content for a large user base. Despite this, it demonstrates a lack of transparency around its search result curation and ranking process.

First, Microsoft does not publicly disclose substantial information around the signals that define how search engine results are curated and ranked. This makes it difficult for users and website owners to know how the search experience is developed and personalized, and which qualities, results, and voices are prioritized over others. In addition, Microsoft does not regularly share explanations of why its search algorithm is being updated. Given that these changes impact how and whether publishers’ content is viewed by users, and given that these changes alter the scope of content users engage with, this is concerning.114 Microsoft does offer website owners some information on how to use the Bing search engine via its publicly available Webmaster Guidelines. These guidelines outline how providers can work to ensure their content is found and indexed within Bing.115 But this resource does not offer live and up-to-date information on recent changes.116

Furthermore, although Microsoft deploys judges to help improve its search experience, it does not publicly share the guidelines these judges use.117 Some organizations, such as the online blog Search Engine Land, appear to have been able to obtain copies of these guidelines and have written about them, but Microsoft itself has not publicly disclosed information around these efforts. Microsoft also does not share any information about who these judges are, and what perspectives, regions, and backgrounds they represent. This makes it difficult to understand how search results are benchmarked and curated, and which voices and perspectives are being considered when producing the search experience. In addition, according to a spokesperson, Microsoft deploys both algorithmic tools and human editors during the search curation and ranking process. Microsoft does not, however, provide any further information on the role of these human editors, how they differ from or are similar to judges, and what role they play. The presence of human editing in this process creates a very real opportunity to instill bias in the search ranking process. Although, as previously noted, the term “bias” in this context does not solely refer to inappropriate preferences based on protected categories like race or political affiliation, this still raises concerns. Algorithmic biases, which can originate from their creators and from biased training data, are also a significant concern. However, given that Microsoft shares limited information about its algorithmic curation and ranking process, it is difficult to assess what these biases are and how they impact the search experience.

Like Google, Microsoft lets users control their search experience on Bing to an extent. When a Bing user is logged in to their Microsoft account, they can view and clear their browsing, search, location, and other relevant activity history from the account. They can also manage and control some of the data that Microsoft collects. However, these controls are not available to users who are searching Bing in a private browsing mode, or who are searching while not logged in.

Microsoft also demonstrates some concerning practices when it comes to accountability with the Bing search engine. According to Frédéric Dubut, the head of Bing's spam team, Microsoft aims to assess intent when deciding whether to penalize a website for violating its guidelines (such as by spamming).118 However, how the company assesses intent is unclear. As previously highlighted, a web page can be penalized in a number of ways. These include neutralizing the impact of spam or negative intent, demoting a website in search rankings, or removing the website from search results.119 A publisher can submit a reconsideration request if they believe their website has been unfairly penalized. This is valuable, as it provides an appeals mechanism for users of Bing's search engine. However, this process has been described as lengthy, raising the risk that an error on the part of Microsoft can seriously damage the success of a website.120 In order to provide greater accountability around its search result ranking procedure, Microsoft should improve this procedure so that it generates resolutions in a timely manner.

Microsoft does, however, also demonstrate some positive practices. For example, it lets website owners request recrawls of their sites.121 Additionally, like Google, Microsoft receives various legal, copyright, and private party requests to remove and delist websites from Bing.122 The company issues an annual transparency report regarding content removal requests, which outlines the scope and scale of such requests. The report provides data on government requests for content removal, copyright removal requests, "Right to be forgotten" requests, and non-consensual pornography ("revenge porn") removal requests. The report, however, does not break down these data points by Microsoft product, and as a result it is difficult to ascertain how often search results on Bing are impacted by these requests.123

Case Study: DuckDuckGo

DuckDuckGo is an internet search engine that launched in 2008, largely based on Free and Open Source Software (FOSS).124 Typically, search engines aim to distinguish themselves based on the comprehensiveness and accuracy of their search index, and the relevance of their search results.125 DuckDuckGo seeks to further distinguish itself by providing users with strong privacy protections that also enable them to evade the so-called filter bubble created by the personalization of search results. According to DuckDuckGo, the platform does not profile users and delivers all users the same search results, regardless of their past search history. The platform also asserts that it prioritizes providing users with the highest-quality search results, rather than the largest number of results. Today, DuckDuckGo is the sixth largest search engine in the world by market share.126 In January 2019, DuckDuckGo reached a new traffic record of over 1 billion monthly searches.127 It ranks 186th for global internet engagement on Alexa rankings.128 DuckDuckGo uses its own web crawler, known as DuckDuckBot, and approximately 400 other sources in order to generate and curate its search results. These sources include other search engines, such as Bing, Yahoo!, and Yandex, and websites, such as Wikipedia.129 As concerns around consumer privacy have grown—especially following the 2013 Snowden disclosures and more recent data-sharing controversies like the Cambridge Analytica scandal—DuckDuckGo has seen a significant increase in its user base and website traffic.130

According to DuckDuckGo, the platform enforces encrypted HTTPS connections whenever websites provide them. When a user connects to an HTTPS-secured website, the site's security certificate is evaluated to authenticate that it was issued by a legitimate authority. This helps secure sensitive information sent over an HTTPS connection from electronic eavesdropping.131 Additionally, it can prevent the information from being modified while in transit. DuckDuckGo also assigns each page a user visits a score that assesses to what extent that website is trying to mine the user's data. In order to maintain user anonymity online, DuckDuckGo asserts that it blocks tracking cookies, which can be used to identify a user and their devices. It also scans and scores the privacy policies of different websites that a user visits. On the DuckDuckGo search engine, a user has the ability to clear their tabs and data automatically, at the end of a session or after a preset period of inactivity.132 In addition, although the company still provides advertisements, these are "contextual" ads that are based only on the content of a website (such as a current search query) rather than on a user's behavioral profile, including their prior search history.133 Further, DuckDuckGo has stated that it does not store personal user information; it does, however, maintain a log of all search terms that have been used on its service.134

Despite the differences in DuckDuckGo's practices, some studies have indicated that DuckDuckGo is able to return results of the same quality as Google's.135 However, this generally holds for broad search topics rather than niche ones.

Although DuckDuckGo seeks to provide users with a positive search experience that comes with strong privacy protections, the company does not provide a significant amount of transparency and accountability around its search curation and ranking practices. According to a research study DuckDuckGo conducted on Google's ranking practices, a neutral search engine that is truly delivering non-personalized search results should be able to deliver the same results to all users, regardless of what browsing mode they are in. However, just because all users on the DuckDuckGo platform see the same results, it does not mean that these results are not curated and ranked using automated tools. The company does not share which signals it uses to perform this ranking. On the DuckDuckGo website, the company states "ranking is a bit opaque and difficult to discern/communicate on an individual query basis because of all the various factors involved (and which change frequently). Nevertheless, the best way to get good rankings (in nearly all search engines) is to get links from high-quality sites."136 It is therefore difficult to understand which factors DuckDuckGo's search curation and ranking processes prioritize, and how these judgments impact users' search experiences on the platform. By providing greater transparency, the platform could enhance its value proposition and demonstrate to users why it believes it is a better search engine choice.

One way the platform aims to deliver high-quality search results is by removing search results associated with content mill companies. Content mill companies are websites that publish numerous daily articles, often produced by freelance writers (for example, eHow). These forms of content are considered low-quality, but they are written so that they rank highly in Google's search index. DuckDuckGo, however, removes them.137 The search engine has also begun experimenting with algorithms to remove spam links and other forms of low-quality content. However, DuckDuckGo provides little transparency around the scope and scale of this process.

According to DuckDuckGo, the platform’s search engine supports user privacy, provides users greater protections around how their data is used, and aims to deliver neutral search results that do not exhibit bias and that prevent the creation of flter bubbles. However, the company does not provide adequate transparency and accountability around its ranking process, making it difcult for users and website owners to understand how expression is being controlled. The growth of the platform suggests there is a market for a service whose value proposition is built on protecting user privacy.138 This value proposition should be extended to include transparency and accountability.

News Feed Ranking

Search engines are not the only internet platforms that have adopted algorithmic curation and ranking practices. Many small and large online platforms, including social networking and review-based services, have also introduced these tools to curate and present content that is relevant to users' interests and needs. These news feeds have opened up new avenues for users, businesses, brands, and content creators to create and disseminate information at great scale. They have therefore also helped to promote free expression online and boost the revenue of internet platforms. However, these methods of algorithmic curation and ranking also raise a number of concerns, particularly regarding algorithmic awareness and transparency, the creation of filter bubbles, and algorithmic bias. Like search engines, news feeds can shape user perspectives by prioritizing certain forms of content and deprioritizing others.

This section of the report focuses on how internet platforms deploy algorithmic curation and ranking practices in order to shape and operate their news feeds. It will use three platforms—Facebook, Twitter, and Reddit—as case studies of how such practices can vary and what concerns they surface.

Case Study: Facebook

Since its creation in 2004, social media company Facebook has expanded its services to include features such as messaging and “smart displays.” As of June 2019, Facebook has approximately 1.59 billion daily active users139 and ranks third in global internet engagement on Alexa rankings.140

Today, Facebook offers one of the clearest examples of algorithmic content curation and ranking in a news feed. The Facebook News Feed is composed of stories produced by a user's friends, Pages a user follows, Groups a user is part of, and suggested content such as stories and advertisements. Facebook launched the first iteration of its News Feed in 2006.141 The launch of the News Feed drew significant backlash and controversy from users, even sparking calls to boycott the platform. In particular, users were concerned that the News Feed was eroding their privacy by making more information about them available to their friends and others on the platform, which Facebook asserts was not the case.142

Despite controversy around this major change, the News Feed has emerged as one of the largest and most significant billboards and content hubs for users, brands, publishers, and influencers.143 It has also been a major driver of advertising, generating a significant amount of revenue for the platform.144 The News Feed was launched to drive further engagement—and thus revenue—on the platform; Facebook asserts it was also designed to present users with content that is relevant and meaningful to them. Prior to the launch of the News Feed, users on Facebook and comparable platforms like MySpace and Friendster had to seek out content posted by their friends or Pages by visiting individual pages. The News Feed brought this information together and curated posts based on predictions of what content users were interested in and would engage with. It ranked these posts so that users could view the content that was deemed the most relevant to them first. According to Facebook CEO Mark Zuckerberg, the average Facebook user has approximately 1,500 posts that could appear on their News Feed every day. However, users spend a limited amount of time on their News Feed, and as a result they are likely to only read and potentially engage with 100 of these posts. Facebook states that the News Feed curation process seeks to ensure that these 100 posts are the most relevant and meaningful to a user.145

However, in its efforts to provide users with more meaningful interactions, Facebook is also aiming to increase the time users spend on the platform. This in turn drives increased revenue through avenues such as advertising, furthering the company's bottom line. Additionally, as Facebook has come under increased scrutiny since the 2016 U.S. presidential elections and the Cambridge Analytica scandal, it has invested significant resources towards convincing regulators that it should be able to continue to self-regulate, rather than addressing the obvious threats its business model poses to user privacy and, more fundamentally, to democracy. Facebook's assertion that the platform now prioritizes meaningful interactions is one such method of convincing regulators that Facebook is a safe and well-meaning platform.

An earlier iteration of the Facebook News Feed deployed an algorithm known as EdgeRank. EdgeRank was used to determine which stories should appear in a user's News Feed based on three signals: affinity score, edge weight, and time decay.146 Posts that had the highest EdgeRank score would appear at the top of a user's News Feed. Because each user had a different affinity score, each user also had a different EdgeRank score. These scores were not public. Around 2011, as the News Feed algorithm evolved, EdgeRank was retired and the signals it relied on were incorporated into newer versions of the News Feed algorithmic system.147
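As publicly described, EdgeRank summed the product of those three signals across every "edge" (a like, comment, or other interaction) attached to a story. The sketch below is a minimal, illustrative rendering of that formula in Python; the field names, weights, and example values are assumptions for demonstration and do not reflect Facebook's actual implementation.

from dataclasses import dataclass

@dataclass
class Edge:
    """One interaction (like, comment, share) attached to a story."""
    affinity: float    # how close the viewer is to the user who created the edge
    weight: float      # how valuable this edge type is (e.g., a comment counts more than a like)
    time_decay: float  # shrinks toward zero as the edge ages

def edgerank(edges):
    # A story's score is the sum of affinity * weight * time decay across its edges.
    return sum(e.affinity * e.weight * e.time_decay for e in edges)

# Example: a fresh comment from a close friend outweighs an old like from a distant contact.
story = [Edge(affinity=0.2, weight=1.0, time_decay=0.3),
         Edge(affinity=0.9, weight=4.0, time_decay=0.8)]
print(edgerank(story))  # stories would then be sorted by this score, highest first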

During this time, the Facebook News Feed garnered a reputation for looking and operating like a tabloid, since it heavily prioritized popular posts and advertisements. Additionally, the platform came under heavy criticism for promoting addictive practices that sought to maximize the amount of time users spent on the platform. This was because, at the time, the amount of time users spent on a platform was treated as an indicator of the platform's popularity and success.148

In January 2018, however, Zuckerberg announced that the News Feed was going to be altered so that it prioritized "more meaningful social interactions" with friends and family over content produced by businesses and brands.149 Facebook stated that this shift would place less emphasis on posts that are popular, and more emphasis on "authentic" posts that encourage and receive significant engagement from a user and their network.150 The new News Feed is reportedly based on three core principles: first, that Facebook users value meaningful and informative stories; second, that they value accurate and authentic content; and third, that they value principles that guide safe and respectful behavior.151

With this new News Feed, Facebook claimed that it hoped users would spend quality time on the platform, rather than more time.152 However, it could be inferred that if a user is engaging with more meaningful and relevant content on the platform, they would also spend more time on the platform.153 One year after the new News Feed algorithm was launched, a report by social media engagement tracking firm NewsWhip found that the platform had seen increased levels of engagement as well as greater amounts of content being posted and engaged with by friends and family.154

The Facebook News Feed algorithm goes through four stages in order to identify stories, rank them, and produce a tailored News Feed experience for each of its users.155

1. Inventory: The algorithm takes an inventory of what stories have been posted by a user’s friends and Pages a user follows. This is important to assess, as each News Feed is largely composed of content shared by a user’s connections.

2. Signals that inform ranking: The algorithm then evaluates each story using hundreds of thousands of signals. These signals include who posted a story and when it was posted, as well as more granular factors such as the time of day, and how fast a user’s internet connection is. This is particularly important for users with slower connections who can’t properly load certain forms of content.

3. Predictions: The News Feed algorithm utilizes machine learning in order to extract insights from a user's past activity. These insights are used to predict how likely a user is to engage with a post, which is a metric for whether the user finds a post meaningful.156 Some of the predictions the algorithm seeks to make include how likely a user is to comment on a story, how likely they are to spend time reading the story, and whether they would watch an entire video. It also makes some qualitative predictions such as how likely a user is to say that they found a story informative. While such predictions may be able to deliver relevant content to users in the short term, this approach presumes that a user's behavior and interests will remain constant over time. It therefore can also result in the creation of a filter bubble and prevent users from engaging with new content that matches their potentially expanding interests.

4. Relevancy score: The News Feed algorithm then uses all of the signals and insights at its disposal to calculate a relevancy score. These signals are used to calculate a range of probabilities, including the likelihood a user will click on a story, the likelihood a user will spend time on a story, the likelihood a user will engage with a story through likes, comments, and shares, the likelihood a user will find a story informative, the likelihood a story is click-bait (posts that aggressively seek out likes and engagement), and the likelihood that a story links to a low-quality website. These predictions are compiled into a relevancy score, which is an overall prediction of how meaningful a given story is for a user. Facebook calculates a relevancy score for every story from all of a user's connections every time a user opens their News Feed. (An illustrative sketch of how such predictions might be combined into a single score follows this list.)
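The sketch below illustrates the general shape of this final step: each predicted probability is weighted and summed into one score per story. The prediction names and weights here are illustrative assumptions only; Facebook has not published its actual model or weighting.

# Purely illustrative: prediction names and weights are assumptions, not Facebook's model.
def relevancy_score(predictions, weights):
    """Combine per-story probability estimates into a single ranking score."""
    return sum(weights[name] * prob for name, prob in predictions.items())

predictions = {
    "comment": 0.10,      # P(user comments on the story)
    "time_spent": 0.55,   # P(user spends meaningful time reading it)
    "informative": 0.30,  # P(user would say the story is informative)
    "clickbait": 0.05,    # P(story is click-bait), penalized via a negative weight
}
weights = {"comment": 3.0, "time_spent": 1.5, "informative": 2.0, "clickbait": -4.0}

print(relevancy_score(predictions, weights))  # stories are then ranked by this score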

Facebook’s News Feed algorithm is infuenced by hundreds of thousands of signals that work to identify and rank content for each user. According to Facebook, these signals seek to prioritize content that refects the three News Feed pillars, and they each represent a distinct data point that Facebook’s News Feed algorithm considers and processes when ranking content. These signals can be explicit, such as likes, or implicit, such as the time a user spends on a page before returning to the News Feed.157 A study on Facebook’s News Feed Algorithm, based on information refected in Facebook’s “News Feed FYI” blog, sorted some publicly disclosed signals into six categories:158

1. Content signals: Content signals are factors that demonstrate how stories differ from one another. They include the format of a story (such as a link, video, or photo); the number of likes, comments, or reactions a story receives; and which friend or Page posted the story.

2. Source signals: Source signals are characteristics a user or Page demonstrates when they publish a post. These include the history of the Page and how often a Page has posted stories with click-bait headlines.

3. Audience signals: Audience signals are characteristics a user demonstrates when they consume a post, and they often reflect patterns in content consumption. These include how often a user uses the "hide" feature to remove content from their News Feed and how often a user watches videos, in part or in their entirety, rather than scrolling past them.

4. Action signals: Action signals represent a user's behavior when it comes to a specific story. These signals include whether a user likes, clicks on, or engages with a specific story, and the amount of time a user spends reading a story or watching a video. The News Feed algorithm prioritizes active interactions, such as commenting on and sharing a post, over passive interactions such as liking posts and click-throughs. This is based on the notion that active interactions require more effort and are therefore indicative of meaningful interactions.159 The algorithm also tends to favor posts with comments and replies to comments, as they indicate meaningful interactions via conversations.160

5. Relationship signals: Relationship signals are data points collected about the relationship between two users or Pages on the platform. These include how often two users engage with one another, and whether a user decides to unfollow a friend, Page, or Group.

6. Likelihood signals: Likelihood signals are the probabilities the Facebook News Feed algorithm calculates around how a user will interact with a post. They include the probability a user will like or comment on a story. These likelihood signals are compiled to determine how posts are ranked in the News Feed.

Facebook also provides temporary boosts to content known as “timely posts”. These are popular posts or news that are currently being discussed.161

According to Facebook, the platform also sought to prioritize "meaningful interactions" so that it could promote high-quality content in News Feeds and curb the spread of low-quality content such as spam, posts that are unverified, click-bait posts, and posts that seek to spread misinformation.162 However, as previously mentioned, such efforts also aim to increase the amount of time that users spend on the platform, and thus drive revenue through avenues such as advertising.

Most recently, in July 2019, Facebook announced it would downrank and reduce the spread of posts that make sensationalized health claims. It would also do this for Pages aiming to sell products or services based on misrepresented or false health-related claims.163 Additionally, in April 2019, Facebook deployed a new tactic, called Click-Gap, in order to reduce the amount of low-quality content that users see in their News Feed. A number of low-quality websites receive significant traffic from the Facebook platform. In order to tackle this, Facebook systems crawl and index the internet in order to identify such websites. They then downrank low-quality posts that link to these websites. This is based on the notion that such sites rely on platforms like Facebook to drive views, and by doing this Facebook can stifle their efforts. This approach is similar to the approach Google's PageRank algorithm used to rank results when it first launched. PageRank determined how high to rank a search result based on the number and quality of websites that linked to a given web page.164 Because visibility drives impact, posts that are viewed less have less of an impact.
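For reference, the core PageRank idea can be expressed compactly: each page distributes its score to the pages it links to, so a page ranks higher when many well-ranked pages point at it. The snippet below is a minimal power-iteration sketch of that idea, not Google's production algorithm, and the toy link graph is invented for illustration.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue
            share = damping * rank[page] / len(outgoing)  # pass rank along each outbound link
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# Toy link graph: page "c" is linked to by the most pages and ends up ranked highest.
print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}))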

Facebook also uses a range of user-driven metrics in order to assess the quality of a post. These include whether users hide certain posts and whether users report posts as spam. According to Facebook, in this system, posts that are shared and engaged with organically will rank higher on users' News Feeds.165 However, the same March 2019 NewsWhip report that found that the algorithmic changes to the News Feed had increased engagement also found that the changes had not succeeded in adequately tackling the spread of misinformation and low-quality content.166

Facebook has asserted that its current algorithmic curation and ranking model prioritizes meaningful interactions on the platform. However, this algorithmic system is also responsible for managing information flows and online speech, and presenting users with a certain experience based on the platform's understanding of a user's interests. This creates an opportunity to promote certain voices above others, and silence certain voices entirely. This may mean that the voices of dominant social and political groups will be amplified and the voices of disproportionately targeted and already marginalized groups will be silenced. These algorithmic practices are also directly responsible for influencing how users perceive their network and the world around them. As a result, greater transparency and accountability around Facebook's News Feed algorithmic curation and ranking practices are needed.

In March 2019, Facebook introduced a "Why am I seeing this post?" feature in the News Feed. This feature, found in the right-hand corner of each News Feed post, explains why the user is seeing a certain post (e.g., if it was posted by a friend or a Page the user follows, whether it is highly popular,167 etc.), how their past activity (such as whether they regularly watch videos or click on shared links)168 has informed the ranking of the posts in their News Feed, and what other factors typically influence the ranking of posts in the News Feed. This feature also provides users with access to controls that let them edit their News Feed and privacy preferences.169 The News Feed preference controls enable users to select whose posts they see first, unfollow users or groups in order to hide their posts, reconnect with users or groups in order to see their posts again, manage snooze settings on certain users or groups, and hide apps from the News Feed. In addition, through another News Feed control tab on the left-hand side of the News Feed page, users can opt to view posts in reverse-chronological order, rather than through the lens of the algorithmic curation and ranking filters. The default setting that the News Feed will always revert to, however, is the algorithmically curated and ranked mode, known as "Top Stories." In this way, Facebook enables users to control their News Feed experience to a degree. However, it does not give them the option to opt out of the algorithmically personalized experience entirely, and the algorithmically personalized News Feed is the default option.

Although Facebook has shared some information about the signals it uses to rank content in a user's News Feed, it has not provided a comprehensive overview of the range of signals used, and how these signals collectively work together in the News Feed algorithm to determine the ranking of posts. It also does not explain how different signals are weighted.170 Without greater transparency and accountability around how the platform is deploying these signals in its algorithmic curation and ranking practices, and around how these signals work together, users and publishers are unable to properly understand how their experience is being curated and how this can impact their worldview. They are also unable to understand exactly which characteristics of a post, user, or their network are prioritized during this curation and ranking process. This raises concerns that this algorithmic system can establish filter bubbles on the platform.

In addition, individual voices or communities that are suppressed by the algorithm are often left unable to understand why. This raises a number of concerns regarding algorithmic bias, and the extent to which algorithms reflect and exacerbate the judgments, priorities, and preferences of their creators and society at large. Given the limited set of user controls over the News Feed, impacted users are unable to effectively mitigate this situation. However, it is difficult to imagine a set of user controls that could effectively mitigate the issue of algorithmic bias. Given the black box nature of much algorithmic decision-making, developers and users may not be aware of any systematic biases in an algorithm.

Facebook does not currently offer an appeals process or channel for its News Feed curation and ranking efforts. Such a process could help remedy individual cases, even though it would not help to remedy systemic instances of bias.

Case Study: Twitter

Twitter is a microblogging and social network platform founded in 2006. It enables users to post messages known as "tweets" and has gained a reputation as a destination for live updates regarding news, politics, sports, and other current events. As of February 2019, the platform had 126 million daily users,171 and it ranks 20th for global internet engagement on Alexa rankings.172

Originally, Twitter’s news feed—known as a “timeline”—was not algorithmically curated. Rather, content on a user’s timeline was presented in reverse- chronological order. In 2015, Twitter introduced a feature known as “While you were away”173 (later rebranded as “In case you missed it”),174 which aimed to curate notable recent tweets that a user may have missed while they were not using the platform.175 In 2016, Twitter introduced algorithmic curation into its timeline. This was an extension of the “While you were away” feature,176 as it used the same algorithms. According to Twitter, it was designed to deliver users

newamerica.org/oti/reports/rising-through-ranks/ 33 with the most relevant and useful tweets, rather than the most recent ones.177 Twitter has asserted that both of these features were based on the notion that a tweet from a few hours ago may be more relevant and meaningful to a user than one that was posted fve minutes ago. When using the reverse-chronological curation format, a user would miss out on such content.178 As per the new timeline feature, algorithmically curated tweets appeared at the top of a user’s timeline. However, these algorithmically curated tweets are a small subset of the tweets that have been posted since a user last visited the platform.179 As a result, if a user continued to scroll through the timeline, they would eventually begin seeing tweets in a reverse-chronological format.180

Twitter’s decision to roll out an algorithmically curated timeline was met with some backlash. Many users were concerned that by introducing this feature, the platform was stifing the public square characteristics of the platform, as what content was relevant and meaningful would now be determined by an algorithm. Some critics of an algorithmically curated timeline have advocated for a reverse- chronological feed, as this is perceived as a neutral presentation of content. In a reverse-chronological timeline, hashtags would play a strong role in promoting and highlighting conversations and virality would be organic.181 In this sense, critics contend, Twitter could ofer a democratic public square on its platform.182 Although the timeline algorithm could be used to surface content that is broadly considered relevant (such as headline news), it cannot reliably surface unexpected and diverse content that is also relevant, like an organically run public square platform could, as it makes judgments based on past user behavior. 183 The introduction of algorithmic curation in the timeline also faced backlash as it made users feel as if they had less control over their experience on the platform, and raised concerns over the creation of flter bubbles.184

These concerns sparked the hashtag #RIPTwitter in early 2016.185 When the new timeline feature rolled out, users had the option to opt out in a limited sense. They could choose to not see Top Tweets at the top of their timeline, but they would still receive curated tweets in other sections of their timeline.186 In response to the outcry, in September 2018, Twitter enabled users to toggle between algorithmically-curated Top Tweets and non-curated, reverse-chronologically ordered Latest Tweets.187 Despite the public backlash, fewer than 2 percent of users opted out of algorithmic curation on the platform, which became the default option.188 However, the fact that only 2 percent of users opted out of algorithmic curation does not necessarily mean that users were not opposed to algorithmic curation. Rather, because algorithmic curation became the default, opting out required an extra step, and many users may not have wanted to engage in a more time-consuming process or known how to do so.

Despite the controversy, Twitter has insisted that its research indicates individuals have a more positive experience on the platform when engaging with Top Tweets first.189 In a test of the beta version of the algorithmically curated timeline, which was performed on over 100 users and brands, the platform found that individuals tweeted and retweeted more often than they did when using a non-algorithmically curated timeline.190 This, however, assumes that greater engagement is synonymous with a positive user experience, which is not necessarily true. Greater engagement does, however, drive greater revenue for the platform, which may be a reason it advocated strongly for the algorithmically curated version of the timeline.

The Twitter timeline can consist of numerous sections.191

Top Tweets: The Top Tweets on a user’s timeline are algorithmically curated and ranked using a range of signals. This often also includes tweets from accounts that a user does not follow but may be interested in.

Latest Tweets: If a user opts out of algorithmic curation, they will view a reverse-chronological feed of the latest tweets.

In case you missed it: If a user is visiting the Twitter app less frequently, they will see this algorithmically-curated selection of Top Tweets. A user typically only sees this feature in their timeline feed if they have not visited the platform for a number of hours or days. The tweets in this section are less recent, and do not appear in reverse-chronological order. Rather, they are organized based on their ranking scores. As a result, the tweet at the top of this section is the tweet that has the highest ranking score out of all possible tweets from every account a user follows since the last time they logged in.192

Happening now: This section occasionally appears at the top of a user's timeline and it highlights specific events or subjects of interest. This was originally introduced to focus on sports events and was later expanded to include breaking and personalized news.193

Trends for you: This algorithmically-curated section highlights popular trends and hashtags based on a user’s interests (as explained below). Users can also choose to have this content curated based on their location.

When a user opens Twitter, the platform collects and assesses every recent tweet from every account that a user follows and assigns each one a relevance score. This score aims to predict what content a user will find interesting.194 It is based on a range of factors, including the number of favorites and retweets a tweet has received, and how often the user has engaged with a particular account recently. Simultaneously, Twitter's algorithm considers a range of other signals, such as how long a user has been away from the platform, how many accounts a user follows, and how a user behaves and uses Twitter, in order to determine how the relevance scores will impact the content on the user's timeline.195 Content is then ranked based on a series of signals which assess how popular a tweet is and how accounts in a user's network are engaging with it.196 These signals include:197

Recency: How recently a tweet was posted.

Overall engagement: How many retweets, clicks, favorites, and impressions a tweet has garnered. This signal also considers how much time users have spent reading the tweet.198

Engagement relative to other tweets from the same user: How often users engage with the posting user through active engagements and impressions.

Rich media: The type of media that the tweet includes, such as images, videos, GIFs, and polls.

The types of media users typically engage with: If a user typically engages with a specific type of content, such as photos or videos, then they are more likely to see tweets that contain these media formats.

Account engagement and interactions: How often a user engages with a particular author or account, the strength of the user’s connection to this account, the origin of this relationship,199 and how much time a user spends reading tweets posted by this author, even if they do not engage with them.200

Signals such as account interactions, engagement, user interest, network activity,201 how long a user has been away from the site, how many followers an account has, and the account's location relative to users also play a role in how content is curated and ranked on the Twitter timeline.202 Today, deep learning is the central modeling component in timeline ranking.203
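The following sketch illustrates the general shape of this kind of scoring: each candidate tweet receives a relevance score computed from a handful of engagement and relationship features, and the highest-scoring tweets are surfaced first. The feature names and weights here are illustrative assumptions, not Twitter's actual model, which, as noted above, now relies on deep learning.

# Illustrative only: feature names and weights are assumptions, not Twitter's model.
def relevance(tweet, weights):
    return sum(weights[feature] * tweet.get(feature, 0.0) for feature in weights)

weights = {
    "recency": 1.0,          # newer tweets score higher
    "engagement": 2.0,       # favorites, retweets, clicks, impressions
    "author_affinity": 3.0,  # how often the user interacts with this account
    "media_match": 0.5,      # tweet contains media types the user tends to engage with
}

candidates = [
    {"id": 1, "recency": 0.9, "engagement": 0.1, "author_affinity": 0.8, "media_match": 0.0},
    {"id": 2, "recency": 0.4, "engagement": 0.9, "author_affinity": 0.2, "media_match": 1.0},
]

# The highest-scoring subset would appear as "Top Tweets" at the top of the timeline.
ranked = sorted(candidates, key=lambda t: relevance(t, weights), reverse=True)
print([t["id"] for t in ranked])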

Like Facebook, Twitter asserts that it has updated its ranking algorithm in order to improve the "health" of conversations on its platform. In 2018, this was done in order to combat instances of trolling, harassment, and abuse. The new algorithmic system uses behavioral signals in order to assess whether a Twitter account is adding to or detracting from conversations, based on how other accounts react to content. For example, if a user sends the same message to multiple users and they all block or mute the sender, this suggests that the sender is detracting from conversations. If the recipients reply to or "heart" the messages, however, this suggests that the sender is contributing positively to interactions. The algorithm also considers signals such as whether an account has a confirmed email address and whether an account appears to be leading a coordinated attack. Tweets that are identified as detracting from conversation will be deprioritized in the timeline and will therefore appear lower in search results or replies.204

As demonstrated by the backlash against Twitter's decision to implement an algorithmically curated and ranked timeline, users and publishers have expressed concerns over how such a system manages online expression and creates and reinforces certain perspectives based on what it thinks a user is interested in. Like Facebook, Twitter also fails to provide significant transparency and accountability around its algorithmic curation and ranking practices.

Twitter provides its users with a range of limited controls over their timeline experience. These include the ability to unfollow, mute, and block certain accounts. Users can also select the "show less often" feature, which provides Twitter with feedback on certain tweets so that it can better tailor the timeline experience in the future. In addition, users can opt in to or out of permitting Twitter to personalize their experience based on their "inferred identity" and places they have been. As previously mentioned, users also have the option of toggling between an algorithmically curated timeline and a reverse-chronological timeline.205

There is some public information around which signals Twitter uses to curate and rank content on the timeline. However, the company has not released a comprehensive overview of these signals, how they work together to curate and rank posts, and how they are weighted. Without greater transparency and accountability around how this algorithmic curation is taking place, users and publishers are unable to fully understand and control how their worldview is being shaped and what specific characteristics the Twitter timeline algorithm is designed to prioritize. This once again raises concerns regarding the creation and reinforcement of filter bubbles.

Furthermore, given that Twitter is often viewed as a digital public square, the platform's algorithmic curation and ranking practices raise concerns regarding which voices the algorithm determines are important and worth amplifying, and whether these determinations reflect the same values and judgments that humans would apply when assessing public discourse. A lack of transparency around the signals the algorithm uses makes evaluating how the timeline algorithm impacts public discourse even more difficult. Furthermore, a lack of transparency around how the timeline algorithm operates and is constructed also raises concerns around hidden biases in the algorithm and its signals. These biases prioritize certain types of interactions and content over others, and can reflect the unintentional biases of their creators or the training data with which they were created. Given the limited set of user controls over the timeline, and the fact that the platform does not offer an appeals process or channel related to its timeline curation and ranking practices, users who feel as if they have been silenced have no means of recourse.

Case Study: Reddit

Reddit is a social news aggregation and discussion website that was founded in 2005. The platform has approximately 330 million monthly active users worldwide206 and is ranked 16th for global internet engagement.207 Reddit enables users, who operate under pseudonyms, to create subpages, called subreddits, on specific interests or topics. In this way, the platform has become popular among particular interest- or activity-focused communities, such as gamers and sports fans. This represents a significant difference between Reddit on one hand, and Facebook and Twitter on the other. Unlike its counterparts, which emphasize bilateral or multilateral relationships between users who interact on a broad range of topics, Reddit emphasizes users' participation in thematic forums, or subreddits, which has tangible implications for its use of content-shaping algorithms.

Reddit deploys a series of algorithms in order to rank posts and comments on its home page feeds as well as on each individual subreddit. The code for these algorithms is open-source and available publicly online.208 This ranking system is largely influenced by the platform's user-driven voting system.209 All Reddit users who are logged in can vote on links and comments in order to indicate their meaningfulness and usefulness. In this system, an upvote indicates that a user finds content interesting and relevant, and a downvote suggests that the user finds the content uninteresting, off-topic, or otherwise not meaningful.210 Links and comments that receive a significant number of upvotes will appear higher on the website's front page or on the front page of a given subreddit. Each link and comment on the platform is assigned a number of points, known as a score, which loosely corresponds to the difference between the number of upvotes it has received and the number of downvotes it has received. The exact calculations of this figure are kept hidden, however, in order to prevent spammers and other actors with negative intentions from gaming the system.211 On the platform, comments are by default sorted using the "best" comments filter. As a result, comments with the highest number of upvotes are likely to be viewed more often.212 In addition, posts with a significant number of comments are also typically ranked higher than others. This suggests an element of democracy on the platform, as Reddit seeks to rank the content that users engage with—and therefore value the most—the highest.213

The score that a post or comment receives translates into “karma” points for the posting user. Karma is an informal user ranking on Reddit measuring how much users value a particular account’s contributions to the Reddit community.214 A user who frequently contributes high-ranking posts or comments will build a high karma score denoting their total net-positive impact on the site. This system is not foolproof, however. A user can easily gain karma points by reposting popular content across multiple subreddits and by posting content that aligns with the general mentality and values of a certain subreddit or the platform as a whole.215 Additionally, as a user gains more karma points, or as a post or comment gains more upvotes or downvotes, it can spark a bandwagon response in which other users vote in line with the general trend. In order to prevent this, some subreddits hide karma totals for certain periods. However, this is not a complete solution.216

Reddit deploys different algorithmic approaches when ranking posts and comments. When a user logs in, they can choose to view content on their homepage feed using a range of algorithmically-curated options. These curation options sort content into categories: best, hot, new, controversial, top, and rising.217 According to a 2015 blog post by Amir Salihefendic, the CEO of Doist, who has conducted significant research on Reddit's algorithmic ranking practices, posts on Reddit under the "hot" category are ranked using an algorithm known as the "hot ranking algorithm." This algorithm is impacted by signals including the following (a simplified sketch of the algorithm, based on the published code, follows the list below):218

1. Submission time: The time at which a post was submitted is a major factor influencing how a post ranks on Reddit. The hot ranking algorithm ranks new stories higher than older stories.

2. The logarithm scale: The hot ranking algorithm uses the logarithm function to weigh the earlier votes higher than later ones. This means that, generally, the first ten upvotes that a post receives will have the same weight as the next 100 upvotes. These 100 upvotes will in turn have the same weight as the next 1,000 upvotes, and so on. Therefore, a post that has 10 very recent upvotes and a post that has 50 older upvotes could rank similarly on the platform.

3. Downvotes: Reddit is one of the few platforms on the internet that deploys a downvote feature. Posts that receive a large number of both upvotes and downvotes, as well as posts that receive a large number of downvotes overall, will therefore rank lower on news feeds. This particularly impacts content that is controversial.
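The sketch below is a simplified Python rendering of the published hot ranking logic as described in Salihefendic's analysis; the exact constants in Reddit's production code may have changed since.

from datetime import datetime
from math import log10

def epoch_seconds(date):
    # Seconds since the Unix epoch, assuming a naive UTC datetime.
    return (date - datetime(1970, 1, 1)).total_seconds()

def hot(ups, downs, date):
    score = ups - downs
    order = log10(max(abs(score), 1))           # logarithm scale: the earliest votes count the most
    sign = 1 if score > 0 else -1 if score < 0 else 0
    seconds = epoch_seconds(date) - 1134028003  # offset from the reference epoch used in the published code
    return round(sign * order + seconds / 45000, 7)

# Newer submissions outrank older ones with similar vote totals, and heavily
# downvoted posts receive a negative vote component.
print(hot(ups=10, downs=2, date=datetime(2019, 10, 1)))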

Reddit’s comment ranking algorithm was theorized by Randall Munroe, an American cartoonist, engineer, and scientifc theorist. He argued that the hot ranking algorithm would not be suitably applicable for ranking comments on the platform, as it would preference comments that were posted more recently, rather than the comments that were considered the most meaningful. The solution to this was to deploy Wilson’s Score Interval, which uses a confdence sort to treat a vote count on a comment as a statistical sample of a hypothetical full vote by opinion, similar to an opinion poll. This system provides each comment with a provisional ranking, that it is 85 percent sure the comment will reach. The more votes that a comment receives, the closer its score gets to this 85 percent confdence estimate. This system helps ensure that if a comment has only one upvote and zero downvotes, it will retain a 100 percent upvote rate. However, because the system does not have enough data on this comment, it will be ranked lower. If a comment received ten upvotes and only one downvote, on the other hand, the system could accrue enough confdence to place this comment above something with 40 upvotes and 20 downvotes, as it ascertains that by the time this frst post gets 40 upvotes, it would have fewer than 20

newamerica.org/oti/reports/rising-through-ranks/ 39 downvotes. If the system is wrong, which it is 15 percent of the time, then it will work to get more data so that comments with less data are ranked lower. The confdence sort in this system is not impacted by submission time, but rather it is impacted by how many upvotes a comment receives compared to the total number of votes and the sample size. The more votes a comment gets, the more accurate its confdence score is.219 However, when subreddits have a large number of posts, it is likely that most people simply read the comments in the “best” section and vote on them. This prevents other comments from gaining traction and220 can create a preference toward these pieces of content.221 Users can also feel persuaded to vote for already popular content, due to a herd mentality.222 Additionally, users can create multiple accounts in an attempt to rig the voting system.223
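The confidence sort described above corresponds to the lower bound of the Wilson score interval. A minimal version is sketched below; the z value of roughly 1.44 matches the 85 percent confidence level described above, though the exact constant in Reddit's production code may differ.

from math import sqrt

def confidence(ups, downs, z=1.44):  # z of about 1.44 corresponds to ~85% confidence
    n = ups + downs
    if n == 0:
        return 0.0
    phat = ups / n  # observed fraction of upvotes
    # Lower bound of the Wilson score interval: comments with few votes are
    # pulled down until enough data accumulates to support their upvote rate.
    return ((phat + z * z / (2 * n)
             - z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n))
            / (1 + z * z / n))

# As described above, a 10-1 comment can outrank a 40-20 comment.
print(confidence(10, 1), confidence(40, 20))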

Reddit’s algorithms also come into play when curating content for /r/all, which is the home page that non-logged-in users see. When a user frst creates an account on Reddit, they are subscribed to a list of default subreddits that aim to highlight the range of communities, interests, and genres of content available on the platform.224 Once a user has an account, they can curate their own home page feeds by subscribing to subreddits of interest to them, and unsubscribing from default subreddits if they prefer. However, users who prefer to not create an account are unable to pick and choose which specifc subreddits they engage with. For these users, the Reddit homepage displays the /r/all page that contains algorithmically curated content from a range of subreddits on the platform in order to demonstrate the breadth of popular content available on the service.225 The algorithm used to sort this page tends to highlight material across subreddits that is new and has been upvoted a lot.

However, for its content to make it to the /r/all page, a subreddit often already has to have a large subscriber base capable of driving a large score. The default subreddits that users are automatically subscribed to are examples of these. Because these default subreddits are privileged, however, other, organically created and operated subreddits are less frequently highlighted. This can silence certain voices and render certain communities and interest groups invisible to non-logged-in users of the service. However, subreddit moderators can voluntarily opt out of this curation as well. This is often done if moderators feel that their content is controversial or not fit for public consumption (also known as "not safe for work" or NSFW).226 In this way, the opt-out feature can act as a privacy mechanism.

Reddit’s algorithmic curation and ranking systems seek to prioritize and deliver relevant and meaningful content to its users. Like on Facebook and Twitter, this raises a host of concerns regarding which voices are prioritized and how these voices are identifed. When it comes to providing transparency and accountability around its algorithmic curation and ranking practices, Reddit

newamerica.org/oti/reports/rising-through-ranks/ 40 ofers some novel approaches, but also fails to adopt some existing practices that are gradually becoming more common across the industry.

Reddit’s approach to transparency is novel, in that it publishes the code for its ranking algorithms publicly online in an open-source format. This enables users, researchers, and publishers to better understand how content is tailored and ranked on the platform, what characteristics Reddit’s algorithms preference, and how this may impact a user’s worldview. This is one way of potentially revealing the existence of a flter bubble, as well. However, in order to efectively use this resource and extract valuable insights from it, an individual would have to have a relatively extensive technical background. Therefore, although the platform provides some valuable information on its curation and ranking practices, there are barriers to accessing and understanding it. Reddit also does not have a company-issued page explaining to its users how the algorithmic curation and ranking system works. As a result, most public information about Reddit’s ranking system and the signals and processes it uses are based on research or speculation, rather than company-verifed information. Therefore, Reddit should publish information in language that is accessible to individuals who lack a technical background, as well as the general public. This will provide greater transparency and accountability around how Reddit curates and ranks content across its home page and subreddits.

Because content on Reddit is curated and ranked primarily on the basis of user votes, users do not have as many additional controls over their news feed experiences on the platform. Aside from voting on content, Reddit users can hide posts on the front page news feed and in subreddits. They can also sort content using a range of filters, including the "new" category, which filters content in reverse-chronological order. Aside from this, however, users do not have any significant further controls, as the assumption is that user votes are representative of what users find interesting and meaningful, and the Reddit algorithm curates and ranks based on this. Like Facebook and Twitter, Reddit does not offer an appeals process or channel for its news feed curation and ranking efforts.

Promoting Fairness, Accountability, and Transparency Around Algorithmic Curation and Ranking Practices

As demonstrated in this report, the deployment of algorithmic curation and ranking practices by search engines and internet platforms has established their roles as gatekeepers of online information flows and online expression. Over the past few years, these practices have increasingly been dictated by the business models and services of these companies. Despite the growing prevalence of such algorithmic curation, internet platforms have demonstrated a fundamental lack of transparency and accountability around how such practices are implemented, and how these practices impact users, their worldviews, and publishers. Going forward, search engines and internet platforms, policymakers, and researchers should consider the following set of recommendations in order to promote greater fairness, accountability, and transparency around algorithmic decision-making in this space.

In particular, search engines and internet platforms that deploy algorithmic tools to curate and rank content in their search results or news feeds need to:

• Make a concerted and explicit effort to raise awareness around these practices, and provide adequate transparency around how they impact users' experiences and free expression online. For the most part, there is little transparency around the algorithms that these platforms use, particularly when it comes to their operational mechanisms and how they translate inputs into outputs. Going forward, internet platforms should provide more effective and meaningful transparency around these algorithms' decision-making practices.

◦ Search engines and internet platforms should disclose and explain the various stages and procedures involved in collecting, curating, and ranking content in search engine results or in news feeds. This information should be updated whenever the platform decides to alter or refine its curation and ranking practices. This information should be available publicly and should be presented in a format that is easy to understand for members of the general public. This information should also be housed in a centralized and easy-to-access area, such as on a page dedicated to the topic on the platform's website.

◦ Search engines that employ a search quality rating process should publicly disclose what this procedure entails, including how any human raters are trained and evaluated, and how the search engine ensures that these individuals represent a diverse array of perspectives. In addition, if a search engine has developed guidelines for these raters, they should be publicly available, as this would enable users, researchers, and publishers to gain a better understanding of what values a search engine emphasizes in its curation and ranking process.

◦ Where possible, platforms that engage in algorithmic content curation and ranking should provide additional resources that can help users, researchers, and publishers understand how a platform assesses and ranks content. A good example of this is the open-source code that Reddit provides for its news feed. However, because these resources often require specific technical expertise to understand and effectively use, they should not replace corporate public disclosures and explanations on algorithmic curation and ranking practices. Rather, they should supplement them.

◦ Given that search engines are increasingly delisting search results due to legal requests, violations of their search engine guidelines, and requests under frameworks such as the "right to be forgotten," greater transparency and accountability needs to be provided around these procedures and how they impact free expression. One method of doing this is by highlighting the scope and scale of these requests in a corporate transparency report. Search engines should also provide impacted website owners with notice of these removals and offer them the opportunity to appeal these decisions in a timely manner.

• Provide greater transparency to identify the different implicit and explicit preferences that are built into curation and ranking systems. Generally, this is difficult to do with algorithmic systems, as many of these biases are hidden and implicit. This is likely to be the case for such curation and ranking systems as well. However, the fact that these systems are based on hundreds of signals means that some of these preferences are also explicit.

◦ Search engines and internet platforms should publicly disclose a comprehensive list of the signals their curation and ranking systems are based on. This information should also include how these signals interact with and build off one another, and how they are weighted. If it is not possible to do this at a granular level (due to concerns over trade secrets and competition, for example), then these platforms should at least provide a comprehensive list of categories encompassing the different types of characteristics and signals that these systems consider, how these categories interact with one another, and how they are weighted. This will generate a greater understanding of which qualities a platform emphasizes and values most, and as a result which voices are amplified and which are silenced. This list of signals or categories of signals should be available in a single, public, central location that is easily accessible, and it should be presented in a format that a general audience can easily comprehend.

◦ When a search engine or internet platform updates the signals its ranking systems are based on, the change can have a significant impact on the presentation and delivery of user and publisher content. As a result, platforms should strive to explain as many of these changes as possible. If a platform opts to publish categories of signals rather than a comprehensive list, these announcements should explain how the changes affect the relevant category of signals. Platforms currently make such announcements around changes that aim, for example, to reduce the spread of misinformation or curb abuse. These announcements often outline the company's thinking and intentions behind the algorithmic change, and they can be valuable for shining a light on the priorities of the engineers who implemented it, because they underscore the values and assumptions the company brought to a content issue. (A simplified sketch of how weighted signal categories might combine into a ranking score follows this recommendation.)
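As a purely hypothetical illustration of the kind of disclosure recommended above, the sketch below combines a handful of invented signal categories (relevance, freshness, source quality, and personalization) into a single ranking score using published weights. The category names, weights, and scoring function are assumptions made for explanatory purposes; they do not describe any particular platform's system, which would involve hundreds of signals and far more complex interactions.

```python
from dataclasses import dataclass

# Hypothetical, publicly disclosed signal categories and their weights.
CATEGORY_WEIGHTS = {
    "relevance": 0.45,        # how well the item matches the query or user interest
    "freshness": 0.20,        # how recently the item was published or updated
    "source_quality": 0.25,   # assessed authoritativeness of the publisher
    "personalization": 0.10,  # fit with the individual user's history and settings
}

@dataclass
class Item:
    title: str
    signals: dict  # category name -> normalized score in [0, 1]

def rank_score(item: Item) -> float:
    """Weighted sum of normalized category scores; higher ranks first."""
    return sum(CATEGORY_WEIGHTS[c] * item.signals.get(c, 0.0)
               for c in CATEGORY_WEIGHTS)

items = [
    Item("Recent local news story", {"relevance": 0.7, "freshness": 0.9,
                                     "source_quality": 0.6, "personalization": 0.4}),
    Item("Older in-depth report",   {"relevance": 0.8, "freshness": 0.2,
                                     "source_quality": 0.9, "personalization": 0.5}),
]

# Publishing the categories and weights lets outsiders see, for instance,
# that relevance counts for more than twice as much as freshness here.
for item in sorted(items, key=rank_score, reverse=True):
    print(f"{rank_score(item):.2f}  {item.title}")
```

Even at this level of abstraction, disclosing the categories and their relative weights would let users, researchers, and publishers reason about which qualities a platform privileges and how an announced change shifts that balance.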

• Provide users with a robust set of controls that enable them to tailor their own search and news feed experiences. In particular, users should be able to:

◦ Provide feedback on search results and posts in their news feeds. This should include the ability to hide, block, or filter out certain posts. These controls let users manage their own experiences and also provide the algorithms with valuable feedback on what content a user deems relevant and meaningful.

◦ Determine whether and to what extent these systems are permitted to collect and use their personal data. Data such as a user's location history, purchase history, or past browsing activity can be used to tailor a user's online experience and inform targeted advertising. As a result, users should be able to control which of these data points are collected, particularly in instances where the data points are used for secondary purposes and are not integral for the company to provide the service. In addition, users should have strong controls related to data retention practices. This should include the ability for a user to clear their search history and delete the data a platform has collected on them.

◦ Opt in to having their personal data used to refine and develop AI and machine-learning models. Currently, users are automatically opted in to having their data included in the datasets that train algorithmic models, with no method for opting out. This raises significant privacy concerns, especially because these datasets can include highly personal information, such as demographic characteristics, that can be used to target users in privacy-intrusive ways. When highly personal data is collected and used for potentially invasive purposes, such as facial recognition, users should always have to opt in to having their data used to train AI and machine-learning models, because biometric information cannot be altered or changed the way a credit card number can. Companies should also develop robust privacy policies around user data that is collected and used to train AI models.

◦ Opt in to receiving algorithmically curated and ranked content. Ideally, users should only receive algorithmically curated and ranked content after affirmatively opting in; it should not be the default setting. However, if companies maintain that users should receive this content by default, users should be able to opt out completely and easily if they do not want such content in their search results or news feeds. On search engines, this should be possible regardless of whether a user is logged in, logged out, or browsing in a private or anonymous mode. (A minimal sketch of what honoring such controls could look like follows this recommendation.)
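The following is a minimal, hypothetical Python sketch of how the user controls recommended above might be honored when a feed is assembled: algorithmic ranking is applied only if the user has opted in (falling back to reverse-chronological order otherwise), and posts the user has hidden or authors they have blocked are filtered out first. The preference fields, scoring input, and data structures are invented for illustration and do not reflect any platform's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Set

@dataclass
class Post:
    post_id: str
    author: str
    created_at: datetime
    predicted_relevance: float  # output of some ranking model, in [0, 1]

@dataclass
class UserPreferences:
    algorithmic_ranking_opt_in: bool = False  # off by default, per the recommendation
    hidden_posts: Set[str] = field(default_factory=set)
    blocked_authors: Set[str] = field(default_factory=set)

def build_feed(posts: List[Post], prefs: UserPreferences) -> List[Post]:
    """Apply the user's controls before any ranking decision is made."""
    visible = [p for p in posts
               if p.post_id not in prefs.hidden_posts
               and p.author not in prefs.blocked_authors]

    if prefs.algorithmic_ranking_opt_in:
        # Only users who affirmatively opted in get model-ranked content.
        key: Callable[[Post], float] = lambda p: p.predicted_relevance
    else:
        # Default: plain reverse-chronological ordering.
        key = lambda p: p.created_at.timestamp()

    return sorted(visible, key=key, reverse=True)
```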

• Enable publishers to understand and exercise some control over how their content is collected, curated, and ranked.

◦ Both search engines and internet platforms should offer website publishers and content creators clear and detailed guidelines on how they can operate fairly and successfully on a given platform. For search engines, this means publicly sharing a detailed set of webmaster guidelines. For internet platforms with news feeds, this means publicly disclosing relevant rules and guidelines for content creators.

◦ Search engines should also let website publishers determine how their website is crawled. One way of doing this is through a sitemap. In addition, website publishers should be able to request a recrawl if they feel their website was not adequately represented in a search engine's index. Furthermore, website publishers should be able to opt out of crawling altogether if they so desire. (A brief sketch of how standard crawl controls work appears at the end of this recommendation.)

◦ Website publishers who feel that their websites have been unfairly penalized in search engine results should be able to appeal these decisions and request reconsideration. This appeals process should be timely and should provide the website publisher with adequate notice of the outcome. Search engines that offer this appeals process should also educate website publishers so they know it exists and know how to use it.

◦ Content creators who feel that their content has been unfairly penalized in news feeds should also be able to appeal these decisions. This appeals process should be timely, and internet platforms that offer it should publicize information about the process and make it easy to understand.
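As context for the crawling controls recommended above, publishers already signal their preferences to crawlers through two widely used, voluntary conventions: a robots.txt file that tells crawlers which paths they may fetch, and a sitemap that lists the pages a publisher wants indexed. The short Python sketch below uses the standard library's robots.txt parser to check whether a given crawler is allowed to fetch particular pages; the site URL and crawler name are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Placeholder site and crawler name, for illustration only.
SITE = "https://www.example.com"
CRAWLER_USER_AGENT = "ExampleSearchBot"

# robots.txt is the standard, voluntary mechanism a publisher uses to
# tell crawlers which parts of a site they may or may not fetch.
parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()

for path in ["/", "/private/reports/", "/news/latest"]:
    allowed = parser.can_fetch(CRAWLER_USER_AGENT, f"{SITE}{path}")
    print(f"{path}: {'crawl allowed' if allowed else 'crawl disallowed'}")

# A sitemap (often referenced from robots.txt via a "Sitemap:" line)
# complements this by listing the URLs the publisher wants indexed and
# when they were last updated.
```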

• Internet platforms, policymakers, and researchers should collaborate on, promote, and fund further research on the impacts of algorithmic curation and content ranking. This is particularly important amid growing concerns around the impact of these algorithmic decision-making practices on democratic values, political polarization, and freedom of expression.

• Internet platforms, researchers, and civil society organizations should collaborate to develop a set of industry-wide best practices for transparency and accountability around algorithmic curation and ranking. These best practices should explicitly prioritize the public interest above corporate business models and concerns about trade secrets. This will help ensure that users are adequately educated about and aware of these practices, have a range of meaningful controls at their disposal, and know how to use them. It will also promote greater accountability around these algorithmic decision-making practices and reduce the need for knee-jerk legislation or regulation.


This report carries a Creative Commons Attribution 4.0 International license, which permits re-use of New America content when proper attribution is provided. This means you are free to share and adapt New America’s work, or include our content in derivative works, under the following conditions:

• Attribution. You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

For the full legal code of this Creative Commons license, please visit creativecommons.org.

If you have any questions about citing or reusing New America content, please visit www.newamerica.org.

All photos in this report are supplied by, and licensed to, shutterstock.com unless otherwise stated. Photos from federal government sources are used under section 105 of the Copyright Act.
