Relevancy Redacted: Web-Scale Discovery and the “Filter Bubble”

Corey Davis, Technical Services Librarian, Royal Roads University Library

Corey Davis, “Relevancy Redacted: Web-Scale Discovery and the ‘Filter Bubble’” (2011). Proceedings of the Charleston Library Conference. http://dx.doi.org/10.5703/1288284314965

Abstract: Web-scale discovery has arrived. With products like Summon and WorldCat Local, hundreds of millions of articles and books are accessible at lightning speed from a single search box via the library. But there’s a catch. As the size of the index grows, so too does the challenge of relevancy. When Google launched in 1998 with an index of only 25 million pages, its patented PageRank algorithm was powerful enough to provide outstanding results. But the web has grown to well over a trillion pages, and Google now employs over 200 different signals to determine what search results you see. According to Eli Pariser, author of The filter bubble: What the Internet is hiding from you (Penguin, 2011), a growing number of these signals are based on what Google knows about you, especially your web history; and, according to Pariser, serving up information that’s “pleasant and familiar and confirms your beliefs” is becoming increasingly synonymous with relevancy. This session will critique Pariser’s concept of the “filter bubble” in terms of collection development and the possible evolutions of discovery layers like Summon and WorldCat Local, and the challenge of providing relevant academic research results in a web-scale world where students increasingly expect the kind of personalization sometimes at odds with academia’s adherence to privacy and intellectual freedom.

The following is a critique of Eli Pariser’s The filter bubble: What the Internet is hiding from you (2011), and attempts to capture the conversational nature of the presentation as given at the Charleston Conference in 2011.

Not that long ago, when different people searched Google with the same search terms, they got the exact same search results. Not anymore. When you search Google (“Technology overview”, n.d.), over 200 signals determine relevancy, including location and, if you’re logged into your Google account or allow your browser to accept cookies from Google, previous search history. Two people can get two totally different result sets based on the same keywords, and increasingly, these results are determined, at least in part, by what you’ve searched for and clicked on before.

The reward for this kind of personalization is better relevancy, but the challenge is that we will increasingly see results based on what we’ve looked at before, creating a kind of ‘filter bubble’ that isolates us from resources we might not otherwise see.

The concept of a filter bubble is fairly straightforward as presented by Eli Pariser in his TED talk Beware online ‘filter bubbles’, which has received around a million views on the TED talks website:

As web companies strive to tailor their services (including news and search results) to our personal tastes, there’s a dangerous unintended consequence: We get trapped in a “filter bubble” and don’t get exposed to information that could challenge or broaden our worldview. (TED, 2011)

In 2009, Google started tailoring search results for all users, whether signed into their Google Accounts or not, based on their previous activities on the web. A wide range of websites increasingly use similar algorithms to guess what information a user wants based on what they know about that user, such as their location, previous click behavior, and search history. This kind of personalization is fairly obvious at websites like Amazon and Netflix, but it can be much more subtle on sites like Google and Facebook. The net result, however, is the same: websites present only information that is, in some way, similar to information the user has already consumed. According to Pariser, people in the filter bubble are not as exposed to contradictory perspectives and can, as a result, become intellectually isolated in ways that threaten their ability to meaningfully take part in a society full of uncomfortable truths.
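To make that mechanism concrete, here is a minimal sketch of history-based re-ranking. It is emphatically not Google’s actual algorithm (the 200-plus signals are proprietary); the scoring function, the weighting, and all of the sample data are invented for illustration only.

```python
# A toy illustration of history-based re-ranking. The scoring function,
# the 0.5 weight, and all data are invented; a real search engine blends
# hundreds of proprietary signals.
from collections import Counter

def personalized_rank(results, click_history, history_weight=0.5):
    """Re-rank (title, topic, base_relevance) tuples for one user."""
    topic_counts = Counter(click_history)
    total = max(len(click_history), 1)

    def score(result):
        _title, topic, base_relevance = result
        # Boost topics this user has clicked before: the mechanism that,
        # repeated over time, produces a filter bubble.
        familiarity = topic_counts[topic] / total
        return (1 - history_weight) * base_relevance + history_weight * familiarity

    return sorted(results, key=score, reverse=True)

results = [
    ("Egypt travel deals", "travel", 0.70),
    ("Egypt protest coverage", "news", 0.75),
]

# Same query, same candidate results, two different click histories:
print(personalized_rank(results, ["travel", "travel", "shopping"]))
print(personalized_rank(results, ["news", "politics", "news"]))
```

The first user sees travel deals on top; the second sees the protests first. Neither is shown that the other ordering exists, which is the filter bubble in miniature.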
But what does this all have to do with libraries and their collections? I work at Royal Roads University (RRU) in Victoria, British Columbia. We have about 2,000 full-time equivalent (FTE) students, but we’re pursuing growth aggressively, particularly in the realm of international undergraduate students, and we expect our numbers to rise significantly in the coming years. Right now we focus mostly on graduate programs at the Master’s level, delivered mostly online via Moodle. RRU was established by the provincial government in 1995, and took up quarters in an old military college. We have a relatively small print collection, with the focus being on our collection of ebooks and article databases. So while we have more and more students spending time with us on campus, people mostly access our collections online.

The Internet Archive’s Wayback Machine (http://www.archive.org/web/web.php) is a wonderful resource. I can use it to see what our library website (http://library.royalroads.ca) looked like over a decade ago. I can see that we linked to our local catalog and an alphabetical list of article and research databases, with a rudimentary attempt to classify them by program or subject. By 2006 we started organizing things a little differently and placed a catalog search box on the homepage, but we were still experiencing the basic problem of information silos, where multiple databases containing high-quality and highly sought-after content were dispersed across dozens and dozens of different systems and platforms, with no way to effectively search across them all. For books and video, a user had to search the catalogue. For articles, they needed to choose one of many article databases. To help with this, we started creating subject guides. We also did our part during information literacy instruction to help make sense of this complex information environment.

We were suffering from three main issues, as identified by Burke (2010):

1. No clear and compelling place to start research.
2. No easy way to identify appropriate library resources.
3. A lack of awareness of library resources.
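A hypothetical sketch of the silo problem just described: each platform can only answer queries against its own records, so the query has to be fanned out and the results stitched together by hand. The sources, records, and substring matching below are all invented for illustration.

```python
# Invented data illustrating the "information silo" problem: each
# platform holds its own content and answers only its own queries.
SILOS = {
    "catalog":    ["Leadership in Organizations (book)", "Oceans: An Atlas (video)"],
    "database_a": ["Team leadership styles (article)", "Remote work trends (article)"],
    "database_b": ["Leadership and its discontents (article)"],
}

def search_silo(name, query):
    """Each platform can only search its own records."""
    return [r for r in SILOS[name] if query.lower() in r.lower()]

def federated_search(query):
    # Without a unified index, the query must be sent to every silo
    # separately and the results stitched back together.
    return {name: search_silo(name, query) for name in SILOS}

print(federated_search("leadership"))
```

Web-scale discovery products like Summon take the opposite approach: harvest all of the records into one central index ahead of time, and search only that index.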
Try using the Wayback Machine to look at Google’s homepage in 1998, the year the company was founded. There are no lists. The interface is easy and intuitive to use. These are the same qualities that draw us to Google today. While our library has gone through three or four major website revamps in an attempt to help our users navigate and use our resources, Google’s main search site has remained remarkably stable.

In 1998, Google co-founders Larry Page and Sergey Brin published an article called The anatomy of a large-scale hypertextual web search engine (Brin & Page, 1998). At this time, ‘human curation’ was a major way that companies like Yahoo! helped people access information on the web, through the creation of directories and other lists. Page and Brin recognized that the web was growing too fast for this kind of organization to continue in a sustainable manner. Human editors simply couldn’t keep up. In 1999, according to Danny Sullivan (2008) at Search Engine Land, a majority of major search engines were still presenting human-powered results. But these lists were expensive to create, and they didn’t scale to what the web was becoming. Lists and directories also didn’t deal well with obscure topics: “Human maintained lists cover popular topics effectively but are subjective, expensive to build and maintain, slow to improve, and cannot cover all esoteric topics” (Brin & Page, 1998, p. 107).

Google’s creators knew humans could not organize the web effectively as it scaled, and that this organization had to be automated. One of the biggest challenges they faced was the unstructured nature of web documents, in contrast to the kind of data libraries were dealing with, such as MARC records. The web was a jumble of different shapes and sizes, not a structured catalog of information based on well-established metadata standards. “The web is a vast collection of completely uncontrolled heterogeneous documents” (Brin & Page, 1998, p. 111).

At this point in the history of the web, the basic key to Google’s rapid success was the PageRank algorithm, which was powered by the nature of hyperlinks rather than primarily by the occurrence of keywords in a particular document.

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote.
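The “vote” idea in that passage can be sketched in a few lines of code. Below is a minimal power-iteration version of PageRank on an invented four-page web. The damping factor of 0.85 is the value suggested by Brin and Page (1998); the graph, the iteration count, and everything else here are simplifications for illustration, and production PageRank is far more elaborate.

```python
# A minimal power-iteration sketch of the PageRank idea on an invented
# four-page web. Only the 0.85 damping factor comes from Brin and Page
# (1998); the rest is simplified for illustration.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank

    for _ in range(iterations):
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # Each link is a "vote"; a page's vote is split evenly among
            # its outlinks and weighted by the voter's own current rank.
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(toy_web))  # "c" ranks highest: it collects the most votes
```

Note how the weighting captures the quote’s final point: a vote from a highly ranked page is worth more than a vote from an obscure one, because each page’s share is proportional to its own rank.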