What did you see? Personalization, regionalization and the question of the filter bubble in Google's search engine

Tobias D. Krafft, Algorithm Accountability Lab, Gottlieb-Daimler-Str. 48, TU Kaiserslautern, 0000-0002-3527-1092
Michael Gamer, Algorithm Accountability Lab, Gottlieb-Daimler-Str. 48, TU Kaiserslautern, 0000-0003-0261-0921
Katharina A. Zweig, Algorithm Accountability Lab, Gottlieb-Daimler-Str. 48, TU Kaiserslautern, 0000-0002-4294-9017

arXiv:1812.10943v1 [cs.CY] 28 Dec 2018

ABSTRACT
This report analyzes the Google search results from more than 1,500 volunteer data donors who, in the five weeks leading up to the German federal election on September 24th, 2017, automatically searched Google for 16 predefined names of political parties and politicians every four hours. It is based on an adjusted database consisting of more than 8,000,000 data records, which were generated in the context of the research project "#Datenspende: Google und die Bundestagswahl 2017" and sent to us for evaluation. The #Datenspende project was commissioned by six state media authorities. Spiegel Online acted as a media partner. Focal points of the present study are, among others, the question of the degree of personalization of search results, the proportion of regionalization and the risk of algorithm-based filter bubble formation or reinforcement by the leader in the search engine market.

ACM Reference format:
Tobias D. Krafft, Michael Gamer, and Katharina A. Zweig. 2016. What did you see? Personalization, regionalization and the question of the filter bubble in Google's search engine. In Proceedings of ACM Conference, Washington, DC, USA, July 2017 (Conference'17), 23 pages.
DOI: 10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION: THE FILTER BUBBLE'S DANGERS TO SOCIETY
Political formation of opinion as well as general access to information has changed significantly due to digitalization. Information sources such as newspapers, TV and radio, where large parts of the population read, heard, or saw the same news and interpretations of these news stories, are being replaced by more and more diverse and personalized media offerings as a result of digital transformation. Personalization is enabled by algorithm-based systems: here, an algorithm, not a human, decides which contents users might be interested in, and only these are offered to them in social networks. The same is true for personalized search engines like Google, Yahoo or Bing, for which, at more than 3,200 billion search queries in 2016 (Statista, 2016), it is impossible to have a man-made sorting available.

1.1 The model of algorithmically generated and amplified filter bubbles
The possibilities and dangers of a so-called algorithmically generated filter bubble increase steadily. This term is to be understood as a partial concept of the filter bubble theory by Eli Pariser. In his 2011 book, "The Filter Bubble: What the Internet Is Hiding from You" (Pariser, 2011), the internet activist pointed out the possible dangers of so-called filter bubbles. In his TED talk and on the basis of two screenshots from 2011, he showed that two of his friends had received significantly different results when searching for "Egypt" on the online search platform Google [1]. From this he developed a theory according to which personalized algorithms in social media tend to present individuals with content that matches the user's previous views, leading to the emergence of different information spheres where different contents or opinions prevail.

In short, filtering the information flow individually can result in groups or individuals being offered different facts, thus living in a unique informational universe [2]. This is especially worrying if the respective content is politically extreme in nature and if a one-sided perspective results in impairment or total deterioration of citizens' discursive capabilities. A filter bubble in this sense is a selection of news that corresponds to one's own perspectives, which could potentially lead to solidification of one's own position in the political sphere [3].

[1] See Eli Pariser's TED talk "Beware the filter bubbles", https://www.ted.com/talks/eli_pariser_beware_online_filter_bubbles (accessed May 12th 2018).
[2] "a unique universe of information for each of us" (Pariser, 2011, p. 9).
[3] "Filter bubbles" can be understood as even more advanced concepts. Selecting websites by means of state censorship can also create filter bubbles by restricting information. While this censorship is supported by algorithms, this does not constitute a filter bubble through algorithm-based personalization that is hypothesized in this text. This kind of restriction of information and its possible consequences for filter bubble formation aren't examined here. After all, a search engine operator or a social media platform could knowingly and intentionally limit the data base in a certain direction, thus presenting all users with the same content while only offering a selective extract of reality. This option is not examined here.

1.1.1 Filtering search results with personalization and regionalization. Since the number of web pages associated with a search term is more than 10 for most queries and at the same time the first 10 web pages shown receive the greatest attention from users, it is essential that search engines filter the possible search results. Certainly one of the most important filters is the user's language, while topicality and popularity play an additional role as well as, to a lesser extent, embedment in the entire WWW (e.g. measured by the PageRank).
A particularly important filtering mechanism within the framework of Eli Pariser's filter bubble theory is "personalization". Regarding the term of (preselected) personalization, we follow the statements given by Zuiderveen Borgesius et al. (2016), according to which personalization allows the selection of content that has not yet been clicked by the user, but which is associated with users with similar interests. Algorithmically speaking, this is based on so-called "recommendation systems", which determine the interests of a currently searching user from other people who have shown similar click behavior in the past. It is also plausible that according to their own click behavior and together with known categorizations of clicked content a profile is compiled for each person, saying, for instance: "This person prefers news about sports and business, reads medium-length text and news that are not older than a day." (Weare, 2009). Considering the vast number of users, both methods can only be achieved algorithmically, using different modes of machine learning and thus only form statistical models (Zweig et al., 2017). Generally it can be expected that users logged into their Google accounts will tend to receive more personalized search results.

Websites from the search result list which have previously been clicked by a user are delimited from personalization. For outsiders, those might not be differentiable from the entries that have been personalized by algorithms, but regarding content they don't contribute to algorithm-based filter bubble formation or amplification, since users have chosen those contents before.

We use the term regionalization for a sample of websites for a whole group of persons who currently search from a specific region or are known to originate from a specific region, yet don't necessarily mention a region in their search request. For instance, the current location can roughly be derived from the searching device's IP address, or more accurately from smartphone location information or from the profile known to the search engine (Teevan et al., 2011). The delivered websites themselves clearly relate to the location of interest specified by Google; which can be the case, for example, if a nearby location's name appears on the website

(4) Isolation from other sources of information: The groups of people whose respective news situation displays homogeneous, politically charged and one-sided perspectives rarely use other sources of information or only those which place them in extremely similar filter bubbles.

The stronger those four mechanisms manifest themselves, the stronger the filter bubble effect grows, including its harmful consequences for society. The degree of personalization is essential, as politically relevant filter bubbles do not emerge if personalization of an algorithm responsible for selecting news is low. Higher or high personalization and verifiable filter bubbles do not necessarily take political effect if either their contents aren't political in nature or users make use of other sources of information as well. For instance, information delivered to citizens of different languages is free of overlap by definition if the results are displayed in those languages; regardless, content-wise those citizens aren't in any way embedded in filter bubbles.

1.2 Examining Eli Pariser's filter bubble theory
As algorithms are capable of controlling the flow of information directed towards users, they are assigned a gatekeeper role similar to journalists in traditional journalism (see Moe & Syvertsen, 2007). As a result, it is necessary to examine how powerful the algorithmically generated and hardened filter bubbles on various intermediaries and search engines actually are. The number of reliable studies is relatively low: an important German study by the Hans Bredow institute offers a positive answer to the question of the informational mix: sources of information today are diverse and capable of pervading other news and information of algorithmically generated and hardened filter bubbles (Schmidt et al., 2017). It is pointed out that algorithms offer a possibility to burst open filter bubbles if such a functionality is explicitly implemented.

To our knowledge, apart from anecdotal examinations, a quantitative evaluation of the degree of personalization for a larger user base had not been conducted up until 2017: for example, in the context of a Slate article Jacob Weisberg asked five persons to search for topics and found results to be very similar (Weisberg, 2011).
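
To illustrate what a quantitative comparison of this kind could look like, the following minimal sketch computes the average pairwise overlap between the result lists that several users receive for the same query. It is a hypothetical illustration, not the method used in the studies cited above: the Jaccard index as the overlap measure, the function names `jaccard` and `mean_pairwise_overlap`, and the example data are all assumptions introduced here.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of URLs two result lists have in common (ranking order ignored)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_overlap(result_lists):
    """Average Jaccard similarity over all pairs of users' result lists."""
    pairs = list(combinations(result_lists, 2))
    if not pairs:
        raise ValueError("need at least two result lists")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: results shown to three users for the same search term.
results = {
    "user_a": ["u1", "u2", "u3", "u4"],
    "user_b": ["u1", "u2", "u3", "u5"],
    "user_c": ["u1", "u2", "u3", "u4"],
}

overlap = mean_pairwise_overlap(list(results.values()))
print(f"mean pairwise overlap: {overlap:.3f}")
```

A value close to 1 would indicate that users see nearly the same results, that is, little personalization under this measure; a value close to 0 would indicate largely disjoint result lists. Because the Jaccard index ignores ranking, a rank-aware measure would be needed to capture differences in ordering.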