Auditing Wikipedia's Hyperlinks Network on Polarizing Topics
Cristina Menghini, DIAG, Sapienza University, Rome, Italy
Aris Anagnostopoulos, DIAG, Sapienza University, Rome, Italy
Eli Upfal, Dept. of Computer Science, Brown University, Providence, RI, USA

ABSTRACT

People eager to learn about a topic can access Wikipedia to form a preliminary opinion. Despite the solid revision process behind the encyclopedia’s articles, the users’ exploration process is still influenced by the hyperlinks’ network. In this paper, we shed light on this overlooked phenomenon by investigating how articles describing complementary subjects of a topic interconnect, and thus may shape readers’ exposure to diverging content. To quantify this, we introduce the exposure to diverse information, a metric that captures how users’ exposure to multiple subjects of a topic varies click-after-click by leveraging navigation models.

For the experiments, we collected six topic-induced networks about polarizing topics and analyzed the extent to which their topologies induce readers to examine diverse content. More specifically, we take two sets of articles about opposing stances (e.g., gun control and gun rights) and measure the probability that users move within or across the sets, by simulating their behavior via a Wikipedia-tailored model. Our findings show that the networks hinder users from symmetrically exploring diverse content. Moreover, on average, the probability that the networks nudge users to remain in a knowledge bubble is up to an order of magnitude higher than that of exploring pages of contrasting subjects. Taken together, those findings return a new and intriguing picture of Wikipedia’s network structural influence on polarizing issues’ exploration.

KEYWORDS

Wikipedia, Hyperlinks Network, Polarization, User Behavior

ACM Reference Format:
Cristina Menghini, Aris Anagnostopoulos, and Eli Upfal. 2021. Auditing Wikipedia’s Hyperlinks Network on Polarizing Topics. In Proceedings of The Web Conference 2021, April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

arXiv:2007.08197v4 [cs.SI] 8 Mar 2021

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

Knowledge on Wikipedia is distributed across articles interconnected via hyperlinks. According to Wikipedia’s Linking Manual [49], "Internal links can add to the cohesion and utility of Wikipedia, allowing readers to deepen their understanding of a topic by conveniently accessing other articles." Consequently, while reading an article, users are directly exposed to its content and indirectly exposed to the content of the pages it points to.

Wikipedia’s pages are the result of collaborative efforts of a committed community that, following policies and guidelines [4, 20], generates and maintains up-to-date and high-quality content [28, 40]. Even though tools support the community in curating pages and adding links, it lacks a systematic way to contextualize the pages within the more general articles’ network. Indeed, it is important to stress that having access to high-quality pages does not imply a comprehensive exposure to an argument, especially for a broader or polarizing topic.

Users use Wikipedia differently, according to their information needs. Singer et al. [45] show that users curious about a topic explore it by browsing the encyclopedia. In fact, they rely on hyperlinks to find content correlated with or complementary to the subject of interest. Therefore, it is crucial to evaluate the extent to which the current link structure encourages users to browse related topics to develop a more comprehensive view and perspective of a subject. This theme becomes particularly important when users look for an overview of polarizing topics spanning multiple articles.

Wikipedia’s Neutral Point of View (NPOV) encourages editors to work such that articles’ content fairly and proportionately represents all the significant views that have been published by reliable sources on the subject [51]. Although the NPOV document gathers many suggestions to properly curate the direct content of pages, it does not refer to the impact links might have in determining users’ exposure to indirect content.

Suppose we consider the topic abortion. It is a broad issue, which is distributed across multiple articles on Wikipedia. Moreover, due to its polarizing nature, it is possible to recognize pages about events, people, subjects, or organizations that are associated with either pro-choice or pro-life stances. Users willing to learn about abortion might access the encyclopedia to collect information and then develop their own ideas. Consider a user who enters the network by reading the article Abortion-rights movement, which portrays and outlines campaigns supporting abortion. We assume that the article’s body does not endorse the page’s subject due to the NPOV principle. So, we expect that the user acquires objective knowledge about organizations supporting abortion and, maybe, also realizes the existence of anti-abortion movements. Now, imagine that the user decides to continue her exploration of the topic, and to do so, she follows the hyperlinks within the current page. If the linkage to pages regarding subjects close to the pro-life view is weak, our user has little chance of collecting the diverse views that contribute to developing a comprehensive perspective on the topic. It follows that the lack of sufficient linkage among pages expressing diverse stances of a topic can run against the NPOV’s goals.

To the best of our knowledge, there are no studies that investigate the validity of the NPOV principles concerning users’ exposure to the indirect content (i.e., the one suggested by hyperlinks). Hence, analyzing Wikipedia’s links is particularly important to understand whether broad topics, which conceptually span multiple articles, are effectively, proportionately, and fairly presented to readers, not only in terms of direct content (i.e., the article’s body).

Previous works addressed the issue of users’ polarization on social networks and showed that it is hard for users to interact with content created or shared by users of opposing views [1, 9, 10, 19, 24]. Ribeiro et al. [42] empirically showed that the YouTube recommender system contributes to radicalizing users’ pathways. Given the nature and role of Wikipedia as a primary source of knowledge acquisition, the lack of broad exposure to different views of a topic appears to be a critical issue for guaranteeing fair and balanced access to well-rounded knowledge.

This paper provides a first observational study on Wikipedia that aims to quantify how the hyperlinks’ network topology can profoundly affect user exposure to diverse stances on polarizing topics. A comprehensive view of the connections among Wikipedia pages, and of how they shape reader exposure to information, is difficult for humans to grasp. Therefore, it requires introducing algorithmic methods to audit and quantify the mutual level of exposure among articles of diverse content, especially for polarizing matters. That is fundamental for the improvement of the encyclopedia and its role in promoting a self-critical society.

By studying the hyperlinks network, we first aim to discover to what extent the network’s topology pushes users to explore diverse content, rather than keeping them within knowledge bubbles¹. Secondly, we aim to gain insights that may help to design a system supporting editors in (1) contextualizing pages within the more general encyclopedia’s network and (2) adding links connecting articles of opposing/complementary views.

• [...] on their behavioral patterns [43, 45], features determining the success of wikilinks [16, 31], and readers’ clickstream data [53].
• We find that the structure of the network facilitates the exploration of knowledge bubbles of homogeneous views rather than of opposing stances. Moreover, we show that readers’ interest is biased toward one side of the topic based on the internal and external traffic on Wikipedia (see Sects. 4.1.1, 5, and 6).

To our knowledge, this is the first work that analyzes Wikipedia’s readers’ exposure to diverse information through the link network. Before moving on, we want to emphasize that this work does not prescribe how the hyperlinks network should be; rather, we aim to study whether the current connections among articles hinder users from visiting complementary pages about a polarizing topic. Also, our conclusions come from a network-based analysis. More advanced investigation combining network properties and articles’ content is left for future work. The code to replicate the paper is stored in an anonymous folder².

2 RELATED WORKS

We divide the related work into four categories: Improving Wikipedia, Navigating Wikipedia, Wikipedia Categorization, and Polarization on Social Media.

Improving Wikipedia. The scientific community proposed semi-automated procedures to improve Wikipedia’s quality. These works check the veracity of references [18, 41], suggest articles’ structure [39], look for hoaxes [30], or recommend links [38, 54]. Although link recommendation tools enrich the editing process, they do not provide editors with a measure to evaluate the relationship among articles containing diverse opinions. In this work, we define such metrics (Sect. 4.2).

Wikipedia Navigation. The literature still lacks a model that