Characterizing the Triggering Phenomenon in Wikipedia
Total Page:16
File Type:pdf, Size:1020Kb
Characterizing the Triggering Phenomenon in Wikipedia Anamika Chhabra S. R. Sudarshan Iyengar Indian Institute of Technology Ropar Indian Institute of Technology Ropar Punjab, India Punjab, India [email protected] [email protected] ABSTRACT beginning, which are then read by other users triggering them to Collaborative knowledge building achieves better results than in- add the connected factoids and so on [9], where triggering is a pro- dividual knowledge building essentially due to the triggering phe- cedure by which an idea or a comment spearheads the generation nomenon taking place among the users in a collaborative setting. of another idea or thought [21]. In Wikipedia, the existing content Although the literature points to a few theories supporting the of the articles triggers the users to contribute more content leading existence of this phenomenon, yet these theories have never been to the evolution of the articles through subsequent revisions. validated in real collaborative environments, thus questioning their general prevalence. In this work, we provide a mechanized way to observe the presence of triggering in knowledge building envi- ronments. We implement the method on the most-edited articles of Wikipedia and show that it may help in discerning how the existing knowledge leads to the inclusion of more knowledge in these articles. The proposed technique may further be used in other collaborative knowledge building settings as well. The insights ob- tained from the study will help the portal designers in building portals enabling optimal triggering. CCS CONCEPTS • Human-centered computing → Ethnographic studies; Wikis; Empirical studies in collaborative and social computing; KEYWORDS Figure 1: Triggering Network: The nodes represent the con- cepts and a link between two nodes shows that the two con- Triggering, Wikipedia, Google distance, Word association, Factoids, cepts are related to each other. The thickness of the edges Knowledge building, Evolution represents the strength of association between the concepts. ACM Reference Format: Anamika Chhabra and S. R. Sudarshan Iyengar. 2018. Characterizing the Literature shows that capturing the evolution of a knowledge Triggering Phenomenon in Wikipedia. In OpenSym ’18: The 14th Interna- artifact in general and understanding the triggering among the fac- tional Symposium on Open Collaboration, August 22–24, 2018, Paris, France. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3233391.3233535 toids has attracted researchers even before any portal like Wikipedia came into existence [12, 17, 20, 24]. For instance, classical theories 1 INTRODUCTION suggest that in a social system such as a collaborative knowledge building system, people get triggered to add more content due to the Understanding how knowledge evolves on Wikipedia has been of cognitive conflicts [17] or perturbations [24]. These conflicts arise great interest to researchers ever since Wikipedia became known as when they see content that is not complete or does not match with a successful medium for collaborative knowledge building [2, 23]. It what is there in their cognitive systems (i.e., minds) already. The ex- builds knowledge with a combined effort of a large group of users isting research also points to theories that support the prevalence of through successive refinements made on its articles [22]. Therefore, an underlying network among the pieces of knowledge concerning the content available in any given article does not reach its even- a knowledge artifact. For example, it is perceived that knowledge is tual state in a single step. Rather, a few factoids1 get added in the organized into frames and each of these frames possesses a particu- 1A factoid may refer to a standalone piece of information about the topic of the article. lar concept [20]. These frames may be of varying sizes, and those that are related to each other are linked together in the network Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed [21]. Therefore, when a frame is triggered, the other frames that are for profit or commercial advantage and that copies bear this notice and the full citation linked to it are also likely to be triggered [12, p. 55]. These frames on the first page. Copyrights for components of this work owned by others than ACM may be linked sequentially or in any non-linear fashion. One can must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a imagine a forest of nodes where each node is a knowledge frame, fee. Request permissions from [email protected]. and the attached frames form the connected components in the OpenSym ’18, August 22–24, 2018, Paris, France forest. Figure 1 shows an example of the underlying network of © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5936-8/18/08...$15.00 the knowledge frames (concepts) which are shown as nodes and https://doi.org/10.1145/3233391.3233535 a link between two frames depicts that they are associated with OpenSym ’18, August 22–24, 2018, Paris, France Anamika Chhabra and S. R. Sudarshan Iyengar each other. Further, these frames are connected by condition-action of visiting an edge to be directly proportional to the weight of the rules, that determine which frames to trigger next [12]. When the edge. They showed that the rate of increase of innovations through triggering conditions for a frame are met, that frame is brought their model follows Heaps law, which has been shown to exist for into the system. Figure 1 captures this by the thickness of the edges such settings by past literature. In another work, the knowledge that represents the strength of association between the nodes. As networks of questions and answers were studied by Miroslav et an instance, the concept ‘A’ is associated with four more concepts, al. [1]. Using the concept of triggering from the classical cognitive namely ‘B’, ‘C’, ‘D’ and ‘E’, where A’s association with ‘B’ is more theories, Chhabra et al. [4] developed a mathematical model that than that with ‘C’ and ‘E’, which is further more than that with ‘D’. computes the knowledge produced in a system due to the effect Hence when ‘A’ gets introduced, the chances of inclusion of ‘B’, ‘C’, of triggering. The model uses the concept of diversity in activity ‘D’ and ‘E’ also increase based on the strength of their edges. This selection behavior of users in a collaborative environment [5, 6]. phenomenon leads to a ubiquitous and self-regulating phenomenon All these models have mainly focused on either finding the rate of the existing knowledge frames leading to the inclusion of more at which the inventions occur or some other statistical property of knowledge into the system, making it an autopoietic system.2 the process rather than understanding the evolution of knowledge. Researchers have worked on developing models that mimic the To the best of our knowledge, studies analyzing the triggering triggering phenomenon such as Polya’s Urn Model [11] and its ex- phenomenon at the level of tracking the factoids using real-world tensions [18, 19]. These models have focused mainly on the growth data have not been conducted so far. properties of knowledge in systems where triggering among knowl- edge frames takes place. However, the existing knowledge frames 3 DATASET steering the inclusion of others has not been explored with real- The data set3 that we used for the analysis contains the entire revi- world data [3]. In this paper, we shed light on the dynamics of sion history of the top 100 most edited articles on Wikipedia. The evolution of Wikipedia articles through a technique that we pro- rationale behind choosing the most-edited articles is that the phe- pose to automatically capture the triggering phenomenon among nomenon of triggering may be better understood by analyzing the important factoids of these articles. We further show in-depth obser- articles that have accumulated a large number of edits as compared vations made on a few most edited articles which provide evidence to the articles with a comparatively smaller revision history. The towards the process of existing content leading to the inclusion data is in XML format and contains details such as username or of new content. The development of mechanized ways to observe IP address (if the user was anonymous), user Id, revision Id, the the evolution of a collaborative piece of knowledge may pave the entire content of the article after the edit that lead to that particular way for advancements in the fundamental research on knowledge revision, timestamp of the revision, the article size in bytes etc. We building. This will further lead to better mechanism design of the specifically did not consider ‘list’ articles which mostly contain collaborative tools for building knowledge. links to other Wikipedia articles on some topic, such as ‘List of Pro- grams Broadcast by GMA Network’ and ‘List of Impact Wrestling 2 RELATED WORK Personnel’. This is due to the fact that these articles are built in A limited number of studies have been conducted in the recent past a slightly different manner as compared to the rest of the articles attempting to model how the existing knowledge sets the stage where the editing happens at word-level. The data set contains for the manifestation of more knowledge. Tria et al. [26] present a articles on a wide variety of topics ranging from people such as mathematical model to emulate the occurrence of a new invention. ‘George W. Bush’, ‘Britney Spears’, ‘Beyonce’ to countries such as The model is a generalization of the Polya’s Urn model [18].