Analyzing Requests for Permissions of Bots in Wikidata

Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata Mariam Farda-Sarbas Hong Zhu Human-Centered Computing | Freie Universität Berlin Human-Centered Computing | Freie Universität Berlin [email protected] [email protected] Marisa Frizzi Nest Claudia Müller-Birn Human-Centered Computing | Freie Universität Berlin Human-Centered Computing | Freie Universität Berlin [email protected] [email protected] ABSTRACT 1 INTRODUCTION Wikidata, initially developed to serve as a central structured knowl- Wikidata, the sister project of Wikipedia, was launched in October edge base for Wikipedia, is now a melting point for structured data 2012. Similar to Wikipedia, Wikidata is a peer production commu- for companies, research projects and other peer production com- nity with more than 20 K active users1. It serves as the primary munities. Wikidata’s community consists of humans and bots, and source for structured data of Wikimedia projects, especially for most edits in Wikidata come from these bots. Prior research has Wikipedia. Wikidata is also currently being used by commercial raised concerns regarding the challenges for editors to ensure the services, such as Apple’s assistant Siri, in research projects, such as quality of bot-generated data, such as the lack of quality control WikiGenomes2 and other peer-production communities, such as and knowledge diversity. In this research work, we provide one way OpenStreetMap3. Thus, the quality of data in Wikidata is crucial, of tackling these challenges by taking a closer look at the approval not only for Wikidata itself but also for the projects which rely on process of bot activity on Wikidata. We collected all bot requests, Wikidata as their source of data. i.e. requests for permissions (RfP) from October 2012 to July 2018. However, early research on Wikidata’s community has already We analyzed these 683 bot requests by classifying them regarding shown that in addition to an active human editor community, bots activity focus, activity type, and source mentioned. Our results are responsible for the majority of edits (e.g., [10, 16, 17]). Contrary show that the majority of task requests deal with data additions to to Wikipedia, bots were massively used in the Wikidata community Wikidata from internal sources, especially from Wikipedia. How- from the beginning of the project; therefore, it seems they exhibit ever, we can also show the existing diversity of external sources an essential duty in Wikidata. Bots have been studied for some time used so far. Furthermore, we examined the reasons which caused in the context of Wikipedia (e.g. [3, 5, 9]), however, their role in the unsuccessful closing of RfPs. In some cases, the Wikidata com- the context of the Wikidata community is less explored. Concerns munity is reluctant to implement specific bots, even if they are were raised recently about the role of bots in Wikidata. The quality urgently needed because there is still no agreement in the commu- of bot-generated data, for example, and their influence on quality nity regarding the technical implementation. This study can serve control and knowledge diversity has been questioned [17]. as a foundation for studies that connect the approved tasks with This motivation led to the research presented in this paper, which the editing behavior of bots on Wikidata to understand the role of is guided by the question: What type of bot activities are approved bots better for quality control and knowledge diversity. by the Wikidata community? We are interested in understanding what type of tasks the Wikidata community permits to be performed KEYWORDS through bots, i.e. to automate. Wikidata, Bot, Task Approval As opposed to previous work [10, 16, 20] or [18] which was based primarily on actual bot edits, we analyzed the requests for ACM Reference Format: permission (RfP) process and the accompanying information in Mariam Farda-Sarbas, Hong Zhu, Marisa Frizzi Nest, and Claudia Müller- Wikidata. In the RfP process, a bot operator applies for task approval Birn. 2019. Approving Automation: Analyzing Requests for Permissions of for each task a bot is intended to carry out. The community then Bots in Wikidata. In The 15th International Symposium on Open Collaboration decides on each task. (OpenSym ’19), August 20–22, 2019, Skövde, Sweden. ACM, New York, NY, We collected all task descriptions from October 31, 2012, to July USA, 10 pages. https://doi.org/10.1145/3306446.3340833 01, 2018, and classified and analyzed them manually. Our results suggest that the community uses bots mainly to add data to Wiki- Permission to make digital or hard copies of all or part of this work for personal or data. These data originate primarily from Wikipedia. Furthermore, classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation the main reasons for unsuccessful RfPs are either the operators on the first page. Copyrights for components of this work owned by others than the themselves, because the specification of the task is insufficient, or author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or the community, who could not agree on the design of the technical republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. implementation. The contributions of our research are as follows: OpenSym ’19, August 20–22, 2019, Skövde, Sweden © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. 1 https://www.wikidata.org/wiki/Wikidata:Statistics. ACM ISBN 978-1-4503-6319-8/19/08...$15.00 2Community-created genome data portal available at www.wikigenomes.org. https://doi.org/10.1145/3306446.3340833 3An online map with an open license, available at https://www.openstreetmap.org. OpenSym ’19, August 20–22, 2019, Skövde, Sweden Farda-Sarbas et al. • We investigated bots from a community perspective, based that during this downtime, the quality control network performed on RfPs. slower but was still effective. • We provided a classification scheme for the categorization Another piece of research focuses on how "such tools transform of RfPs. the nature of editing and user interaction". Geiger shows "how a • We used our defined scheme to identify bot-requested tasks weak but pre-existing social norm was controversially reified into and data sources. a technological actor" [3]. He refers to the example of the Hager- • We analyzed how the community decides on RfPs and clas- manBot, which has been implemented to sign unsigned discussion sified the reasons for unsuccessful RfPs. entries in Wikipedia. Halfaker et al. [9] show that bots are not only • We developed a dataset of bots RfPs. responsible for the enforcement of existing guidelines on a larger In the following sections, we, firstly, present existing research scale, but also that their activities can have unexpected effects. The on bots in peer-production by focusing on Wikimedia projects number of reverts of newcomers’ edits, for example, has elevated, and how our work is built on them. We then explain our dataset while (surprisingly) the quality of those edits has stayed almost con- and classification approach. Next come the findings of the study, stant. Editors increasingly apply algorithmic tools for monitoring followed by a discussion of the results. We conclude this article by edits of newcomers. In 2010, 40 percent of rejections of newcomers’ highlighting possible directions for future research. contributions were based on these algorithmic tools [9]. This con- tradicts attempts of the community to engage more new editors. 2 BOTS IN PEER PRODUCTION Moreover, Geiger and Halfaker defined bots as "assemblages of code and a human developer" and show that bot’s activity is well aligned Very different bots populate peer production communities: These with Wikipedia’s policy environment [6]. bots collect information, execute functions independently, create The research suggests that bots are more critical to the suc- content or mimic humans. The effects of bots on peer production cess of the Wikipedia project than expected previously, despite the systems and our society are increasingly being discussed, for ex- reluctance of the Wikipedia community to allow bots at the begin- ample, when influencing voting behaviour [1] or imitating human ning [14]. Bots have a significant role in maintaining this text-based behaviour [13]. knowledge base, especially in fighting vandalism. As bots in Wiki- 4 Bots have been used in Wikipedia from early on . Wikidata’s data have their roots in Wikipedia, we expect to see similarities community has profited from these experiences when handling between bots in both peer production systems - Wikipedia and their bots. We review, therefore, existing insights into the bot com- Wikidata. Before we look closer to see if the same areas of use of munity on Wikipedia and building on that, highlight research on bot activities emerge from Wikidata, we give an overview of the Wikidata that considers bot activity. existing insights into the bot community in Wikidata. 2.1 Bots in Wikipedia Bots are software programs that automate tasks, usually repeti- 2.2 Bots in Wikidata tive or routine tasks which humans consider time-consuming and Wikidata inherited bots from its sister project Wikipedia and bots tedious (e.g. [3, 4, 16, 21]). They are operated and controlled by started editing Wikidata with its launch by linking Wikidata item humans. pages to their respective Wikipedia language pages. The current Wikipedia has developed a stable and increasingly active bot research on Wikidata bots shows that bots perform most of the community over the years, although, bots were not widely accepted edits in Wikidata [16, 20]. Steiner [20], in his research, aims to and trusted in the beginning [14].

Analyzing Requests for Permissions of Bots in Wikidata

International Benchmark of Good Practices on New Business Models and Initiatives from and for Cultural Institutions

Wikipedia and Intermediary Immunity: Supporting Sturdy Crowd Systems for Producing Reliable Information Jacob Rogers Abstract

Jose's Geweldige Presentatie

Wikimedia Conferentie Nederland 2012 Conferentieboek

An Analysis of Contributions to Wikipedia from Tor

Crystal Eastman

Wiki Loves Monuments 2011: the Experience in Spain and Reflections Regarding the Diffusion of Cultural Heritage

Community Or Social Movement? Piotr Konieczny

Detecting Undisclosed Paid Editing in Wikipedia

The Case of Wikipedia Jansn

Critical Point of View: a Wikipedia Reader

ORES Scalable, Community-Driven, Auditable, Machine Learning-As-A-Service in This Presentation, We Lay out Ideas