SOCIAL COMPUTING

Bots and Cyborgs: ’s Immune System

Aaron Halfaker and John Riedl University of Minnesota

Bots and cyborgs are more than tools to better manage content quality on Wikipedia—through their interaction with humans, they’re fundamentally changing its culture.

h e n W i k i p e d i a simply couldn’t keep up with the damaging edits that humans alone was young and the volume of incoming edits. couldn’t handle. number of active Community members responded W contributors num- to this problem by developing two Early tools bered in the 10s or 100s, volunteer basic types of computational tools: The first tools to redefine the editors could directly manage its robots, or bots, and cyborgs. Bots way Wikipedia dealt with van- content and processes. Some editors automatically handle repetitive dalism were AntiVandalBot and who have been around the longest will tasks—for example, SpellCheckerBot VandalProof. fondly recall the halcyon days when fixes spelling errors. The sidebar “A AntiVandalBot used a simple set the encyclopedia evolved slowly, and Taxonomy of ” provides of rules and heuristics to monitor one person could manually track an overview of the many activities changes made to articles, identify the all of a day’s changes in a matter of bots perform on the site. Cyborgs are most obvious cases of vandalism, and minutes. intelligent user interfaces that humans automatically revert them. Although Those days ended in 2004 when “put on” like a virtual Ironman suit this bot made it possible, for the first Wikipedia began to experience to combine computational power time, for the exponential growth in new with human reasoning. Huggle, for to protect the encyclopedia from contributors, new articles, and instance, helps users zap vandals’ damage without wasting the time popular media attention. By the edits by the thousands. and energy of good-faith editors, it time growth peaked in 2007, the Together, bots and cyborgs function wasn’t very intelligent and could only encyclopedia was receiving more than as first-line defenses in Wikipedia’s correct the most egregious instances 180 edits per minute. emerging “immune system.” of vandalism. On one hand, Wikipedia was VandalProof, an early cyborg enjoying a rich flow of its lifeblood: COMBATING VANDALISM technology, was a graphical user volunteer contributions. On the other Bots and cyborgs arguably have interface written in Visual Basic that hand, suddenly no human could been most effective at filtering let trusted editors monitor article review all of the changes. Experienced . Along edits as fast as they happened in editors spent all of their time watching with the 2004-2007 exponential Wikipedia and revert unwanted for copyright violations, potentially growth in good contributions contributions in one click. libelous articles, and vandalism; they came a torrent of bad content and VandalProof served as a natural

0018-9162/12/$31.00 © 2012 IEEE Published by the IEEE Computer Society MARCH 2012 79 SOCIAL COMPUTING

A TAXONOMY OF WIKIPEDIA BOTS shows, these tools have had an increasing role in maintaining article quality on Wikipedia. obots perform a wide range of activities on Wikipedia. These include injecting public R domain data, monitoring and curating content, augmenting the MediaWiki software, ClueBot_NG has replaced and protecting the encyclopedia from malicious activity. AntiVandalBot’s simple rules and The first bots injected data into Wikipedia content from public databases. Rambot, widely heuristics with a highly accurate accepted to be the encyclopedia’s first sanctioned robot, inserted census data into articles neural network/machine learning about countries and cities. Rambot and its cousins act as “force multipliers” by performing approach: editors submit examples repetitive activities hundreds or thousands of times in minutes. of mistakes that ClueBot_NG makes, Many other bots monitor and curate Wikipedia content. For example, SpellCheckerBot and the cyborg’s developers retrain checks recent changes for common spelling mistakes using an international dictionary to its classifier periodically. Similarly, prevent accidental “fixes” to correctly spelled foreign words. Similarly, Helpful Pixie Bot cor- Huggle has replaced VandalProof with rects ISBNs and other structural features of articles such as section capitalization. The largest a slick user interface, configurablity, class of content curators is interlanguage bots, which use graph models of links between dif- and an intelligent system for sorting ferent languages of Wikipedia to identify missing links between articles covering the same edits by vandalistic likelihood to topic in a different language. As of this writing, the has more than 60 active maximize the efficiency of human interlanguage bots. effort in dealing with those instances Some bots extend Wikipedia functionality by implementing features that the MediaWiki of vandalism that ClueBot_NG doesn’t software doesn’t support. For example, AIV Helperbot turns a simple page into a dynamic catch. priority-based discussion queue to support administrators in their work of identifying and blocking vandals. Similarly, SineBot ensures that every posted comment is signed and dated. Distributed cognition Finally, a series of bots protects the encyclopedia from malicious activity. For example, Today, the combined efforts of ClueBot_NG uses state-of-the-art machine learning techniques to review all contributions to ClueBot_NG and a small group of articles and to revert vandalism, while XLinkBot reverts contributions that create links to Huggle-human cyborgs identify and blacklisted domains as a way of quickly and permanently dealing with spammers. immediately destroy the majority of vandalism before any human editor or reader ever sees it. However, the supplement to AntiVandalBot and Current tools speed and efficiency with which its successors by leaving the obvious Several years and many iterations these tools deal with individual vandalism to bots’ simple decision- later, bots and cyborgs have become instances of vandalism is only one making algorithms and letting more powerful, accurate, and user- aspect of Wikipedia’s antivandal human editors handle the rest. friendly. Consequently, as Figure 1 strategy. R. Stuart Geiger and David Ribes observed that bots and cyborgs form 1.0 a “distributed cognition system” that has reshaped Wikipedia’s process for 0.8 identifying and removing vandals (“The Work of Sustaining Order in Wikipedia: The Banning of a Vandal,”

tion 0.6 Proc. 2010 ACM Conf. Computer

opor Bots Supported Cooperative Work [CSCW Cyborgs ed pr 10], ACM, 2010, pp. 117-126). 0.4 Humans

Stack Although bots and cyborg- editors work independently, they “operationalize each offending edit 0.2 into a social structure through which administrators and editors come to 0 know users as vandals.” In other words, Wikipedia’s bots and cyborgs 2002 2004 2006 2008 2010 Year automatically build a record of new editors’ activities to quickly and Figure 1. Stacked proportion of reverts (rejections of damaging edits) for bots, cyborgs, efficiently identify vandals and block and human editors on Wikipedia. Bots and cyborgs have been taking over an increasing them, thereby conferring immunity role in protecting articles from damage since early 2006. to future damage from that source.

80 COMPUTER BOT SOCIALIZATION While bots are highly productive, performing edits hundreds of times faster than humans, they can also be massively disruptive to the community if they perform inappropriate actions, either because of disagreements between bot authors and other Wikipedians or because of bugs. To control for such issues, the Bot Approvals Group, a band of volunteer members, vets new bot proposals and addresses bot-related grievances. Geiger argues that Wikipedia Figure 2. The Huggle interface makes it easy to review a series of recent revisions by editors view bots not only as tools or filtering them according to the user’s preferences. “force multipliers” but also as social agents (“The Lives of Bots,” Critical Point of View: A Wikipedia Reader, In the discussions about whether administrators and a few thousand G. Lovink and N. Tkacz, eds., Institute HagermanBot should be allowed to other Wikipedia users. of Network Cultures, 2011, pp. 78-93). continue to operate, prolific editor Huggle makes it easy to review This is at least partially due and administrator Rich Farmbrough a series of recent revisions by to the fact that bots interact with argued it was important for users filtering them according to the Wikipedia in largely the same to understand that “bots are better user’s preferences. Figure 2 shows manner as human editors: they edit behaved than people,” and cautioned the main interface. On the left side articles and other pages via user against “botophobia” on Wikipedia. is a list of recent revisions sorted accounts, and they send and receive Bots were to some extent becoming by vandalistic probability. When a messages, usually read by their social agents, and they would need user selects one of these revisions, human maintainer—at least for now. to find a way to peacefully coexist it vanishes from the list for all Tawker, AntiVandalBot’s operator, beside flesh-and-blood contributors. other Huggle users to reduce the even half-jokingly nominated his bot Luckily for HagermanBot and risk of conflict. On the right side for election to the 2006 Arbitration its operator, an opt-out mechanism of the screen, changes made in the Committee (Wikipedia’s version of settled the issue. Editors like selected revision are highlighted, the Supreme Court). Sensemaker could easily tell the with green indicating added content As an example, Geiger describes bot not to sign any of their edits by and yellow deleted content. The far- the intense reaction within broadcasting their preferences on right number immediately above the Wikipedia community to their profile, which the bot would the revision (+6128 in this example) HagermanBot, which enforced the check before signing a comment. is the number of words added (or commonly accepted guideline of In time, this opt-out mechanism deleted, if the number is negative) signing comments to ensure that all became an operationalization of Isaac in this revision. participants in a conversation were Asimov’s second law of robotics: a The upper right of the interface named. Although the bot sometimes robot must obey the orders given to displays the rate at which the user mistakenly signed contributions not it by human beings. is working—in this case, the user is intended to be comments, its normal reviewing 92 edits per minute and approved activities were what caused MOTIVATIONAL ASPECTS making 4 reverts per minute. In the offense. In the words of one editor, OF CYBORGS absence of information about the Sensemaker, “I don’t really like Huggle, one of the most popular quality of users’ editorial efforts, this this bot editing messages on other antivandalism editing tools on “score” encourages frequent reverts. people’s talk pages without either Wikipedia, is written in C#.NET So does the ease of reversion: when their consent or even knowledge.” and any user can download and the user clicks the large red button Although human editors had been install it. Huggle lets editors roll back with an overlapping exclamation performing HagermanBot’s function changes with a single mouse click, point in the upper left of the interface, long before it was born, many wanted but because the tool is so powerful, Huggle reverts the edit being to hold the bot to a higher standard. rollback permission is restricted to displayed and sends a warning to the

MARCH 2012 81 SOCIAL COMPUTING

suspected vandal for inappropriate manage content quality—through the Wikipedia community. Over the behavior. their interaction with humans, next decade, it will be fascinating Some Wikipedians feel that such they’re fundamentally changing to observe what norms emerge motivational measures have gone Wikipedia’s culture. for interaction between millions too far in making Wikipedia like a For instance, recent research of human users and thousands of game rather than a serious project. demonstrates that reverts are a nonhuman social agents in a domain One humorous entry even argues that powerful demotivator to Wikipedia in which both have essential roles. Wikipedia has become a MMORPG— contributors, especially newcomers. a massively multiplayer online role- Tools like Huggle that automatically is a PhD student in playing game—with “monsters” identify potential revert-worthy edits, the Department of Computer Science (vandals) to slay, “experience” make reverts as easy as a key click, and Engineering at the University of (edit or revert count) to earn, and and reward editors with a higher Minnesota. Contact him at halfaker@ cs.umn.edu. “overlords” (administrators) to submit “score” for performing reverts have to (http://en.wikipedia.org/wiki/ led to the rapid and ruthless excision John Riedl, Social Computing column Wikipedia:MMORPG). or modification of content, with only editor, is a professor in the Depart- an automated explanation to the ment of Computer Science and ots and cyborgs have become original editor. Engineering at the University of Min- essential to the Wikipedia For this reason, University of nesota. Contact him at [email protected]. Becosystem. Without them, Minnesota researchers are working edu. the exponential influx of new users with the to from 2001 to 2007 would never have develop more “sociable” versions of Selected CS articles and columns been manageable. However, bots and cyborgs that will help human users are available for free at http:// cyborgs are more than tools to better become more effective members of ComputingNow.computer.org. NETWORK IEEE Computer Society Network Webinar Series Stay current and learn from leading industry experts as they explore important technology developments and trends, Interact with presenters through a live Q&A following each live webinar.

The Tyranny of Benchmarks: Past, Present and Future Continuous Integration for Agile Embedded Software Challenges in High Performance Computing Development This webinar will brie y examine the record of HPC from In this webinar you will learn how CI can be employed in the con- vector machines to massively parallel processor systems. text of embedded software development, how the e ciencies CI We will explore current and future challenges for HPC provides can improve a business’s bottom line. software in achieving the proper machine balance Presenters: Martin Bakal and Jennifer Althouse, sponsored by IBM required for scalable application performance. Date: Thursday, March 21, 2012 Presenter: Scott Hemmert, Advance Supercomputer Lead, 2:00 PM ET / 11:00 AM PT / 18:00 GMT Sandia National Laboratories (ACES) design team. (Duration: 1 hour) Date: Thursday, March 15, 2012 computer.org/webinars/ibm/03152012 2:00 PM ET / 11:00 AM PT / 18:00 GMT (Duration: 1 hour) Rack and Stack: Solutions for Requirements computer.org/webinars/hpc/03152012 Management. This webinar will discuss IBM Rational’s Capability Portfolio and Performance Management solution and how it allows federal agencies and related organizations to manage capabilities from inception (proposal ideation) through development and support. Available now! computer.org/webinars/12012011

82 COMPUTER