Introducing New Features to Wikipedia

Introducing new features to Wikipedia – Case studies for Web Science – Mathias Schindler Denny Vrandeciˇ c´ Wikimedia Deutschland e.V. Institute AIFB Deutsche Nationalbibliothek Universität Karlsruhe(TH) Germany Germany [email protected] [email protected] ABSTRACT Wikipedia is a free web-based encyclopedia. It is written in col- laboration by hundred of thousands of contributors [2]. It runs on the Open Source MediaWiki wiki engine. Introducing new features to Wikipedia is not just a technical question, but a complex socio- technical process. Previous introductions of new features (the category system in 2004, parser functions in 2006, and flagged revisions in 2008) are described and analyzed. Based on these experiences, the design of a new feature (creating semantic annotations) is given and discussed. The interaction between the technical features and the community is shown to be an instantiation of the Web Science research cycle, thus testing the cycle as a methodological tool for web science. 1. INTRODUCTION Wikipedia is a free web-based encyclopedia. It is written in col- laboration by hundred of thousands of contributors [2]. Today, Wikipedia is available in more than 250 languages offering more than 10 million articles. For some of these languages, Wikipedia is not only a free encyclopedia, but also the only encyclopedia available in this language. Currently it is among the ten most visited Figure 1: The web science process as presented by Tim web sites in the world, sporting more than 50 thousand hits per Berners-Lee in his Keynote The two Magics of Web Science second in the peak time. at the WWW2007, Banff, Canada. The two magics are creativity and complexity, since both are not well under- The introduction of new technical features to Wikipedia has al- stood yet. Source: http://www.w3.org/2007/Talks/ ways turned out to be a complex socio-technical process. Since 0509-www-keynote-tbl/#(11) both the technical implementation and the content development of Wikipedia are done in free and open projects that both offer their 1 complete change logs, the co-development of the technical and aspects. If done well, the socio-technical implementation of the social aspects can be traced and analyzed. idea may resolve the original issue on a micro-level. But due to the size and the complexity of the web, the solution will have novel and Section 2 offers a short overview of the web science process, before unexpected implications on the macro-level. These in turn have to using it to describe the introduction of new features to Wikipedia be analyzed in order to identify new issues (i.e. situations that do in four case studies: the category system, introduced in 2004 (Sec- not conform to certain values), thus starting the web science pro- tion 3.1); parser functions, introduced in 2006 (Section 3.2); flagged cess anew. A detailed description of the steps of the process is given revisions, introduced in 2008 (Section 3.3); and semantic annota- in [4]. Figure 1 gives an graphical overview of the model [3]. tion, currently being proposed (Section 4). We close the paper in Section 5 with conclusions drawn from the case studies. The authors already identify a number of examples for the web science process. In this paper we describe further examples from 2. WEB SCIENCE PROCESS the history of Wikipedia, that can be fully traced and analyzed due The web science process [4] is a model to describe the evolution to the availability of the data about the Wikipedia project and its of the web and of systems on the web. In order to resolve issues, history. engineers creatively come up with new ideas. These are then implemented. Since the web is an inherently social infrastructure, the technical implementation of an idea will necessarily involve social 3. PREVIOUS EXPERIENCES 1MediaWiki’s SVN is accessible at http://svn.wikmedia. This section describes three case studies of previously introduced org, the database dumps of Wikipedia’s complete log history can features to the Wikipedia system. Section 3.4 notes some other be found at http://download.wikipedia.org examples that may grant further studies. 3.1 Category system Greece is {{Country | Continent=Europe (a) Following the initial growth of Wikipedia, features to improve ex- | Capital=Athens}} ploring and navigating the content became necessary. Early ver- a country in {{{Continent}}}. It’s sions of the MediaWiki software offered basically only the man- capital is {{{Capital}}}. ually created links, a backlinks feature, and a full text search in (b) order to discover the content of the encyclopedia. In many other [[Category:Country]] wikis the backlinks feature (that displays all pages that link to a specific page) was used to implement a rudimentary tagging sys- Greece is a country in Europe. It’s capital is Athens. tem: on creating a link to a page like "Greece-related topic" on all (c) pages that discuss topics related to Greece, the list of backlinks on the page "Greece-related topic" will be a list of all pages on that [[Category:Country]] topic. This approach has a number of disadvantages: the links to the topic page will behave like any other link (i.e. display in the Figure 2: Usage of templates. (a) Source of a page about text), the linked page itself will behave like a normal article (i.e. Greece calling the country template; (b) Source of the coun- it will be a normal page, and displaying the backlinks will require try template; (c) Text of the page about Greece after template- a second click on the appropriate link in the toolbar), and normal expansion. articles like "Greece" can not be used as category pages (since all normal links would also appear in the list, not just those that were introduced for categorizing: Albania would be in the list of pages 3.2 Parser functions linking to Greece, since it is would link to it, being a bordering Templates are used to include the text of another page (usually in country). Even though this is the usual procedure in many other the separate Template namespace) at the place were the template wikis, Wikipedia often avoided such wiki-specific idiosyncrasies, is being called. This allows for a higher consistency within the e.g. the omission of camel-case syntax in favor of free links (lead- wiki, since some elements will have to be written only once and ing to Wikipedia often being described as a rather atypical wiki). reused in several articles. Templates may also feature parameters, i.e. parametrized template calls will insert the parameter’s values The category system was introduced to MediaWiki in 2004 in or- in the replacing text. See Figure 2 for an example. Templates are der to solve that problem. Each article could be put into an ar- widely used in Wikipedia, e.g. the English Wikipedia offers more bitrary number of categories that are identified by freely chosen than 200,000 templates. names. Adding a page to the category "Greece" is done by adding [[Category:Greece]] to the page. Category links themselves In 2006 some Wikipedians discovered that through an intricate and are not displayed in the article text (but rather in separate visual ele- complicated interplay of templating features and CSS they could ments on the rendering of a page). The category page is a page of its create conditional wiki text, i.e. text that was displayed if a tem- own in the newly introduced category namespace, thus separating plate parameter had a specific value. This included repeated calls of it cleanly from the article space. Category pages were programmed templates within templates, which bogged down the performance in such a way to display the list of all pages categorized with this of the whole system. The developers faced the choice of either dis- category. The new category system resolved all the identified tech- allowing the spreading of an obviously desired feature by detecting nical issues of using backlinks for categorizing articles. such usage and explicitly disallowing it within the software, or offer an efficient alternative. The latter was done by Tim Starling, who The category system was activated in all language editions at the in April 2006 announced the introduction of parser functions,2 i.e. same time without consulting the Wikipedia communities of the wiki text that calls functions implemented in the underlying soft- different languages. On most languages it was quickly applied to ware. categorize a majority of the existing pages. Soon new issues were discovered, namely how to organize categories themselves. This At first, only conditional text and the computation of simple math- lead to further new features like the category tree extensions, which ematical expressions was implemented, but this already increased renders a tree of subcategories on the page of a category etc. The the possibilities for wiki editors enormously. With time further introduction and application of the categories were heatedly de- parser functions were introduced, finally leading to a framework bated in some language communities. Restrictions on the usage of that allowed the simple writing of extension function to add arbi- categories were introduced and enforced solely by the community trary functionalities, like e.g. geo-coding services or widgets. This and with manual effort. On the German Wikipedia, the commu- time the developers were clearly reacting to the demand of the com- nity decided on a moratorium on category usage, in order to first munity, being forced either to fight the solution of the issue that the define some guidelines to their application. The moratorium, and community had (i.e. conditional text), or offer an improved techni- the enforcement of the guidelines, were done fully manually by the cal implementation to replace the previous practice and achieve an community members.

Load more