
Wikipedia research and tools: Review and comments

Finn Årup Nielsen
January 24, 2019

Abstract

I here give an overview of Wikipedia research and tools. Well over 1,000 reports have been published in the field and there exist dedicated scientific meetings for Wikipedia research. It is not possible to give a complete review of all material published. This overview serves to describe some key areas of research. [i]

[i] The present version is a working paper and continues to be expanded and revised. Revision: 1.622

1 Introduction

Wikipedia has attracted researchers from a wide range of disciplines (physicists, scientists, librarians, etc.) examining the online encyclopedia from several different perspectives and using it in a variety of contexts.

Broadly, Wikipedia research falls into four categories:

1. Research that examines Wikipedia
2. Research that uses information from Wikipedia
3. Research that explores technical extensions to Wikipedia
4. Research that is using Wikipedia as a resource for communication

Research that examines Wikipedia looks at how the encyclopedia evolves, how the users interact with each other and how much they contribute, and most of this kind of research is not interested in the content per se. Vandalism in Wikipedia usually presents no problem: it will just be one more aspect to investigate. Work in this area may benefit Wikipedia as it could help answer what rules the site should operate under, e.g., how beneficial open editing with no user registration is. One representative publication in this category is Jakob Voss' 2005 article Measuring Wikipedia [1]. One humorous quote is “Wikipedia cannot work in theory, but does in practice”: the major topic in this line of research is to find the answer to the question “why does it work at all?”

Research using information from Wikipedia will typically hope for the correctness and perhaps completeness of the information in Wikipedia. Many aspects of Wikipedia can be used, not just the raw text, but also the links between components: language links, categories and information in templates provide structured content that can be used in a variety of other applications, such as natural language processing and translation tools. Large-scale efforts extract structured information from Wikipedia, link the data and connect it as Linked Data. The papers with description of DBpedia [2] represent examples of this line of research.

It is odd to write a 1.0 article about a Web 2.0 phenomenon: an interested researcher may already find good collaboratively written articles about Wikipedia research on Wikipedia itself, see Table 1. These articles may have more complete and updated lists of published scientific work on Wikipedia, and much research-like reporting on Wikipedia of relatively good quality occurs outside ordinary academic channels, on the web, and several non-academic organizations have produced reports from large surveys, e.g., Pew Research Center's Internet & American Life Project [3, 4] or the Wikimedia Foundation and its chapters [5].

Wikipedia article | Description
m:Research:Index | Primary entry point for Wikimedia research
en:Wikipedia | Main article about the encyclopedia
en: | Article about an aspect of Wikipedia
en:Criticism of Wikipedia |
en:Academic studies about Wikipedia |
en:User:Moudy83/conference papers | Long list of Wikipedia conference papers
en:User:NoSeptember/The NoSeptember Admin Project |
en:Wikipedia:Academic studies of Wikipedia | Comprehensive list of studies on Wikipedia
en:Wikipedia:Ethically researching Wikipedia |
en:Wikipedia:Modelling Wikipedia’s growth | Specific results on the growth of Wikipedia
en:Wikipedia:Notability (academics) | Notability guideline for academics
en:Wikipedia:Researching Wikipedia | Discusses quantitative measures and links to various statistics
en:Wikipedia:Survey (disambiguation) |
en:Wikipedia:Wikipedia as an academic source | List of papers
en:Wikipedia:Wikipedia in research | Essay
en:Wikipedia:WikiProject Wikidemia |
en:Wikipedia:WikiProject Countering |
en:Wikipedia:WikiProject Vandalism studies | Studies of damaging edits
m:Research | Lists resources for wiki research and researchers
m:Wiki Research Bibliography | Bibliography of scholarly and science articles
m: Research Goals |
m:Research:data | Overview of Wikipedia-related data
en..org/wiki/Portal:Wikimedia Studies |
s:Wikimedia-pedia | Overview of research questions

Table 1: Wikimedia articles related to Wikipedia research. Some of these articles are in the main namespace, others require the Wikipedia: namespace prefix, and others (m: prefixed) are on the meta wiki (meta.wikimedia.org).

Wikipedia research continues to grow and now there are many thousands of research articles. In July 2010 Google Scholar claimed to return 196,000 articles when queried about Wikipedia, while PubMed returned 55. In May 2012 I found 114 articles in PubMed, see also Figure 1.

Figure 1: Number of scientific articles returned from the PubMed bibliographic database with a query on ‘Wikipedia’.

A few researchers have examined the development of the research literature on Wikipedia. Han-Teng Liao reported in 2012 that the number of theses from major Chinese-speaking regions peaked in 2009 [6].

Several researchers have already reviewed the Wikipedia and/or wiki research literature with varying degrees of depth and breadth: a short 2009 report found 1,000 articles [7], and a short 2009 paper, identifying 400 peer-reviewed articles, summarized some of these articles [8]. Another 2009 review focused entirely on the use of Wikipedia-derived data, e.g., for natural language processing and ontology building [9]. On the other hand, Nicolas Jullien’s 86-page working paper [10] focused on, e.g., editor motivation, process, contributor roles and quality. A 2012 version of the present paper [11], together with a number of other initial papers, including Finnish research [7, 8, 12, 13], evolved into a 138-page working paper [14]. This lengthy paper was later split into several peer-reviewed journal papers [15, 16].

Why is Wikipedia research of interest?

Why does Wikipedia attract researchers? The popularity of the phenomenon probably attracts many, and most Wikipedia articles make sense for the ‘common researcher’ in contrast to, say, bioinformatics, which typically requires expert biomedical knowledge. Compared to Free and Open Source Software, Wikipedia does not require familiarity with software [19].

The openness of the project and the easy availability of data also make Wikipedia of interest. Typically, research on the Web requires crawling many sites, whereas each complete language version of Wikipedia is available for download as a compressed XML file ready for use and analysis. Other Web 2.0 large-scale data sets may simply not be available to researchers unless they get exceptional access to the data of the private company. The MediaWiki software also makes Wikipedia available through the many-faceted API. The Toolserver facility, which was used by many Wikipedians and some researchers, even enabled direct database queries. The Toolserver closed, but a similar service, Wikimedia Tool Labs, took over its functionality.

Multiple language editions make it possible to explore areas of language translation. Although the text of Wikipedia is not sentence-aligned, the large number of languages covered makes Wikipedia unique. With Wikidata, multilingual labels for each concept can easily be obtained.

The availability of the revision history enables dynamic studies of content and contributors. In this respect Wikipedia is not unique, as studies of free and open source software based in public code repositories make similar research possible.

The structured information in Wikipedia provided through categories and MediaWiki templates may also help researchers, and the sister project Wikidata provides an even more structured resource.

Researchers have also noted difficulties in doing research with Wikipedia: as it is constantly changing, a researcher may write about Wikipedia and later find the statement to be untrue, and the vast scale makes it difficult to do studies on the entire Wikipedia [20].

Tools and data sets

dumps.wikimedia.org provides compressed XML files of complete Wikipedias and their sister projects. This data contains the raw wiki markup along with metadata. Independent researchers have converted the wiki markup to an XML format, so that, e.g., links, lists and paragraphs are indicated with XML tags rather than with wiki markup. They refer to this processed data as the “Wikipedia XML Corpus” [18], and one set of data from 2006 Wikipedias is available from http://www-connex.lip6.fr/~denoyer/wikipediaXML/.

There are a number of other data sets, e.g., dumps.wikimedia.org/other/articlefeedback/ has article feedback data and https://frdata.wikimedia.org/ has Wikimedia Foundation fundraising data. The MediaWiki API enables specialized queries, e.g., to get blocks or use of templates, and to return the data in different formats such as XML and JSON.

Wikimedia makes usage statistics available to a certain extent, though full logs are not released in the open due to the readers’ privacy. Simple page view counts are available from dumps.wikimedia.org/other/pagecounts-raw/. These statistics were originally collected by database engineer and former Wikimedia trustee Domas Mituzas (and distributed from dammit.lt/wikistats/). Further processing of this data is presented at http://stats.grok.se. JSON-formatted data are available from stats.grok.se//. In September 2014 Andrew West discovered that these services underreported: they did not count mobile views. Given the rise in mobile traffic, the number of page views for the English Wikipedia in August 2014 was underestimated considerably, by about a third [21]. New complete statistics are made available from https://dumps.wikimedia.org/other/pagecounts-all-sites/. Apart from page view statistics for the default sites and the mobile sites, this also includes the Wikipedia Zero views.

By enabling a click tracking extension, MediaWiki administrators can track users’ navigation around the wiki. During the Wikipedia Usability Initiative, Wikimedia enabled the extension for beta testing.

There are several derived data sets. The Wikilinks data set provides links from entities on webpages to English Wikipedia articles. It comprises around 40 million mentions and 3 million entities [22, 23].

WMF data sets | Description
Wikimedia Downloads | Database backup dumps and static HTML dumps
Page view statistics (‘raw’) | Desktop raw page view statistics
Page view statistics (‘all’) | Full all page view statistics
Repackaged page view statistics | Page view statistics in a more compressed format
English Wikipedia by second [17] | Total page view statistics from the English Wikipedia by timestamp collected in March and April 2015
Wikipedia Clickstream | Referer-resource pairs from the request log
Mediacounts | Requests of files on upload.wikimedia.org

Third-party data sets | Description
Wikipedia XML Corpus [18] | Annotated in XML
DeepDive Open Datasets (WIKI) | Natural language processing-annotated sentences from Wikipedia
Scholarly article in Wikipedia | Citations from Wikipedia articles to journal articles identified by PubMed, PubMed Central or DOI identifier
Structured citations in the English Wikipedia | Citations from the English Wikipedia dump
enwiki-pageviews2007-2016 | Page view statistics for the English Wikipedia collected by Alex Druk in connection with www.wikipediatrends.com
WikiCite data dumps | Dump of the bibliographic data in Wikidata

Table 2: Wikipedia data sets.

perlwikipedia, as the name implies, works with Perl, and a recent-change patrolling program has been using it. Pywikipediabot is a Python-based collection of tools for bot programming on Wikipedia and other MediaWikis. It can, e.g., create interlanguage links. WikiXRay is another software tool, written in Python and R, which may download and process data from the Wikimedia site for generating graphics and data files with quantitative results. Markus Krötzsch’s Java Wikidata Toolkit can download and parse Wikidata dump files.

The command-line program wikipedia2text downloads a specified Wikipedia article and formats it for display on the command line, while WikipediaFS makes raw-text Wikipedia articles available under the filesystem so that a Wikipedia article looks like an ‘ordinary’ file.

Edits on Wikipedias are relayed to IRC. Researchers may grab this real-time data: in Wikistream, a dynamically updated webpage from 2011, Ed Summers presents an aggregated, continuously updated list with links to edited articles across the major language versions of Wikipedia.

The Wikimedia Toolserver was a platform for hosting various software tools written and used by Wikimedia editors. Programmers could apply for an account. It ran several specialized Web services, e.g., a user edit counter, and CatScan, which enabled searching categories recursively. random-bio of mzmcbride could sample randomly among biographies of living persons. See Table 3 for other Web services. The Toolserver has now been replaced by Wikimedia Tool Labs and many of the Toolserver tools have been migrated there, e.g., catscan is now running from https://tools.wmflabs.org/catscan2/catscan2.php.

Research on ‘human subjects’

Several researchers have mentioned the problem of gaining access to users for surveys or interviews. The randomization tool on Wikipedia allows an unbiased sampling among registered editors. On the English Wikipedia the randomization tool has the address http://en.wikipedia.org/wiki/Special:Random/User. After sampling a set of editors, researchers can contact them by adding notes on the user talk pages or by emailing the user. However, editors may simply ignore the request, regard it as ‘survey spamming’, or have left the project. Emailing may also be hindered because users must opt in to email contact in MediaWiki’s emailing facility.

Interview- or survey-based research may require the approval of the institutional review board as well as a recruitment form signed by the subject participating in the survey. Another issue, raised in a blog post by Heather Ford in 2013, is whether to “onymize”, pseudonymize or anonymize the research subjects [24]. In Wikipedia research, as well as in Internet research generally, anonymization of a quote is difficult, as present-day advanced Internet search engines usually have no problem tracking down text fragments. With the attribution inherent in the GPL and Creative Commons licences, users may even insist on being attributed, to quote Ford’s experience [24]:

Name | Developer | Description
Article revision statistics | soxred93 | Detailed overview of edits of a page, interactive graphs for edits over time, article size, and top page editors
Wikipedia page history statistics | aka | Detailed overview of edits of a page with graphs for edits over time, and page editors
X!’s Edit Counter | | Several statistics for a user with graphs over time
Edit Summary calculator | soxred93 | Statistics about the edit summary with graph
Contributors | Daniel Kinzler | Ranked list of contributors for a page
Recent change statistics | aka | Summarizes recent edits
Revvis | Finn Årup Nielsen | Sequential network visualization
Watcher | mzmcbride | Displays the number of users watching a page
Quarry | YuviPanda | Web-based SQL queries to Wikimedia databases

Table 3: The no longer functioning Toolserver and related Web services. See a large list at http://www.mediawiki.org/wiki/Toolserver/List of Tools.

“I had thought that this was the right thing to do: to anonymize the data, thus protecting the subjects. But the ‘subject’ was angry that he had been quoted ‘without attribution’. And he was right. If I was really interested in protecting the privacy of my subjects, why would I quote his sentence when anyone could probably Google it and find out who wrote it.”

A number of longer works describe Wikipedia and related topics from different aspects. New media journalist Andrew Lih’s The Wikipedia Revolution [25] recounts the history of Wikipedia and its controversies. How Wikipedia works [26] by Wikipedians focuses on editing, policy and the community, while MediaWiki [27] and Working with MediaWiki [28] describe the MediaWiki software. The first book, The Wiki Way, presents the most central ideas of the wiki [29]. Critical Point of View: A Wikipedia Reader is an edited book discussing a number of issues [30]. Several other books have been published [31–33].

Scientific meetings

Several dedicated scientific meetings center around wikis and Wikipedia. The ACM-affiliated meeting WikiSym presents results in all areas of wiki research. Since 2006 the SemWiki workshop has presented results in semantic wiki research. The research community around that meeting also interacts at the semanticweb.org site, itself a semantic wiki. The WikiAI workshop concentrates on the interface between artificial intelligence, with machine learning and computational linguistics, on the one side and wikis and other collaboratively built knowledge bases on the other side. The Workshop on Collaboratively Constructed Semantic Resources focuses on social semantics and social natural language processing, with several contributions centered on the Wikipedia corpus. The 2009 workshop The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources also featured Wikipedia research content. The community meeting Wikimania focuses on Wikipedia and its sister projects operated by the Wikimedia Foundation. Apart from community-related topics, the meeting usually has a good deal of research-oriented material presented. The organizers publish no formal proceedings, but the Wikimania site usually has some description of the contributions, and videos from recent meetings are made available.

Other communication channels

The WikiSym mailing list wiki-research has little activity, while the Wikimedia mailing lists wiki-research-l and wiki-tech-l are fairly active.

Wikimedia Foundation-employed data analyst Erik Zachte makes charts, tables and comma-separated values files of large-scale analyses of all Wikimedia projects available from stats.wikimedia.org. His blog is available from www.infodisiac.com. The web service ‘Vital Signs’, running from Wikimedia Labs, displays interactive charts of total pageviews and other metrics over all Wikimedia projects and all languages.

The newsletter Wikipedia Signpost discusses different matters of Wikimedia projects. More critical is the forum Wikipedia Review, and individuals have composed sites with critical commentaries, see, e.g., www.wikipedia-watch.org. Wikipedia Weekly is a podcast with episodes from 2006 to presently 2009. Since July 2011 the meta-wiki of Wikimedia has published the monthly Wikimedia Research Newsletter (WRN), available from meta.wikimedia.org/wiki/Research:Newsletter. It focuses on new Wikimedia-related research. Wikimedia Foundation-employed Dario Taraborelli often writes entries, but individual researchers make substantial contributions. The newsletter is also aggregated into a single volume for each year. The 2012 edition was 95 pages long [34]. The Twitter and Identi.ca user WikiResearch posts links to new research on Wikipedia.

2 Examining Wikipedia

Wikipedia has been examined in a number of ways, both quantitatively and qualitatively, and with analyses of the full data set as well as of just a small part.

What is Wikipedia?

The most basic question in the study of Wikipedia asks “what is Wikipedia?” Wikipedia itself reported in its first sentence in October 2013: “Wikipedia is a collaboratively edited, multilingual, free Internet encyclopedia supported by the non-profit Wikimedia Foundation.” [ii] The initial version of the Wikipedia article from November 2001 defined Wikipedia as “the name of an open content, WikiWiki encyclopedia found at http://www.wikipedia.com/, as well as its supporting, very active encyclopedia-building project.” [iii] Note that the recent quote defines Wikipedia as a work rather than as an organization, and note the difference between “free” and “open”, and between “collaboratively edited” and the more technical term “WikiWiki”. Wikidata presently (2013) claims it as an instance of a “wiki”, an “internet encyclopedia” and a “Wikimedia project”, and the English description reads “free online encyclopedia that anyone can edit” while the German description uses the word project [iv]. Interestingly, a study on the German Wikipedia found that interviewees would see a German focus on the end product, while claiming the English-speaking community focused on the process, citing a German Wikipedian: “We are not here because we want to use the wiki and have fun with it, but we want to have an encyclopedia which is bigger and better than any encyclopedia that has been there before [. . . ].” [35]

Whereas a definition of Wikipedia as an online encyclopedia would be from a reader’s point of view, regarding Wikipedia as a collaborative writing application (CWA) would also regard it from an editor’s point of view. In a review of health CWAs, researchers mapped CWAs depending on use patterns: virtual communities (patients, e.g., Dealing with Autism), professional communities (e.g., RadiologyWiki) and Science 2.0 (e.g., OpenWetWare). The researchers placed Wikipedia at the very center of the map, see Figure 2 [36].

Researchers have presented other definitions: Benkler and Nissenbaum referred to Wikipedia as a “project” and mentioned it as an example of commons-based peer production [37]. Konieczny would call Wikipedia an “online organization[. . . ]”, an “online communit[y]” and “related to several social movements, including the Free and Open Source Software Movement, Open Publishing Movement, and Free Culture Movement” [38]. Such definitions focus not so much on the end product, but rather on the process, the project and the group of people that led to the result.

Yet other researchers regard Wikipedia as a work beyond the encyclopedia, e.g., as a semi-structured knowledge source which can be used as a basis to derive explicit facts [39], or a “continuously edited globally contributed working draft of history” [40]. Wikipedia might also be regarded as part of the Web 2.0.

The answer to the question “what is Wikipedia” directs the researcher in his/her study. If the researcher thinks Wikipedia is an encyclopedia, then the researcher likely focuses on the content and most naturally compares Wikipedia against other reference works, printed or online. If Wikipedia is a semi-structured knowledge source, then Wikipedia should instead be compared to works such as WordNet and ontologies.

Quality

Several formal studies of the quality of Wikipedia have been performed [v]. Such studies typically select a sample of Wikipedia articles and “manually” read and judge the quality, sometimes in comparison with other encyclopedias or other resources. The quality may be rated on several dimensions:

[ii] “Wikipedia” (oldid=576265305)
[iii] “Wikipedia” (oldid=331655534)
[iv] https://www.wikidata.org/wiki/Q52
[v] See the overviews on the English Wikipedia ‘en:Wikipedia:External ’ (oldid=169010073) and ‘en:Reliability of Wikipedia’ (oldid=179243736).

Figure 2: Map of collaborative writing applications with Wikipedia at the center. From Archambault et al., 2013, CC-BY [36].

Accuracy (no factual errors), coverage, bias, con- 162 against 123. Both encyclopedias contained 4 ciseness, readability, up-to-dateness, usable/suit- “serious errors, such as misinterpretations of of im- able and whether the articles are well-illustrated portant concepts”. The study was not itself and well-sourced. For Wikipedia’s ‘featured arti- peer-reviewed. cles’ Wikipedia has the following quality dimen- In the 2006-published study ex- sions: comprehensive, accurate, verifiable, stable, amined several quality aspects of American history well-written, uncontroversial, compliance, appro- articles in English Wikipedia compared against En- priate images, appropriate style and focus, while carta and American National Online.20 Stvilia et al. in their study of Wikipedia discus- He found the essays on the history sions worked with 10 different dimensions.41 to have inaccurate descriptions and with incom- There have been many quality studies in the plete coverage. He attributed this to “broad syn- health//drug domain: In a 2013 review thetic writing is not easily done collaboratively” of collaborative writing applications in health care and found of historical figures to “of- researchers identified 25 papers reporting on the fer a more favorable terrain for Wikipedia since bi- quality of information in such systems, with 24 of ography is always an area of popular historical in- them evaluating Wikipedia,36 and researchers con- terest”. Rosenzweig then examined bibliographies tinues to study health-related information quality of American historical figures in and the in Wikipedia. 18,000 entries American National Biography Online for comparison against Wikipedia. Of 52 examined Overall quality people listed in American National Biography On- line about half were listed in Wikipedia and one- In the perhaps most widely referenced investiga- fifth in Encarta. 
He found the American National tion the science journalists of Nature collected 42 Biography Online had more details with about four science articles from Wikipedia and Encyclopæ- times as many as Wikipedia. He noted a dia Britannica and let blinded experts evaluate bias in coverage between articles on Isaac Asimov, them.42 The comparison between the two ency- President Woodrow Wilson and Lyndon LaRouche clopedia showed that the articles of Wikipedia con- judging the American National Biography Online tained the most factual errors, omissions and mis- to give a more proportionate coverage. He went on leading statements, — but surpricingly not partic- to examine factual errors. In 25 Wikipedia articles ularly many more than Encyclopædia Britannica: he found clear cut errors in four, and 3 articles with

7 Topic Comparisons Articles Evaluation

Science42 Britannica 42 Blinded experts Biographies of American histori- Encarta, American National Bi- 52/25 The author cal figures20 ography Online Pop culture, current affairs, sci- — 3 broad topics 3 librarians ence43 Surgical procedures44, 45 — 30 Experts General46, 47 Brockhaus 50 Research institute Drug information48 Medscape Drug Reference 80 questions The authors Medical students informa- AccessMedicine, eMedicine, Up- 3 Blinded experts tion49, 50 ToDate Cancer information51, 52 US National Cancer Institute’s 10 Medically trained Data Query (PDQ) personnel Osteosarcoma53 NCI patient and professional site 20 questions 3 independent ob- servers Mental disorders54 13 10 topics 3 experts Medication information55 Manufacturer’s package insert 20 drugs Four drug infor- mation residency- trained pharmacists Orthognathic surgery56 24 other websites The topic Scoring against “DISCERN” Nephrology57 — 95 ICD-10 codes Counting refer- ences, readability index computation General58 — 134 96 experts Medical conditions59–63 Peer-review literature 10 Pairs of medicine residents or rotat- ing interns Prescription drugs64 — 22 The authors Drugs65 Text books 100 The authors(?) Health, medicine, nutrition66 WebMD, Mayo Clinic 92 statements Raters/authors Breast reconstruction67 9 web sites 1 Readability in- dex computation, authors rating

Table 4: Selection of Wikipedia quality studies. See also the Multimedia Appendix 3 of the Archambault 2013 review.36 factual errors among 10 examined for Encarta and study was not a comparison and not blinded. 1 article with errors in American National Biogra- phy Online. Rosenzweig found Wikipedia “more In December 2007 Stern performed an examina- anecdotal and colorful than professional history” tion of the German Wikipedia and the on-line edi- 46, 47 and focus on topics with recent public controversy, tion of the German encyclopedia Brockhaus. and concluded “Wikipedia, then, beats Encarta but This weekly magasin had asked an independent re- not American National Biography Online in cover- search institute, Wissenschaflicher Informationsdi- age and roughly matches Encara in accuracy”, and enst K¨oln,to evaluate 50 articles in relation to cri- further noted American National Biography Online teria of correctness, completeness, up-to-dateness has richer contextualization and easily outdistances and comprehensibility. In 43 cases the Wikipedia Wikipedia on “persuasive analysis and interpreta- article was evaluated as better than Brockhaus’, 69, 70 tions, and clear and engaging prose”. and Wikipedia got the best grade average. Wikipedia was regarded as better in up-to-dateness Another early quality of information study, a and—perhaps surpricingly—in correctness, while peer-reviewed one from 2006, looked on the credi- Brockhaus scored better in completeness. Further- bility of Wikipedia and its articles.68 However, this more some Wikipedia articles were regarded as too

complicated for the lay reader.

In the summer of 2009 the Danish newspaper Berlingske Tidende made a small informal comparison between the Danish Wikipedia and the larger (in terms of number of articles) expert-written Den Store Danske online encyclopedia. Overall the Danish Wikipedia came slightly ahead due to its many links, typically longer articles and more frequent updates, even considering the background of the authority of Den Store Danske.71 The author also noted the quicker and more precise searching in Wikipedia.

Among the many health-related quality of information studies44, 45, 48–57, 59, 60, 64–67, 72 is a study from 2007 where medical doctors reported on surgical information in Wikipedia.44 Identifying 39 common surgical procedures the researchers could find 35 corresponding Wikipedia articles, all of them judged to be without overt errors. The researchers could recommend 30 of the articles for patients (22 without reservations), but also found that 13 articles omitted risks associated with the surgical procedure.45

Other researchers examined Wikipedia August 2009 cancer information and the US National Cancer Institute’s Physician Data Query (PDQ).51, 52 They found that Wikipedia had similar accuracy and depth when compared against the professionally-edited PDQ, but they also found that Wikipedia had lower readability as measured with the Flesch–Kincaid readability test.

Considering scope, completeness, and accuracy of information for osteosarcoma on the April 2009 English Wikipedia compared against patient and professional sites of the US National Cancer Institute (NCI), 3 independent observers scored the answers to 20 questions on a 3-point scale. Wikipedia scored lower compared to the two NCI versions, though the statistical test only showed a significant difference against the NCI professional version.53

Three psychologists with relevant expertise examined 10 topics in mental disorders across 14 websites with respect to accuracy, up-to-dateness, coverage, referencing and readability.54 Among the websites, beside Wikipedia, were NIMH, WebMD and Mayo Clinic, and among the topics examined were “childhood onset of psychosis” and “gambling and depression”. Wikipedia scored high (“generally rated higher”) on accuracy, up-to-dateness and referencing, while low on readability.

While numerous studies have examined the quality of Wikipedia, one finds far fewer studies of the qualities of other wikis, e.g., the Wikipedia sister projects. In 2008 a lexicography study would claim that Wiktionary had a poor quality, while at the same time noting the comparative studies favorable to Wikipedia.73 One of the few studies to compare Wiktionary with other language resources examined the German Wiktionary with GermaNet and OpenThesaurus and found that the scope of the three resources varied depending on which variable they look at, e.g., Wiktionary had the highest number of word senses but the lowest number of synonyms.74

Factual errors

Trusting specific facts on Wikipedia is questionable, as there might be typos, intentional or unintentional errors, biased presentation, or hoaxes, e.g., misinformation on Wikipedia propagated to the obituary for composer Maurice Jarre on The Guardian web-site.75 As a research experiment a student had entered made-up Jarre quotes in Wikipedia immediately after Jarre’s death. An obituary writer working under a tight deadline picked up this information though it stayed in Wikipedia for only 25 hours. The hoax was only revealed after the student contacted the publishers.76 Similar vandalism that spreads to obituaries happened for .77

Cautionary notes have been cast for the open wiki- in cases where potentially hazardous procedures are described.78 Especially chemical and medical procedures and compounds may call for complete and accurate description. For medical drug information Kevin Clauson and his co-authors compared Wikipedia and Medscape Drug Reference (MDR), a free online “traditionally edited” database.48 They found that Wikipedia could answer fewer drug information questions, e.g., about dosage, contraindications and administration. In the evaluated sample Wikipedia had no factual errors but a higher rate of omissions compared to MDR. The authors could also find a marked improvement in the entries of Wikipedia over a period of just 90 days. The study went on to mainstream media with headlines such as “Wikipedia often omits important drug information” and even “Why Wikipedia Is Wrong When It Comes To Prescription Medicine”. As noted by Wikipedians on a discussion page, the study did not mention the fact that one of the Wikipedia manuals of style explicitly encourages Wikipedia authors not to include dosage information, with the present wording of “Do not include dose and titration information except when they are notable or necessary for the discussion in the article.”vi Thus in one of the 8 examined question categories the omissions on the

vi http://en.wikipedia.org/wiki/Wikipedia:MEDMOS, 252051701

part of Wikipedia comes as an intention by consensus.

In a small comparison study on medical information with just 3 topics, blinded experts found some factual errors in Wikipedia, around the level of the medical online resources UpToDate and eMedicine. AccessMedicine was found to have no factual errors among its 3 articles examined.49, 50

Coverage

One kind of critique often carried forth is that Wikipedia tends to have an emphasis on topics in pop culture, the critique following the template “there are more entries for [a pop culture phenomenon] than for Shakespeare.”vii Is there a bias in the topical coverage of Wikipedia? Are there any other biases in coverage, e.g., with respect to gender and nationality?

vii “Why are there more Wikipedia entries for Doctor Who than there are for Shakespeare?”90

Studies on topical coverage in Wikipedia often examine the number of Wikipedia articles within a given subject area and compare that number to associated numbers in works or databases from governments, well-established companies or other organizations, which then act as a reference,79–84, 86 see Table 5. In 2005 Altmann could write that “[m]edical Informatics is not represented sufficiently since a number of important topics is missing”. He had compared the English Wikipedia to 57 terms he found in “Handbook of Medical Informatics”.79

Looking at outbound scientific citations in the English 2007 Wikipedia I found astronomy and astrophysics articles rather much cited compared to Journal Citation Reports from Thomson Scientific, but generally an overall agreement.80 Journal of Biological Chemistry got undercited, but that changed after automated mass-insertion of genetic information.81 One peculiarity with the sample occurred for botany journals. A Wikipedia project had produced a number of well-sourced articles on Banksia, some reaching featured article status. The citations from these Wikipedia articles would skew the statistics.

By sampling 3000 articles from the English 2006 Wikipedia and categorizing them against the Library of Congress categories Halavais and Lackaff found categories such as social sciences, philosophy, medicine and underrepresented in Wikipedia compared to statistics from Books in Print.82 The two latter categories had, however, on average a comparably large article size. They identified science, music, naval and, e.g., geography as overrepresented, with music probably benefitting from fan contributions and other categories from the mass-insertion of material from public data sources such as the United States Census. The two investigators could also find missing articles in the 2006 Wikipedia, when compared to three specialized encyclopedias in linguistics, poetry and physics. Halavais and Lackaff also noted some peculiarities in Wikipedia, e.g., extensive lists of arms in the military category, comics fans to some extent driving the creation of articles in the fine art category and voluminous commentary on the Harry Potter series in the literature category.

For twentieth century philosophers Elvebakk compared Wikipedia against two online peer-reviewed resources, The Stanford Encyclopaedia of Philosophy and the Internet Encyclopedia of Philosophy, with respect to coverage of gender, nationality and discipline. She concluded that Wikipedia did not represent “the field of philosophy in a way that is fundamentally different from more traditional resources” in 2008.83 Wikipedia had far more articles about the philosophers than the two other resources and only some minor differences in fractions, such as a smaller fraction of German and French philosophers.

In a study on the efficiency of Web resources for identifying medical information for clinical questions Wikipedia failed to give an answer in a little above a third of the cases, while Web search engines, especially Google, were much more efficient. However, Wikipedia was more efficient than medical sites such as UpToDate and eMedicine in terms of failed searches and number of links visited, and the ‘end site’ that most often provided the ultimate answer from a was Wikipedia.91 In another 2008 medical coverage study researchers found over 80% of ICD-9 and ICD-10 diagnostic codes in gastroenterology covered by Wikipedia.84 A similar study for nephrology found 70.5% of ICD-10 codes represented in August 2012.57 In another life science coverage study, researchers constructed a semi-automated program for matching LOINC database parts with Wikipedia articles. Of the 1705 parts they examined in October 2007 they found 1299 complete matches in Wikipedia with their semi-automated method, 15 partial matches and a further 15 matches from manual search, i.e., 1329 corresponding to 78%.86 They concluded that “Wikipedia contains a surprisingly large amount of scientific and medical data”.

A 2008 study compared the number of words in sets of Wikipedia articles with the year associated with the articles and found that articles associated with recent years tended to be longer, i.e., recency was to a certain extent a predictor for coverage: The length of year articles between 1900 and 2008

Topic | Comparisons | Result
Medical informatics79 | Handbook of Medical Informatics | “Medical Informatics topics are not very well represented in the Wikipedia currently [2004/2005]”
Scientific citations80, 81 | Thomson Scientific Journal Citation Reports | Generally good correlation with scientific paper citations. Astronomy and banksia somewhat overcited. Dependence on bots.
General topics, physics, linguistics, poetry82 | Library of Congress categories, Encyclopedia of Linguistics, New Princeton Encyclopedia of Poetry and Poetics, Encyclopedia of Physics | 82% (physics), 79% (linguistics) and 63% (poetry) coverage
Twentieth century philosophers83 | The Stanford Encyclopaedia of Philosophy and the Internet Encyclopedia of Philosophy | Wikipedia had 534 philosophers covered while the other two had 60 and 49, respectively
Gastroenterology84 | ICD-9 and ICD-10 codes | 83% coverage
Philosophers85 | Facts extracted from A History of Western Philosophy, A History of Western Philosophy, The Oxford Companion to Philosophy and The Columbia History of Western Philosophy | 52% coverage
Medical terminology86 | LOINC database | 78% coverage
General topics | - | 30% obtained ‘good’ or ‘excellent’ marks
US gubernatorial candidates and elections87 | Number of real world candidates and elections | 93% candidate coverage, 11–100% election coverage
Women88 | National Women’s History Project | 23/174 or 77/268 missing
Nephrology57 | ICD-10 codes | 70.5% coverage
Scientists89 | Thompson Reuter list | 22%–48% coverage
Drugs65 | text books | 83.8% (German), 93.1% (English)

Table 5: Selection of Wikipedia coverage studies.

and the year as a predictor variable had a Spearman correlation coefficient of 0.79. The results were not homogeneous, as the length associated with articles for Time’s person of the year had a correlation of zero with the year. Academy award winning films and “artist with #1 song” had correlation between the two: 0.47 and 0.30, respectively. The authors of the study also examined other sets of articles in Wikipedia and the correlation with column inches in Micropædia of the Encyclopædia Britannica, country population and company revenue. The correlations were 0.26, 0.55 and 0.49, respectively. In their comparison with 100 articles from Micropædia they found that 14 of them had no Wikipedia entry, e.g., “Russian Association of Proletariat”, “League for the Independence of Vietnam” and “urethane”.92

Bill Wedemeyer presented the quality of scientific articles on the English Wikipedia at Wikimania 2008, as he and his students had examined the coverage based on several data sets. On a cross-section of 446 articles randomly and blindly sampled from Encyclopædia Britannica, Wikipedia lacked entries for 15, e.g., “Bushmann’s carnival”, “Samarkand rug” and “Catherine East”. All of 192 random geographical articles from Britannica had corresponding articles in Wikipedia. Of 800 core scientific topics selected from biochemistry and cell biology text books 799 could be found in Wikipedia. He concluded that science is better covered than general topics and that Wikipedia covers nearly all encyclopedic topics.93

Kittur, Chi and Suh developed an algorithm that would assign a topic distribution over the top-level categories to each Wikipedia article.94 After evaluating the algorithm on a human labeled data set they examined the English Wikipedia and found ‘Culture and the arts’ and ‘People and self’ as the most represented categories. Between the 2006 and 2008 data set they found that the ‘Natural and physical sciences’ and ‘Culture and the arts’ categories grew the most. By combining the algorithm

with a method for determining degree of conflict of each article95 they could determine that ‘Religion’ and ‘Philosophy’ stood out as the most contentious topics.

A case of bias in coverage with an individual Wikipedia article reached mainstream media. A user flagged the article on Kate Middleton’s wedding gown for deletion. The flagging and the ensuing debate about the notability of the dress was seen as a symptom of the ‘gender gap’ of Wikipedia. Jimmy Wales argued with a “strong keep” that “I believe that our systemic bias caused by being a predominantly male geek community is worth some reflection in this context” and pointed out that Wikipedia in contrast has “over 100 articles on different Linux distributions, some of them quite obscure” and with “virtually no impact on the broader culture, but we think that’s perfectly fine.”96 Parallel to the media focus on the gender imbalance among contributors,97 a couple of studies have examined the possible biased representation of women on Wikipedia.88, 98–101

Reagle and Rhue have reported on the female proportion in biographic resources for persons born after 1909: 28.7% (Gale Biographical Resource Center) and 24.5% (Wilson’s Current Biography Illustrated), but also as low as 15% (American National Biography Online). Other lists of notable persons yield percentages of 10% (top 100 most influential figures in American history) and 12% (Chambers Biographical Dictionary). For the English Wikipedia the researchers found 16%, and after a similar analysis of Encyclopædia Britannica the researchers concluded “Wikipedia and Britannica roughly follow the biases of existing works”.88 In 2011 Gregory Kohs would report a higher number of 19% for the female proportion; this was for a random sample of 500 living people biographed on Wikipedia.98 Reagle and Rhue also compared biographic article lengths in Wikipedia and Encyclopædia Britannica with respect to gender and found no consistent bias in either female or male direction.88 Wikidata, where a property of an entity may indicate the gender, can scale up the analysis and make it multilingual to several hundred thousand persons, even over a million: Using Wikidata’s ‘sex’/‘gender’ property and its language links Max Klein compared the sex ratio across Wikipedia language versions finding (among the big Wikipedias) the most equal rate on the yet still well below 25% female, while the English Wikipedia had a female percentage of 17.55%. A reference file, the Virtual International Authority File (VIAF) data, gave a 24.35% female rate in Klein’s study.99 In 2015 Max Klein published a blog post with a more detailed reporting of the results of his study performed together with Piotr Konieczny. Among variables in relation to gender they considered celebrity status, finding that “recorded females [in Wikipedia] are more likely to be celebrities”.100 Also in 2015 Magnus Manske would publish a blog post with his Wikidata analysis comparing gender representation grouped by centuries, region and country, and when he compared Wikidata against VIAF and Oxford Dictionary of National Biography he found Wikidata had a more equal representation of males and females.102 Later that year he used Wikidata to compare gender representation across Wikipedia language versions with respect to ‘Getty Union List of Artist’ and the ‘Rijksbureau voor Kunsthistorische Documentatie’ database, now finding a clear male bias for almost all Wikipedias for both lists of artists.103 Manske was prompted by Jane Darnell who has made several analyses of the gender gap.viii

viii https://commons.wikimedia.org/wiki/Category:Jane Darnell.

The studies find an increasing ratio of female representation over time (e.g., date of birth). Why does this increase appear? Is it because women have increased their position in society? Han-Teng Liao put forward an alternative ‘male gaze’ hypothesis, where the increase comes through a gender-interest bias, e.g., young males interested in female celebrities such as porn actresses.104

Tools associated with Wikidata can make coverage estimation across Wikipedias trivial. The Mix’n’match tool lists entries from several external databases and displays matches with Wikidata items, e.g., it can list the members of the European Parliament based on information from its homepage http://www.europarl.europa.eu/meps/ together with the Wikidata item. Statistics can then show that (in the case with this database) all members are matched to Wikidata items, but, e.g., only 293 members, corresponding to around 8%, have a Danish Wikipedia article, while the English Wikipedia has a coverage of over 50% with 2020 articles for parliament members (as of October 2014). The list is only one of several. Examples of other catalogues are Oxford Dictionary of National Biography, BBC Your Paintings and Hamburgische Biografie.

Wikipedians with a particular interest in coverage organize themselves in the WikiProject Missing encyclopedic articles, a project where the main goal “is to ensure that Wikipedia has a corresponding article for every article in every other general purpose encyclopedia available”. The project lists quite a number of reference works and other resources for benchmarking Wikipedia coverage. Examples are Encyclopædia Britannica 1911, 2007 Macropædia and The Encyclopedia of Robberies, Heists and Capers.

Editor and researcher of Wikipedia, Emilio J. Rodríguez-Posada, has attempted to estimate the number of “notable articles needed to cover all human knowledge”.ix As of January 2014 the number stands at 96 million. It is made up of, e.g., the number of species, in one source estimated to be 8.7 million (eukaryotes).105 The around 14 million entities on Wikidata (as of January 2014) make up around 15% of 96 million, yet the 96 million does not include, e.g., the majority of the large number of chemical compounds described. Chemical Abstracts Service announced in 2011 that they had reached the 60 millionth entry in their registry.106

Limitations in coverage due to the Wikipedia policy of notability have inspired other web-sites: Deletionpedia records deleted pages from Wikipedia in a MediaWiki run site with no editing possible, and Obscuropedia is a wiki for non-notable topics not covered by Wikipedia.

Up-to-dateness

Several quality comparison studies examine the up-to-dateness and find that Wikipedia compares well in this aspect,48, 54, 69, 70, 93 although not unequivocally.64 In the comparison between Wikipedia and Medscape the researchers found four factual errors in Medscape among 80 articles examined. Two of these occurred due to lack of timely updates. No factual errors occurred in Wikipedia.48 The Wedemeyer study found that Wikipedia was much better up to date than Encyclopædia Britannica.93

In a study on twentieth century philosophers Wikipedia had far more articles on philosophers born after the Second World War than two other online encyclopedias, The Stanford Encyclopedia of Philosophy and The Internet Encyclopedia of Philosophy.83

The Danish Wikipedia has a large number of bibliographies copied more or less unedited from two old reference works with expired copyright: Dansk biografisk Leksikon and Salmonsens Konversationsleksikon. The age of the works affects the language and viewpoint of the Wikipedia articles.107

Information on the death of a TV host, Tim Russert, came out in Wikipedia before news organizations published it. The author of the Wikipedia entry came from inside a news organization.108

Medical drugs may have associated safety alerts, e.g., the United States Food and Drug Administration issues Drug Safety Communications with warnings and recommendations on medical drug use. When FDA issues these communications Wikipedians should incorporate them in the articles about the drug for timely information to patients and . A study on these communications from 2011 and 2012 showed that Wikipedians do not update the information satisfactorily: For 22 prescription drug articles researchers found that Wikipedians had not incorporated specific FDA communications in 36% of the articles when they examined the articles more than a year after the FDA communications.64

Sources and links

Many studies and tools examine the outbound references to sources that Wikipedia uses80, 81, 93, 101 or the inbound links that come from documents to Wikipedia. Often the count of sources is used as a feature in studies of article quality.93, 101 Wedemeyer’s study looked at the references of Wikipedia articles. They found that most developed articles had sufficient references comparable to a scientific review article, but some articles, even two featured, had insufficient referencing.93

Ed Summers’ Linkypedia Web service available from http://linkypedia.info/ makes statistics available online on which Wikipedia articles link to specific webpages on selected websites. As of 2012 statistics were mainly available for selected GLAMs. It enables, e.g., the British Museum to see that their page on ancient Greece had the highest number of inlinks from Wikipedia: 39 as of February 2012; that Wikipedia articles in the category “BBC Radio 4 programmes” linked much to their website; that the “Hoxne Hoard” article had no less than 27 links to their website; and that the total number of Wikipedia links to the British Museum website was 2’673 from 1’209 pages.

Another Web service of Ed Summers, wikitweets, displays microposts from Twitter that link to Wikipedia. The Web service runs in real-time from wikitweets.herokuapp.com, and tweets with excerpts of the linked Wikipedia articles are stored in machine readable JSON format at the Internet Archive (archive.org/details/wikitweets).

Readers click the links in Wikipedia articles to outside sources to a considerable extent. In 2014 CrossRef statistics reported that Wikipedia was the “8th largest referrer of CrossRef DOIs”.109 Web services from CrossRef allow easy identification of which Wikipedia articles cite a scientific article based on DOI information as well as real-time citation event updates.x
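The kind of inlink statistics described for Linkypedia in this section (links per target page, total links, and number of linking articles for one website) can be illustrated with a minimal sketch. It is not Linkypedia's implementation: the function name `inlink_statistics`, the list `EXTERNAL_LINKS` and the example.org hosts are hypothetical, and the toy pairs stand in for an external-links dump that is not shown here.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical (article title, external URL) pairs. Real data would come
# from a dump of Wikipedia's external links; these rows are made up.
EXTERNAL_LINKS = [
    ("Ancient Greece", "http://museum.example.org/greece"),
    ("Parthenon", "http://museum.example.org/greece"),
    ("Hoard", "http://museum.example.org/hoard"),
    ("Hoard", "http://museum.example.org/greece"),
    ("Radio drama", "http://radio.example.com/programmes/a"),
]


def inlink_statistics(links, host):
    """Summarize links from articles to pages on one website (host)."""
    per_page = Counter()  # target URL -> number of links pointing to it
    articles = set()      # distinct articles linking to the host
    for article, url in links:
        if urlsplit(url).hostname == host:
            per_page[url] += 1
            articles.add(article)
    return {
        "total_links": sum(per_page.values()),
        "linking_articles": len(articles),
        "top_pages": per_page.most_common(3),
    }


stats = inlink_statistics(EXTERNAL_LINKS, "museum.example.org")
print(stats)
```

With real data the same three numbers would correspond to the statements quoted above: the most linked page, the total number of links, and the number of pages the links come from.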

ix https://en.wikipedia.org/wiki/Wikipedia:ALL
x See, e.g., http://det.labs.crossref.org/works and

How Wikipedia is used as a source has also been described. In the media, e.g., the Philadelphia Inquirer instructs journalists never to use Wikipedia “to verify facts or to augment information in a story” and one reporter has been cited for “there is no way for me to verify the information without fact-checking, in which case it isn’t really saving me any time.” Other news organizations occasionally allow citation of Wikipedia as a source, e.g., Los Angeles Times.110 An analysis of press mentioning (cited, quoted or referred) of early Wikipedia found, e.g., that Daily Telegraph Online accounted for roughly a third of all citations.111 The site consistently referred to Wikipedia for further reading and background information in sidebars.

Genre and style

Researchers have also investigated other content topics besides quality. In one study researchers examined 15 Wikipedia articles and their corresponding talk pages and compared them with other online knowledge resources: Everything2 and Columbia Encyclopedia. They specifically looked at the formality of the language by counting words indicative of formality or informality, such as contractions, personal pronouns and common noun-formative suffixes. With factor analysis they found that the style of Wikipedia articles is close to that of the Columbia Encyclopedia.112

was featured in a study on Wikipedia biased representation with respect to gender. The study found that, e.g., the word ‘divorce’ appears with a higher rate in articles about women compared to articles about men.101

The genre may also evolve as editors extend and change the articles.113

Accessibility

Lopes and Carriço examined 100 Wikipedia and 265 non-Wikipedia Web articles cited by Wikipedia.114 They looked for their level of accessibility, i.e., to which extent they fulfilled the Web Content Accessibility Guidelines of the World Wide Web Consortium designed ‘to make Web content accessible to people with disabilities’.115 The authors found that Wikipedia articles on average scored better than the Web articles they cited. They further argued that the discrepancy between the accessibilities could lower the credibility of Wikipedia. It is not so odd that Wikipedia scores well in accessibility since the HTML markup is automatically constructed from wiki-markup, and the software can be programmed to ensure that, e.g., the ‘alt’ field of the ‘img’ HTML tag is automatically set.

Use of Wikipedia in court

The used Wikipedia for the definition of the word ‘laptop’,116 and several American courts have used Wikipedia in their rulings,117 e.g., the Connecticut Supreme Court cited Wikipedia for the number of gay Congressmen.118 These cases are not singular: In February 2007 Washington Post noted that courts cited Wikipedia four times as often as Encyclopædia Britannica,119 and in 2008 Murley ran a database search and found “1516 articles and 223 opinions that had cited to Wikipedia articles” in the Westlaw’s and ALLCASES databases.120 The English Wikipedia maintains incomplete lists of mostly English language court cases using Wikipedia as a source: Wikipedia:Wikipedia as a court source and Wikipedia:Wikipedia in judicial opinions. A couple of other language versions of Wikipedia have similar lists for cases in their respective languages.

Three lengthy papers examine the judiciary use of Wikipedia and discuss the controversy of using Wikipedia as an authority.121–123 They find the first references to appear in 2004, peaking in 2007 and a decrease towards the end of their examined periods: November 2007 for Breinholt121 and 2008 for Stoddard.122 Breinholt classifies the different uses of Wikipedia into four categories:

1. Wikipedia as a dictionary. Wikipedia used, e.g., to answer what “candystriper” means.

2. Wikipedia as a source of evidence. In the most perilous use of Wikipedia judges rely on Wikipedia for evidence, with one example being a judge relying on Wikipedia for whether Interstate-20 US highway does or does not extend from .

3. Wikipedia as a rhetorical tool. Harmless use of Wikipedia, e.g., for literary allusions.

4. Judiciary commentary about Wikipedia. One among the few cases involved a judge cautioning against citing Wikipedia in an appellant brief.

In some cases Wikipedia may be the only reference available for definitions of words, e.g., at one point Google returned only Wikipedia for a ‘define’ query on “candystriper”. Should one entirely ignore Wikipedia? In my opinion Wikipedia articles can be used for definitions provided that the definition has been looked over by many readers and

http://events.labs.crossref.org/events/types/WikipediaCitation.

editors and that many reliable editors have over time edited the article, so we may regard it as a consensus definition. For establishing that ‘many reliable editors have edited’ one would need to examine the revision history and possibly the discussion page and its associated revision page. This process might require an expert Wikipedian.

Cases in Wikipedia and Wikimedia Commons may give rise to legal discussions. The copyright status around the so-called Monkey selfie has probably been the most widely discussed. The essay ‘Final exam for wikilawyers’ by the Wikipedian Newyorkbrad sets up a number of interesting questions from fictional and real-world cases.

Size across languages

Why do the language editions of Wikipedia differ in size? If it is often pointed out that the number of speakers of a language is a good indicator for the size of Wikipedia in that language,124 then why is the larger than the Danish? And why was the Esperanto Wikipedia larger than the Arabic until 2011?xi

xi compare http://stats.wikimedia.org/EN/TablesWikipediaEO.htm with http://stats.wikimedia.org/EN/TablesWikipediaAR.htm

Morten Rask analyzed 11 Wikipedia language editions with respect to creation date of Wikipedia, number of speakers of the language, Human Development Index, Internet users, Wikipedia contributors and edits per article and found a number of correlations between these variables, e.g., the Internet penetration and level of human development were correlated to the number of contributors:125 Wikipedia contributors are rich and e-ready. Explaining the quality of the German Wikipedia Sue Gardner put forth related factors, saying: “Germany is a wealthy country. People are well educated. People have good broadband access. So the conditions for editing Wikipedia are there.”126 Other variables that may affect the Wikipedia size of different language editions are culture of volunteering, willingness to translate (from other language Wikipedias) and problems with non-latin characters. Among the reasons for the relatively small size of the Korean and Chinese Wikipedias Shim and Yang suggested the competition faced by Wikipedia from other knowledge-sharing Web services: the Korean question/answering site Jisik iN and the Chinese online encyclopedia Baidu Baike.127 As a further factor Andrew Lih and Sue Gardner would also mention the ability to meet face-to-face due to German-speakers’ geographical location in a relatively small area and the German Verein culture.25, 126

The Arabic Wikipedia has had a relatively small size compared to the number of speakers. The low attendance for a Wikipedia event in was blamed on ‘general lack of awareness of the importance of the issue’ and ‘culture of volunteer work’.128 Arabic users may choose to write in English because they find it easier to communicate in that language due to keyboard compatibility problems and to bring their words to a wider audience.129 Users of the Internet may also be hindered by low cable capacity in some areas, as has been the case in East .130

One obvious factor for the size of a Wikipedia comes from the willingness of the community to let bots automatically create articles. Letting bots create articles on each species may generate many hundreds of thousands of articles.131 The unwillingness to let bots roam, the elimination of stub articles and a focus on quality compared to quantity in the German Wikipedia35 may explain why its article-to-speaker ratio is (as of February 2014) quite a lot lower than the Dutch and Swedish.

Network analysis, matrix factorizations and other operations

Modern network analysis has come up with a number of new notions, e.g., small world networks,132 the power law of scale-free networks,133 PageRank134, 135 and hubs and authority.136 Formulas with algorithms have been put forward that quantitatively characterize the concepts, and they have been applied to a diverse set of networks, e.g., the network of movie actors, power grid, neural network and the .

Wikipedia researchers have also examined the quantitative characteristics for the networks inherent in Wikipedia. Among the many network characteristics reported for a variety of Wikipedia derived networks are PageRank and Kleinberg’s HITS or other eigenvalue-based measures,137, 138 small world coefficient, with cluster-coefficient and average shortest path,138–141 the size of the ‘bow tie’ or giant components,138, 139, 141, 142 power law coefficients,1, 138, 139, 141–144 h-index,141 reciprocity,139, 141 assortativity coefficients,138, 139, 141 triad significance profile139 and acceleration.145

Networks can be represented in matrices, thus matrices can also be constructed from content and metadata in Wikipedia articles. Mathematical operations can be performed on the matrices to examine aspects of Wikipedia or to test computational algorithms on large-scale data. Wray Buntine built a matrix from the within-wiki links between 500’000 pages of the English 2005 Wikipedia and used a discrete version of the hubs and authority algorithm to find topics in Wikipedia,146 e.g., one topic would display the Wikipedia articles “Scientific classification” and “Animal” as the top authorities and “Arterial hypertension” and “List of biology topics” as the top hubs.xii Another approach builds a normalized matrix from the adjacency matrix of the within-wiki links between the articles. By augmenting the normalized adjacency matrix with an extra term the so-called Google matrix can be formed. The first eigenvector associated with the Google matrix determines the PageRank of an article.135 The adjacency matrix may be transposed, normalized and augmented. Its first eigenvector may be found to yield what has been called the CheiRank,147 a concept closely linked to Kleinberg’s idea of hubs in his HITS algorithm136 and the analysis of Buntine. PageRank and “CheiRank” may be combined to form what has been called the 2DRank.136 In the family of these analyses Bellomi and Bonato would in an analysis from 2005 find, e.g., United States, , Russia, World War II, Jesus and George W. Bush on the top of the HITS authority and PageRank lists.137 Longitudinal analyses of these measures have also been performed.148

Barcelona Media researchers performed wikilink network analyses of only the biographic articles of Wikipedia, but across 15 different language versions. They found that PageRank for the English Wikipedia put American presidents highest on the ranked list. Comparing betweenness centrality across language versions showed overlaps among the top five most central persons with American Presidents often in the top five, as well as singers and World War II figures. However, cross-cultural differences also existed, e.g., with Chinese leaders central in the Chinese Wikipedia, Latin American revolutionaries central in the Spanish and Catalan Wikipedias, and Pope John Paul II central in the Polish and . A gender gap existed as only three women appeared among the top five central persons across the 15 languages: Elizabeth II, Marilyn Monroe and Margaret Thatcher.149

Yet another wikilink analysis restricted the analysis to Western philosophers, only analyzing 330 pages and 3706 links. They used the Prefuse toolkit for network visualization and also employed graph simplification under the name “Strongest Link Paths”. With the influence network constructed from “influenced by” and “influenced” fields of the infobox on the philosophers’ Wikipedia pages they showed that Kant had influenced the largest number of other philosophers listed in Wikipedia.150

Instead of working from the links, the words of a Wikipedia article may also be used as features in the construction of a matrix, so the resulting matrix is a document-term matrix. A decomposition of such a matrix is often termed latent semantic analysis, particularly if singular value decomposition is the decomposition method. For assessing the performance of newly developed algorithms Řehůřek constructed a document-term matrix from the entire English Wikipedia with the resulting size of 100,000 times 3,199,665 corresponding to a truncated vocabulary of 100,000 words and almost 3.2 million Wikipedia articles.151 Analyses of document-term matrices can form the bases for content-based filtering methods and recommender systems, e.g., a system applied Latent Dirichlet Allocation on the 4’635 sized 2007 Wikipedia Selection for Schoolsxiii dataset, potentially providing an alternative method for school children to navigate this corpus.152

The category network forms another dataset. Chris Harrison has made large “clusterball” visualizations of the three levels of Wikipedia category pages. Another group of his Wikipedia visualizations, WikiViz, displays millions of vertices.

Several researchers have considered the graph/matrix formed when editors co-edit articles, i.e., the articles-by-editors matrix or co-authorship network.138, 153–156 In a 2006 analysis of the articles-by-editors matrix constructed from the Danish Wikipedia using non-negative matrix factorization on the (12774 × 3149)-sized matrix I found co-edit patterns on: Danish municipalities, Nordic mythology, Danish governments, article discussions, churches, science , pop culture, years, Scania, sport, countries and cities, plants and animals. As the edits are distributed unequally with respect to number of edits, both over articles and users, the results from the matrix factorization depended on the normalization of the matrix. With an unsuitable normalization an identified co-edit cluster may simply reflect the contribution of a single prolific author. With suitable normalization the factorization shows that articles and editors may be characterized by their co-editing patterns.154 Another work on the article-by-editor network used an algorithm to identify maximal bicliques, showing that some of the dense clusters of articles and editors were related to controversial topics (Scientology, ) or projects (WikiProjectElements for elements in the periodic table).155 Laniado et al. reported a large number of network metrics for the co-authorship network over time and topics after the main contributors for each article had been identified with a method by Adler et al.,138

xii Model parameters for topic “ID# 3”.
xiii See http://schools-wikipedia.org

e.g., even after the thinning of the co-authorship network they found that over 96% of the contributors are connected in the giant component, and with removal of the 5000 contributors with the highest betweenness centrality 85.9% of the authors were still connected in the component. The analysis of the co-editing pattern has also been combined with geographic resolving of editors using natural language processing of user pages.156

A sequential collaboration network is a network where users are represented as nodes and a link from one editor to another is present if the second editor edits an article right after the first editor edited the article.157 The network is a directed graph and may be said to be a social network, as the adjacency matrix is a user-by-user matrix, albeit the editing may only be a form of indirect collaboration. Some researchers call it the “article trajectory”.158 Several different patterns might occur when the sequential collaboration network is visualized for different articles. Iba et al. name “snake”, “wheel” and “star” as “egobooster” networks.157 An example of a sequential collaboration network is displayed in Figure 3 for an article on a Danish politician on the Danish Wikipedia. Other applications of sequential collaboration networks have appeared for an online service on multiple business-related articles that adds color-coding based on .159, 160

When networks are formed from the reply network of article discussion/talk pages or the activity on the user talk pages (user talk network and wall network) the networks better represent the ‘real’ social interaction among users, in contrast to the indirect stigmergic collaboration seen in the network formed from article co-authorship. Laniado et al. construct such networks and identify the presence of welcomers, i.e., the users and bots writing a welcome message on new users’ talk pages, as well as report on the user associativity. They also show that long chains of replies between users on article talk pages appear for articles such as “Intelligent design”, “Gaza War” and “Barack Obama”.141

Article feedback

In September 2010 Wikimedia launched an experimental article feedback tool letting users evaluate articles as well-sourced, neutral, complete or readable on a five-point scale. Results of an analysis presented by Howie Fung of 1,470 ratings across 289 articles showed that registered users tended to give lower ratings than unregistered users. The maximum score came as the most frequently given score among unregistered users on all four dimensions. Registered users chose the minimum score as the most frequent on the dimensions of well-sourcedness and completeness, while they rated neutrality and readability higher.

Hoax content

Bloggers have made experiments with Wikipedia, adding factual errors to observe how fast they get corrected, if at all.161, 162 How long a hoax survives may depend on how obvious it is. P.D. Magnus made vandalisms anonymously and from various IP addresses across separate philosophy articles, e.g., the edit ‘Kant’s poetry was much admired, and handwritten manuscripts circulated among his friends and associates’ in the ‘Immanuel Kant’ article. After 48 hours 18 of 36 of this kind of vandalism remained.163, 164 Danish computer science student Jens Roland had his short article about the fictitious municipality Æblerød surviving 20 months on the Danish Wikipedia and translated into several other language versions.165 The scam was supported by external web-sites with information that looked genuine. For the cases of the biographies of John Seigenthaler166 and Bertrand Meyer167 that drew much media attention, the vandalism remained for 132 and four days, respectively. In both these cases the false statements made could possibly have led to criminal charges against the author. The English Wikipedia maintains a list of Wikipedia hoaxes (the page Wikipedia:List of hoaxes on Wikipedia) showing that complete hoax articles may persist for many years.

Perhaps the most elaborate single incident concerned the article on the fictitious “Bicholim conflict”, which persisted for well over 5 years. It was even rewarded with “Good Article” status, casting serious doubts on the article reviewing process on the English Wikipedia. This article was supported by citations to fictitious books.168 These kinds of hoaxes resemble the kind also found in ordinary reference works, e.g., “Dag Henrik Esrum-Hellerup”, a fictitious entry in a music reference work.

T. Mills Kelly teaches the course Lying About the Past at George Mason University where students fabricate histories and add them to Wikipedia. In 2008 the tale about Edward Owens survived until Kelly announced the hoax at the end of the semester, while a 2012 experiment on the fabricated history of serial killer Joe Scafe survived 26 minutes after an announcement on Reddit.169

Most of the hoaxes seem to arise from humor or critical examination. When prolific and barnstar-awarded editors insert false information it may require a major cleanup, such as in the case with user Legolas2186. On the English Wikipedia he

was blocked indefinitely after inserting false statements that were supported by citations to invented articles.170 His case seems not to originate in humor or critique. Should such a case be explained as a kind of addiction to barnstars and edit counts?

Figure 3: Sequential collaboration network157 for Danish politician (Ellen Trane Nørby) on the Danish Wikipedia generated with my revvis online tool on the Toolserver. The size of the nodes is determined by the number of edits of the user. The article begins with the 213 IP address creating the article (upper left corner of plot) followed by an edit by the 212 IP address. At one point a Danish parliament IP (194.255.119.1) edits several times.

Vandalism

For obvious vandalism, where a large part of an article is deleted, a 2004 study showed that it typically only took a couple of minutes before an article got reconstructed.171 With data up to 2006 Reid Priedhorsky and his co-authors found 5% “damaged” edits among 57.6 million revisions, and 42% of article damages were reverted within one estimated page view.172 Scientists have observed a lower rate of vandalism on specialized scientific topics: The edits on Wikipedia articles related to the WikiProject RNA and the Rfam database reach a vandalism rate of slightly over 1% (actually “possible vandalism”).173

In his ethnographic study Lorenzen observed the Wikipedia:Vandalism in Progress page in two time periods in October and November 2005 to see how vandals were dealt with.174 During the time periods he noticed hundreds of reports of vandalism, although not all reported vandalism was in fact vandalism. He reported 16 false reports and 39 user bans in the time period. He also discussed the issue of subtle vandalism escaping detection.

Among the temporal aspects examined in a 2006 study is the number of reverts through time from 2002 to the beginning of 2006. The researchers found that the number of reverts, including fast reverts, had risen almost monotonically from below 1% to over 6%, and “that may signal an increasing amount of vandalism per page”. An exception to the monotonicity was at the introduction of the 3-revert rule established in November 2004, which almost halved the so-called double-reverts.175

The communal ‘recent change patrol’ watches over the recent changes in Wikipedia on an entirely voluntary basis and edits or deletes vandalism. Wikipedians have constructed many tools, with names such as WikiGuard and WikiMoni- , for monitoring and semi-automated editing and

vandalism reversion. On http://www.wpcvn.com Dmitry Chichkov constructs a dynamic web page that displays an overview of recent changes along with extra useful information such as editors’ ‘karma’. More advanced tools do reversion automatically. An example is ClueBot.176 The early tools were mostly rule-based, applying simple heuristics, but vandalism detection may also be viewed as a machine learning problem where the task is to classify an edit as vandalism or not.177–180 Geiger and Ribes give a narrative of the reverting and blocking of a vandal using a combination of assisted editing tools and bots, showing how multiple Wikipedia editors as vandal fighters use Huggle and Twinkle and how the tools interact together with ClueBot over a fifteen minute period.181 They also show the role of user talk pages in signaling warning messages to the user and to other vandal fighters and their tools. The presently operating ClueBot NG bot automatically reverts much simple vandalism very fast. The semi-automated anti-vandalism tool STiki has been reported to have a median age of reversion of approximately 4.25 hours.179

Dmitry Chichkov has also reported descriptive statistics on the most reverted articles.xiv He has made statistics sorted on revert ratio and filtered for number of revisions larger than one thousand available from wpcvn.com/enwiki-20100130.most.reverted.txt. Sexual and vulgar articles tend to get high on the list, e.g., 69, Nipple, Urine, Butt and Pussy, but also Geometry, Bubonic plague and Italian for some .

Based on categories by Viégas et al.171 Priedhorsky et al. worked with the following 7 categories (features) of Wikipedia damage/vandalism: Misinformation, mass delete, partial delete, offensive, spam, nonsense and other.172 Having three independent judges and using a majority vote procedure they found 53% of damages with the nonsense feature, 28% with the offensive feature, 20% misinformation, 14% partial delete and the rest of the features occurring in less than 10% of the damages. In their discussion they noted that mass deletion is easy to detect while the more often occurring feature of misinformation “may be the most pernicious form of damage”.

Biased editing

In biased editing editors diverge from the neutral point of view of Wikipedia, one of its cornerstones. It may be associated with conflict of interest. Whereas graffiti vandalism is relatively easy to spot, biased editing usually comes from competent editors and biased edits may be harder to judge. To boost their reputation organizations or individuals may make such edits or they may pay others, e.g. companies, to do it: Either deleting negative information or adding positive. Law Professor Eric Goldman predicted in a 2005 blog post that in five years Wikipedia would fail due to “gamers” and “marketeers”.182, 183 In 2013 Wikipedia is still fairly neutral, but there have been several cases of biased edits.

One early reported case from 2007 that exemplifies the problem had Microsoft offering to pay blogger Rick Jelliffe to change Wikipedia articles on open-source document standards and the Microsoft rival format. The company was sure that the articles contained inaccuracies.184 Controversial edits from an IP address associated with podcasting entrepreneur Adam Curry in the “podcasting” Wikipedia article have also been noted.172, 185

The investigation of biased edits became considerably easier when American PhD student Virgil Griffith created and in August 2007 opened the web-based tool called Wikiscanner. It would merge IP address information with Wikipedia anonymous edit information.186 The tool pinpointed edits from organizations, which were exposed in mainstream media. Some of these edits were harmless spelling corrections, others graffiti vandalism, yet others removal of negative information. Examples of the latter category are: Edits from the Vatican on Sinn Fein leader Gerry Adams removed links to news stories that alleged a link to murders in 1971, and edits from voting machine supplier Diebold removed unfavorable information about political funding and alleged rigging of the 2000 United States election.187 BBC edits rewrote a part on political correctness of the ‘Criticism of the BBC’ article.188 The Wikiscanner has even revealed edits from the public relations firm Hill & Knowlton to the Politics of the Maldives Wikipedia articles. The firm was employed in 2004 by the former President to improve the image of the country.189 The edits removed words such as “torture”, “propaganda” and “censorship”. After exposure of Wikipedia articles the Dutch justice ministry would temporarily block its 30,000 employees from using Wikipedia at work.190 A Wikiscanner, constructed by Peter Brodersen, is also available for the Danish Wikipedia.191 An extension of Wikiscanner using trademark and Ip2location databases would compute link distance between pages and categories and was reported to detect “the majority of COI edits”.192 Biased edit stories continue to pop up in mainstream media from time to time. In 2013, e.g., a user from City of Melbourne edited in the Occupy

xiv Code for Chichkov’s method is available at http://code.google.com/p/pymwdat/

Melbourne article,193 and the staff of Danish politicians were found to edit Danish Wikipedia articles on their politicians. As troublesome Wikipedia content Andrew Lih would also point to university articles with “glowing accolades, trumpeting selective accomplishments in the opening paragraphs as the most significant, noteworthy events, even if taken from obscure sources.”194

In 2011 a French court ruled that a company had to pay its competitor company damages after someone from the former company had edited the Micropaiement article on Wikipedia.195

A user may create multiple accounts, and if they are used for deception Wikipedia editors speak of ‘sockpuppets’. By their multiplicity sockpuppet accounts can create a sense of consensus, bias polls and work around Wikipedia’s three-revert-rule for fighting other editors, an instance of so-called ‘gaming the system’. Through text analysis, looking for repeated spelling errors and idiosyncrasies in sentence construction, sockpuppets may be identified.196 One sockpuppet was exposed after another user examined the irrelevant references in an article about a company of questionable notability. In the ensuing deletion discussion, where several other “users” defended the company article, sufficient suspicion arose to warrant a sockpuppet investigation that would eventually uncover a network of 232 confirmed sockpuppet accounts. These accounts predominantly edited articles about companies and persons, being promotional in nature.197

Wikiganda, a student project mentored by Virgil Griffith and at one point available from www.wikiwatcher.com, made use of automated text analysis to detect biased edits. The opinion mining and sentiment analysis technique used a lexicon of over 20,000 words from the General Inquirer and Wiebe wordlists so each revision could get a ‘Propaganda Score’ and be labeled as negative, positive or ‘vague’ propaganda. Also using the WikiTrust system and evaluating against 200 manually labeled revisions, the system showed a precision/recall performance of 52%/63%.198 Our similar online system focused on business-related Wikipedia edits.160

A special case of biased editing happened with Gibraltar and high profile Wikipedian Roger Bamkin. Bamkin did not change content per se, but rather promoted Gibraltar-related content in the ‘Did You Know’ section of the main page.199

Wikipedia has the lengthy “Conflict of interest editing on Wikipedia” article, and a Facebook group called Corporate Representatives for Ethical Wikipedia Engagement (CREWE), established in the beginning of 2012, points to articles, and members discuss the issues around conflict of interest edits. Read also the description of the Facebook group by DiStaso200 and the Wikipedia article about the Facebook group. The Chartered Institute of Public Relations has established best practice guidance for public relations professionals, and David King from a firm offering Wikipedia consulting argues for “transparent community collaboration”.201 Several companies offer paid editing, with one even claiming to have among the editors that can help customers with changing Wikipedia. The journalist Simon Owen gained contact with one of their clients, who said that “they paid between $500 and $1,000 to have the page created, then an additional $50 a month afterwards for ‘monitoring’”.197

Do the demographics of Wikipedia contributors (the prototype being the young male procrastinating graduate student) bias the content of Wikipedia? In 2013 American writer Amanda Filipacchi202 and editor Elissa Schappell203 noticed that the English Wikipedia subcategorized the “American Novelists” category into “American Women Novelists” but not “American Male Novelists”. Filipacchi would call this “Wikipedia’s Sexism”, and one could hold the view that relegating women to a subcategory would lower the visibility of female writers. The case would be noted by several media outlets,xv and lengthy discussions ensued on Wikipedia. Apart from the discussion of sexism, Wikipedia contributors also discussed subcategorization more generally, particularly subcategorization based on ethnicity, gender, religion and sexuality. Before the “American Women Novelists” case there had, e.g., been discussions about the category “Male actors by nationality” (and its subcategories) in 2012.xvi The result of the discussion was to keep the subcategorization, and as of July 2013, e.g., the “Danish male actors” (of November 2012xvii) persists but has been applied far less widely than “Danish actresses” (also of November 2012xviii). Wikipedia maintains guidelines for categorization by ethnicity, gender, religion, or sexuality and a version reads:xix “Do not create categories that are a cross-section of a topic with an ethnicity, gender, religion, or sexual orientation, unless these characteristics

xv Category talk:American novelists on the English Wikipedia. Look under “mentioned by multiple media organizations:”
xvi Category:Male actors by nationality on the page Wikipedia:Categories for discussion/Log/2012 November 22 in the English Wikipedia.
xvii xviii Category:Danish actresses: Revision history on the English Wikipedia
xix Wikipedia:Categorization/Ethnicity, gender, religion and sexuality on the English Wikipedia.

are relevant to the topic.” and “In almost all cases, gendered/ethnic/sexuality/religion-based categories should be non-diffusing, meaning that membership in the category should not remove membership from the non-gendered/non-ethnic/etc parent category.” The end of the American novelists subcategorization controversy is, as of July 2013, that no individual pages are grouped there and writers are all moved to either “American women novelist” or “American male novelist”.

Authorship

A page will almost always have multiple authors. The revision history records each author’s contributions, but the format of the revision history makes it nontrivial to determine who contributed what and the most, since text may be reformulated, moved, deleted and reintroduced.

To get an overview of the edits the convenient program history flow takes the revision history as input and visualizes the entire history of an article with colorings determined by author.171 Another related tool, WikiDashboard,204 also generates a visualization of the edit activity of each Wikipedia page. It embeds the generated plot in a proxy copy of the Wikipedia article showing the amount of edits of each author through time for the given article. Yet others have presented tools with examples of color coding of text depending on authorship, so-called blaming tools.xx Author color coding (‘Einfärben’) operates in full in the German competing encyclopedia Wikiweise. Yet another tool, WikiBlame, helps in identifying versions of a Wikipedia article where a phrase occurs. The MediaWiki extension Contributors just counts the number of edits.

These approaches do not reveal if editors just copied text from other articles in Wikipedia. The detection presents a challenge that other researchers also ignore:172 For a full examination of the originality of an entry one would need to go through the entire revision history, up to the date of the entry, of all articles in Wikipedia. The same problem arises with translations: An entry may be novel across language versions or ‘just’ a translation.

A ‘good’ deletion, e.g., of vandalism or poorly written entries, increases the value of a Wikipedia article, so to which extent should a deletion count in the rank of authorships? In user reputation modeling, deletions by administrators are considered to represent good deletions.205

Authorship determination has importance for redistribution of Wikipedia: Editors release their Wikipedia contributions under a share-alike license and this license requires that everyone copying or distributing Wikipedia lists ‘at least five of the principal authors of the Document’. A strict interpretation of this requirement calls for a list of Wikipedia editors, e.g., in copies of Wikipedia articles on other web-sites. A quick look makes it evident that redistribution web-sites do not usually honour this strict interpretation.

Dynamic social network analysis may be used to examine editor contribution and role. Network visualization may identify “egoboosters” that relentlessly check and edit an article.157

User contributions

Wikipedia authors contribute with varying amounts of work. A relatively small number of editors make a large percentage of the edits, e.g., Jimmy Wales reported that ‘half the edits by logged in users belong to just 2.5% of logged in users’ based on a count from December 2004.206 A study based on a 2007 dump of Wikipedia found that 46% of registered users made only one edit.207 In 2008 Ortega et al. would quantify the contributor inequality for registered users with the Gini coefficient across several Wikipedias: It was between 0.924 and 0.964.208

Are the many edits of elite editors of major value? Perhaps the elite editors are just fixing, e.g., minor comma errors. Based on a small investigation of a few Wikipedia articles Aaron Swartz argued that the occasional contributor is actually the one that makes substantive edits. Swartz had looked at the “Alan Alda” article, and results from “several more randomly-selected articles” showed the same pattern.209

A 2007 study considered the “impact of an edit”, taking the number of page views into account with a metric the authors called persistent word view (PWV),172 and after some discussions on the Wikipedia research mailing list in November 2008 Alain Désilets summarized with the following assertions:xxi

• Most edits done by a small core

• But, most of the text created by the long tail

• However, most of the text that people actually read, was created by the small core

xx User ‘Jah’ on the German Wikipedia has made a Perl script for color coding available at http://de.wikipedia.org/wiki/Benutzer:Jah/Hauptautoren. and others display examples: http://mormegil.info/wp/blame/AFC Ajax.htm and http://hewgill.com/˜greg/wikiblame/AFC Ajax.html.
xxi http://lists.wikimedia.org/pipermail/wiki-research-l/2008-November/000697.

Figure 4: The program history flow can visualize the evolution of an article and easily give an overview when major changes occur: The visualization shows an anonymous user making a major extension of the Danish Wikipedia article about Bobby Fischer on April 20th 2005. However, ‘Zorrobot’ makes the most changes. This Norwegian program automatically updates links to Bobby Fischer articles in other language versions of Wikipedia.
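The Gini coefficient used by Ortega et al. to quantify contributor inequality can be illustrated with a small sketch. The function name `gini` and the edit counts below are hypothetical and chosen only for illustration; the formula is the standard one for sorted non-negative values and is not necessarily the exact estimator used in the cited study.

```python
def gini(edit_counts):
    """Gini coefficient of a list of non-negative edit counts.

    0 means every editor made the same number of edits; values close to 1
    mean that a few editors made almost all of the edits.
    """
    counts = sorted(edit_counts)
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    # With sorted values x_1 <= ... <= x_n:
    # G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(counts, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    # Hypothetical edit counts per registered user (illustration only).
    equal = [5, 5, 5, 5]
    skewed = [1] * 95 + [200, 400, 800, 1600, 3200]
    print(gini(equal))
    print(gini(skewed))
```

For finite samples this estimator is bounded by (n - 1)/n, so values such as the reported 0.924 to 0.964 can only arise when the number of editors is large and a small core accounts for most of the edits.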

The Minnesota study had found that by July 2006 more than 40% of the persistent word views were made by users in the top 0.1%.172

Counting of anonymous, bot and sock puppet editing as well as the name spaces investigated may confound such conclusions: Are anonymous edits from infrequent editors or from a registered user not logged in? Is a specific IP-number a network proxy merging multiple authors? Are some of the edits actually automatic edits from an unregistered bot?

Bots and edits via assisted editing tools make up a considerable part of the total number of edits. In 2007 Stuart Geiger found that the number of edits of this type is larger than the number of edits made by anonymous editors. On one particular page such edits made up 75% of the total edits,181, 210 see also page 37.

Even in dedicated teaching wikis where students are given writing assignments the level of contribution may be highly variable.211, 212

User characteristics

A Pew Research Center’s Internet & American Life Project survey among adult Americans based on telephone interviews found close to 80% were Internet users. Among these Internet users education level, age and home Internet connection type were the strongest predictors for Wikipedia use. Household income and race/ethnicity were also predictors for Wikipedia use, while gender was not a strong predictor: 56% of male and 50% of female Internet users used Wikipedia.4

In April 2011 the Wikimedia Foundation surveyed Wikipedia editors with the results published in a 75-page report, and they reported several user characteristics for the over 5’000 respondents:213 One question is whether one can claim that Wikipedia is built by mostly non-experts or, in the words of Andrew Lih, ‘a bunch of nobodies’. For the highest level of education 8% of survey respondents reported a PhD while 18% reported a Master. A large portion of the editors feel proficient with computers, with 36% reporting that they program &

create their own applications. In an interview study among 32 contributors to health-related Wikipedia articles 15 (47%) worked in a health-related field and mainly as clinicians.214

Wikipedia contributors have a sizable gender gap. Quite a number of surveys have addressed the issue: A 2015 blog post listed 12 surveys.215 Among the surveys is for instance the Wikipedia Survey published in 2010. It found that only 13% of contributors were female.216 Another survey, the Wikimedia Foundation editor survey of 2011, found 9% of the respondents to be females.213 The MediaWiki software allows users with an account to set their gender in the preferences. Robin Peperman’s Toolserver application ‘User preference statistics’ aggregated the statistics across users, e.g., on the Danish Wikipedia it showed that in February 2012 3’153 users declared themselves male and 677 female, i.e. 82.32% and 17.68%, respectively. Lam et al. used the user preference setting statistics on the English Wikipedia and also userboxes on user pages specifying gender. They found ratios in a similar range: Around 13% for the userbox method and around 16% for the preference setting method.217 Furthermore, they found that the ratio has remained more or less constant in the time window they looked at (2006–2011). The reason surveys have reported somewhat lower ratios than the user preference setting statistics may be survey non-response bias, and Hill and Shaw argue that the survey ratios should be around 25% higher.218 One sees the approximately 85-to-15 percent ratio in public thought-leadership forums as monitored by The OpEd Project.219 However, the community on free and open source software (FOSS, sometimes Free/Libre and Open Source Software, FLOSS) has a much larger gender gap: a survey found only 1.1% females in the sample.220 Various reasons for the gender gap can be put forth, e.g., that Wikipedia editing is somewhat technical and that women are ‘scared away’ because of hostile discussions in the online community of Wikipedians. Indeed the Wikimedia Foundation editor survey of 2011 found that, e.g., 4% of the responding female editors reported ‘I was stalked online’.213

For the WikiWomenCamp May 2012 Laura Hale and others summarized the involvement of women in the different Wikimedia projects across countries, see the WikiWomenCamp/FAQ/Perspectives article on the Meta-wiki.

A 2007 study provided evidence that young patients with schizotypal personality disorder would use the Internet for social interaction significantly more than controls.221 Do Wikipedians have a special personality type? An Israeli study sought to answer this question, subjecting 139 Wikipedians and non-Wikipedians to a personality questionnaire.222 In the study Wikipedians scored lower on ‘Agreeableness’ and higher on ‘Openness’. Scores on the ‘Extroversion’ and ‘Conscientiousness’ personality dimensions depended on the sex of the subject. Based on previous research the authors hypothesized that Wikipedians would score lower on extroversion, but their results indicated that only female Wikipedians would score lower. The authors wondered why Wikipedians scored lower on agreeableness, suggesting that contribution to Wikipedia, as an apparent prosocial behavior, links to egocentric motives such as “personal expression, raising self-confidence, and group identification”.222

One of the major dichotomies of user characteristics with respect to Wikipedia editing behavior concerns inclusionists and exclusionists/deletionists, inclusionists being contributors believing “that Wikipedia should contain pretty much anything, as long as it’s factual and verifiable”,25 while exclusionists are more stringent on what goes into the encyclopedia.

With the use of infoboxes or explicit categorization on their user page users may manifest that they belong in one or more user categories. There is a sizable network of user categories, for the English Wikipedia, see the subcategories from the top at Category:Wikipedians. Some users explicitly declare themselves as, e.g., inclusionists or Hindu Wikipedians. The www.wikichecker.com website made analysis results available for some of the user categories across a number of language versions of Wikipedia. Among the results one finds that of 5’973 users manifesting their beliefs in total on the English Wikipedia 7% declare themselves to be Jewish and 38.9% to be Atheistsxxii (see Figure 5), while 20.1% of 4,433 manifesting their editing philosophy declare themselves to be inclusionists.xxiii

Figure 5: Religions of Wikipedians as analyzed by WikiChecker based on the subcategories of the Wikipedians Category. © WikiChecker.com. GFDL.

xxii http://www.wikichecker.com/religion/, February 2014.

Geography

An IP number can be resolved to a geographical location (“geo lookup”). Wikimedia data analyst Erik Zachte used information from server logs, converted it to geographical coordinates and constructed a dynamic visualization of the edit pattern through time with a world map background.xxiv There are websites with similar dynamic visualizations. László Kozma’s WikipediaVision runs from http://www.lkozma.net/wpv/. Wikipedia Recent Changes Map was built by Stephen LaPorte and Mahmoud Hashemi and was available from http://rcmap.hatnote.com in May 2013. It captured edits from unregistered users on different language versions of Wikipedia in real-time and displayed them on a map.

southern Scandinavia and parts of Eastern Europe, i.e., showing a geographically uneven coverage of Wikipedia.223

Organization and social aspects

How is Wikipedia organized? In a critical news comment from 2008 Seth Finkelstein characterized it as “a poorly-run bureaucracy with the group dynamics of a ”, an “oligarchy” and having “an elaborate hierarchical structure which is infested with cliques and factional conflicts”.224 Researchers describe Wikipedia and its relation to concepts such as “community”, “social movement”, “benevolent dictator” and “network sociality”.38, 225, 226 Several studies refer to a “core group of dedicated volunteers” or “critical mass”, and Wikipedia has been regarded as governed
While these Web services by a so-called “benevolent dictator” or “constitu- focused on recent changes edits Federico Scrinzi, tional monarch” (Jimmy Wales).19, 35, 225, 227 The Paolo Massa and Maurizio Napolitano’s Wiki Trip new information and communication website would display geo-resolved unregistered ed- have been regarded as creating “network social- its based on data from single Wikipedia articles ity” rather than a “community”, — not based on through time with an OpenStreetMap map as back- a common narrative and eroding enduring relation- ground. ships.226 With access to page view data, Erik Zachte con- Papers compare the Wikipedia development struted the WiViVi service with a map of page view model with FOSS development.228, 229 In his clas- with respect to Wikipedia language version.xxv sic text on FOSS development, The Cathedral and 230 Students from the Technical University of Den- the Bazaar, Eric Raymond came up with the mark constructed a Web service that would down- concept of bazaar-type development as opposed to load the blocked IP addresses information from cathedral-type development, with bazaar develop- Wikipedia (or any other MediaWiki installation) ment characterized by an open development pro- and after a geo lookup render the geographical dis- cess making early and frequent releases of the de- tribution of the blocked IPs with a temporally static veloped code. Wagner has compared Wikipedia heat map visualization. work with the bazaar approach and argues that By examining the user page with natural lan- Wikipedia fit many of the features of bazaar-style 228 guage processing techniques researchers determined development. On the other hand Magrassi ar- the country associated with a user. 
They fur- gues that FOSS requires more top-down coordina- ther examined geotagged English Wikipedia arti- tion than Wikipedia because that software needs to cles associated with the Middle East and North be coherent while Wikipedia can have low coher- Africa (MENA) region and found that Western- ence: A Wikipedia article may still be good even based editors formed the majority of contributors if other articles are of low quality or non-existent. to the MENA articles.156 Another investigation of In major FOSS systems one may see a hierarchical the geotagged articles showed that across 44 lan- structure with a “benevolent dictator” or “core de- guage versions of Wikipedia more than half of the velopers” on the top, followed by “co-developers” 3’336’473 geotagged articles had a geotag inside a and ordinary developers, active users and passive 229, 231 small circle emcompassing all of western Europe, users. For the “development” of an article on Wikipedia this is not necessary. xxiiihttp://www.wikichecker.com/editing-philosophy/, Wikipedia may meet face-to-face25, 35 and February 2014. Wikipedians can coordinate work in . xxivThe visualization was available from http://stats.wikimedia.org/wikimedia/animations/requests/- However, most of the construction process of AnimationEditsOneDayWp.html and a Wikipedia takes the form of an indirect collabora- blog post had background information tion, where individual members in a group make http://infodisiac.com/blog/2011/05/wikipedia-edits- small changes to a shared structure inspiring other visualized/ xxvhttps://stats.wikimedia.org/wikimedia/ to improve it even further, — to paraphrase Peter animations/pageviews/wivivi.html Miller.232 Indeed researchers have frequently in-

24 voked Pierre-Paul Grass´e’sconcept of stigmergy— Project has conducted surveys on Wikipedia use originating in the study of animal behavior—and in 2007 and 2010. In May 2010 42% of adult applied it on the Wikipedia process.25, 194, 232–235 Americans used Wikipedia. This was up from Mark Elliott would argue that “collaboration in 25% in February 2007.4 At 24% Wikipedia is by small groups (roughly 2–25) relies upon social ne- far the most used educational and reference site. gotiation to evolve and guide its process and cre- In 2007 Yahoo Answers was second (4%), while, ative output” while “collaboration in large groups e.g., was on 1%.3 In 2010 use of (roughly 25–n) is dependent upon stigmergy” ad- Wikipedia among Internet users was more popu- mitting that in the stigmergic collaborative con- lar (with 53%) than instant messaging (47%) and text of Wikipedia social negotiation can take place rating a product, service or person (32%), but less (e.g., article discussion, email, ), popular than using social network sites (61%) or but take a secondary role. He also sees stigmergic watching online videos (66%).4 wiki collaboration as distinct from “co-authoring”, Another way of capturing statistics related to where his idea of “co-authoring” consists of social usage is through the Google Trends Web ser- negotiation in the creative gestation period.234 vice (http://www.google.com/trends) that displays Wikipedia has seen an increasing institutionalisa- search volume for the Google search engine through tion with Wikimedia Foundation, Arbitration Com- the years based on a query, see Figure7. Users use mittee and Association of Members’ Advocates.19 “wikipedia” and “wiki” to refer to Wikipedia, but The Wikimedia Foundation has the primary role “wiki” may also be used to refer to other wikis re- of running the computer that serves Wikipedia and lated to, e.g., Minecraft, Naruto and Skyrim. There its sister projects. 
Other organizations may also have been steady declines in the “wikipedia” search affect Wikipedia. The transition from GFDL to volume since the middle of 2010 and for “wiki” since CC licence involved the Wikimedia Foundation, september 2011. The declines are not necessary an the Foundation (FSF), Wikimedia- indication of a decline of the use of search engine wide voting and Wikimedia Foundation Board of to access information on Wikipedia or other wikis. Trustees decision. FSF also acts as a license stew- Given that Wikipedia articles rank high in search ard.236 engine results, one may speculate that Internet Other issues that the research literature has dis- users have come to be expect this and not finding cussed are privacy237 and legal aspects ( li- it necessary to explicitly mention Wikipedia in the cense).236 search query. The Google Trends has also be used to collected data related to individual Wikipedia articles. Gene Wiki researchers correlated Google Popularity Trends and Wikipedia article views for genes.240 A number of companies, such as comScore and An analysis of the Arabic blogosphere identified Alexa, collect usage statistics from Web users, and the English Wikipedia as the second most often such statistics allows Wikipedia to be compared to linked to site from Arabic blogs and only surpassed other Web sites. The September 2006 traffic got by YouTube. After Al Jezeera, BBC and Flickr Wikipedia Sites listed as number sixth on com- the ranked sixth in received cita- Score’s worldwide ranking,238 and in October it tions.241 broke into U.S. 
comScore top ten.239 Alexa put Several researchers have investigated the Internet wikipedia.org as number 7 in traffic rank in Septem- search ranking of the articles of Wikipedia.242–245 ber 2009.xxvi stats.wikimedia.org/reportcard re- Using search engine optimization techniques two re- ports the temporal statistics from comScore statis- searchers investigated the Google ranking of the tics, and it, e.g., shows that throughout most of English Wikipedia for health topics. The queries the first half of 2010 Wikipedia had 350 million were 1726 keywords from an index of the Ameri- unique visitors and around 8 milliard page requests. can MedlinePlus, 966 keywords from a NHS Direct In January 2016, Wikimedia Foundation chosed to Online index and 1173 keywords from an American stop using the comScore statistics as they believed index of rare diseases (U.S. National Organization the data were “no longer fully representative” of of Rare Diseases) and compared Wikipedia to .gov their traffic.xxvii domains, MedlinePlus, Medscape, NHS Direct On- Pew Research Center’s Internet & American Life line and a number of other domains. They found the English Wikipedia as the Web site with most xxvihttp://www.alexa.com/siteinfo/wikipedia.org top rankings.242 A Wikimedia page continuesly record Wikipedia’s Internet search engine ranking on in- and comment on the Alexa statistics: http://meta.wikimedia.org/wiki/Wikipedia.org is more populardividualthan... queries may depend on the type of query xxviihttps://meta.wikimedia.org/wiki/ComScore/Announcement(whether it is navigational, transactional or infor-

25 Figure 6: With the rise of its popularity Wikipedia finds its way into comics strips. Here the “” template in Wikipedia used in ’s XKCD. c Randall Munroe. CC-BY.

sites analyzed. However, these social media sites have generally been ranked lower than most Inter- net portals and search engine, such as Google, Bing and Yahoo!, especially for the years 2011 and 2012. Wikipedia has consistently maintained its score on around 77–78, and as the ACSI of Internet portals and search engines dropped in 2013 (perhaps as a result of the leaks of ?), Wikipedia Figure 7: Google Trends for the query would score higher on the index than these web- “wikipedia,wiki”. “Wikipedia” and “wiki” sites. In the more general category “E-Business” are the blue and the red curve, respectively. only FOXNews.com would have a higher score than Wikipedia for the year 2013.246, 247 mational) and the number of words in the query. One study reported in 2012 found that single word A 2012 survey found that almost 50% of 173 Dan- informational queries (example: “lyrics”) placed ish medical doctors used Wikipedia search for infor- 248, 249 Wikipedia on page one of Google for 80–90% of the mation related to their profession. The sur- queries, while, e.g., 3-words transactional queries vey recruited subjects via the online network Doc- (example: “boots for sale”), only placed Wikipedia torsOnly which might have biased the results. on page one in under 10% of the cases. Navigational queries may yield even lower Wikipedia ranking.244 Detailed analysis of the individual articles of the By not distinguishing between types of queries, not English Wikipedia shows that bursts of popular- using “real” queries, but random nouns, another ity on individual pages occur during Super Bowl study had found Wikipedia on 99% of Google’s halftime entertainment and around the death of page one.245 a famous subject, e.g., Whitney Houston, Amy Based on “customer” interviews The American Winehouse and Steve Jobs. 
Other causes of in- Customer Satisfaction Index (ACSI) has ranked In- creased view of particular pages are the so-called ternet social media in 2010 to 2013 with Wikipedia Google Doodles, denial of service attacks on par- ahead of YouTube and Facebook. In this cate- ticular pages, second screen and effects.250 gory Wikipedia has maintained the highest score Page views on individual articles are highly skewed, through the years 2010–2013 among the 8 web- see Figure8. 250, 251

Figure 8: Distribution of page views in the English Wikipedia.250, 251 © Andrew G. West. CC-BY-SA.

Economy

How is the economics of Wikipedia? What is its economical value? How much would users be willing to pay for it? What would it cost to reconstruct Wikipedia? In 2013 these issues were considered in a short review by Band and Gerafi, which concluded that the economic value of Wikipedia was in the range of tens of milliards and consumer benefits hundreds of milliards,252 see Table 6. One study included in the review estimated the market value by comparing Wikipedia to other Internet companies in terms of reputation and unique visitors. This economic triangulation with LinkedIn, Twitter and Facebook put the market value of Wikipedia at between $10 and $30 milliard (United States Dollars). Other methods estimate the economic value of Wikipedia between $6.6 and $1’080 milliard. The extreme maximum value comes from considering a Wikipedia reader session as a ‘research question’ and using a value of a ‘research question’ of $45 from the National Network of of . Given the popularity of entertainment and pop cultural phenomena on Wikipedia253 and that, e.g., students report using Wikipedia for ‘entertainment or idle reading’254 it seems questionable that the majority of Wikipedia sessions should be regarded as being as valuable as a 45-dollar research question.

Milliard USD   Description
10–30          Market value from comparison with other social media sites
21–340         Market value based on income from a potential fee paid by reader subscribers times a revenue-value ratio
8.8–86         Market value based on possible revenue
6.6            Replacement cost based on a $300 per article cost
10.25          Replacement cost based on estimates of labor hours and a $50 per hour cost
0.63/year      Estimated cost with full-time paid writers employed to update Wikipedia
16.9–80        Annual consumer value based on received benefit that could have had a fee
54–1’080       Annual consumer value based on comparison with librarian research question cost

Table 6: Economic value of Wikipedia based on a 2012 overview by Band and Gerafi.252 1 milliard = 1’000 million.

Estimates of replacement costs may use estimates of labor hours. Geiger and Halfaker have put one such estimate forward for data on the English Wikipedia up until April 2012, reporting a value of a bit over 40 million hours. Extrapolating to all Wikipedias they reported a number of approximately 103 million labor hours.255 Another Internet phenomenon may provide comparison: Korean pop artist PSY’s 4:12 minute YouTube video Gangnam Style reached 2 milliard views in May 2014, corresponding to 140 million hours, thus the hours used on building all the Wikipedias compare roughly with the time spent on watching the (as of 2014) most popular YouTube video. For the double-digit milliard market value estimate from advertising it is worth remembering that just a small note about the potential of advertisement (“ might well start selling ads on Wikipedia sometime within the next few months, [. . . ]”) spawned the Spanish Fork in 2002.25

xxviii Programs:Evaluation portal/Library/Edit-a-thons

Edit-a-thons, where (unpaid) editors come together writing Wikipedia articles, may have a variety of goals, such as writing new articles, extending or discussing existing ones, socializing among Wikipedians, recruiting new editors or increasing general awareness. In 2013 Sarah Stierch from the Wikimedia Foundation surveyed a number of these edit-a-thons.xxviii Some of the surveyed program leaders tracked budget and donations for the event. Stierch found an average edit-a-thon budget of around $360, and for the 5 events where she had sufficient data she could compute a ‘production cost’ of around $17 for each ‘printed page’ (equivalent of 1’500 characters). English Wikipedia articles had

an average of 3’655 per article in January 2010.xxix Other language versions usually have a lower count, so a rough estimate of an average Wikipedia article is between 1 and 2 printed pages. With $25 per Wikipedia article and 30.8 million Wikipedia articles as the official January 2014 count the replacement cost is only $770 million with unpaid ‘edit-a-thon’ writers, considerably lower than any of the values presented by Band and Gerafi. In 2014 Stierch also became the source for the price of a ‘Wikipedia page for individual’: At oDesk she reported her personal writing service at $300,256 a value that corresponds with the article charge assumed by Band and Gerafi when they computed the estimated replacement cost of all of the Wikipedias.252, 257

xxix http://stats.wikimedia.org/EN/Sitemap.htm.

The Wikimedia Foundation gets donations throughout the year, and it makes some of the donation data available, e.g., from the frdata.wikimedia.org. The donation data has not caught a major share of researchers’ attention, if any at all.

Why do people edit?

What motivates Wikipedians? Wikipedia itself has a section in the “Wikipedia community” English Wikipedia article discussing the motivation of volunteers, giving pointers to research on the issue. Much of the research in this area is based on qualitatively analyzed interviews of Wikipedians.20, 258, 259 However, the area has also seen mathematical modeling being employed on edit histories and survey data.260, 261 Others have conducted ‘field’ experiments on Wikipedians or relied on events for natural experiments.262, 263

Some Wikipedians mention in interviews that they started editing Wikipedia because they discovered errors or omissions in an article that they knew something about.258 Does this mean that the more correct and complete Wikipedia becomes the more difficult it will be to attract new editors? The attraction of Wikipedia is in part due to the low barrier for entry,258, 264 also referred to as low opportunity cost.19 Introducing required user registration and ‘sighted’ or ‘flagged revisions’ would heighten the barrier and possibly result in less recruitment of new users. The wiki markup could form a barrier for entry, and Wikimedians have hoped that a WYSIWYG editor would increase take-up of new contributors, and indeed a rationale for developing the so-called VisualEditor stated: “The decline in new contributor growth is the single most serious challenge facing the Wikimedia movement. Removing the avoidable technical impediments associated with Wikimedia’s editing interface is a necessary pre-condition for increasing the number of Wikimedia contributor”.265

But why do Wikipedia contributors keep on editing? The same question may be asked for free and open source software programmers working for free. Some researchers view Wikipedia as a public good where the contributors “devote substantial amounts of time with no expectation of direct compensation” and a Wikipedian’s participation seems “irrationally altruistic”.225

Self-interest may lie behind some contributions: A software program can solve a specific problem for its creator, and a Wikipedia entry can organize information relevant for the editor to remember. But Wikipedia editors do not only work on information directly relevant for themselves. Surveys among Free and Open Source developers mention skill development as the most important factor for beginning and continuing.37 Once the skills are developed the software showcases the skills for potential employers, so the contribution can be seen as a form of self-marketing on the job market.225 For , Jimmy Wales noted that contributors recognized it “as a site, where they could learn to become a journalist”.124

It is unclear if Wikipedia contribution has the same skill signaling value as free software development. Andrew George writes that the potential is “simply nonexistent”.225 Nonexistent is not entirely true: A few Wikipedians have gained temporary employment as . Such positions typically require thorough experience with Wikipedia. It is possible that experienced Wikipedians may also look forward to a job with paid editing. The controversial company Wiki-PR boasts of having a “network of established Wikipedia editors and admins” available to edit Wikipedia on behalf of companies.266 A few people are professionally employed to work on Wikipedia. On the Danish Wikipedia articles on runic stones have been made by professionals as part of a job. In her survey among 1’284 public relations/communications professionals DiStaso found that 31% of the respondents had edited their client’s or their company’s Wikipedia page.200 It may be worth noting that company-paid contributions now form the majority of work carried out on the FOSS Linux kernel: Only between 13.3% and 16.3% of Linux kernel changes come from contributors without corporate affiliation,267 thus there is a marked difference between the FOSS and Wikipedia labor markets.268

After interviewing 22 volunteer Wikipedians Forte and Bruckman likened the incentive system of Wikipedia to that of the scientific community,

where the “cycle of credit” is the most important aspect in the incentive system and prestige is earned over long periods of time.273 Attribution of authorship may on the surface seem to be a difference between the science world and Wikipedia (authors of Wikipedia articles are not displayed prominently on the page). However, Forte and Bruckman find that through the Wikipedia article editing history contributors recognize one another and that Wikipedians also often claim “ownership” of articles. Another comparison of Wikipedia and academic writing was much more critical: Seth Finkelstein commented on Wikipedia prestige and strongly criticized Wikipedia, claiming that “it fundamentally runs by an extremely deceptive sort of social promise. It functions by selling the heavy contributors on the dream, the illusion, that it’ll give them the prestige of an academic (‘writing an encyclopedia’)”.274

Yang and Li attempted to contact a random sample of 2’000 users on the English Wikipedia. Receiving 219 valid responses to their questionnaire they used structural equation modeling to describe knowledge sharing behavior in terms of intrinsic motivation, extrinsic motivation, external self-concept and internal self-concept. They found that internal self-concept-based motivation was the most important factor for the knowledge sharing behavior, and this factor was associated with questions such as “I consider myself a self-motivated person” and “I like to share knowledge which gives me a sense of personal achievement”. On the other hand intrinsic motivation was found to rarely motivate. This factor was associated with questions such as “I enjoy sharing my knowledge with others” and “Sharing my knowledge with others gives me pleasure”.261 Other researchers have also distinguished between intrinsic and extrinsic motivations.19 Müller-Seitz and Reger mention low opportunity costs, reputation among peers, future career benefits, learning and development and contributions from the community as extrinsic motivations, and creative pleasure, altruism, sense of belonging to the community and anarchic ideas as intrinsic. Andrew George proposes the state of ‘flow’ as an intrinsic motivation for Wikipedians, along with altruism, community identification and peer recognition.225 Based on interviews Rosenzweig noted that many Wikipedians had a passion for self-education.20

Motivation
Low barrier19, 258, 264
Wanted to correct errors213, 258
It solves a problem
Self-expression35
Self-efficacy, sense of personal achievement261, 264, 268
Skill development, self-education, future employment, career19, 20, 37, 213, 264, 269, 270
Social status, peer recognition, reputation, visibility, enhancement19, 35, 225, 262, 264, 268, 270, 271
Peer response, get reactions35
Community identification/involvement/social relations35, 37, 225, 264, 268, 270
Ideological, Weltverbesserungsantrieb35, 270
Altruism, values, volunteering213, 225, 268–270
Money/part of job256
Creative pleasure, joy, ‘flow’, fun19, 20, 37, 213, 225, 269, 270
Protection, reducing guilt and loneliness270
Obsession, encyclopedic urge, wikipediholism235, 272

Table 7: Explanations (motivations) for wiki work.

Barnstar awards are a form of peer recognition in Wikipedia, and a few studies have analyzed this element with respect to motivation.262, 275, 276 In the experiment of Restivo and van de Rijt reported in 2012 the researchers split 200 productive Wikipedians in two groups and awarded a barnstar to the

Wikipedians in one of the groups (“experiment” in Figure 9) and none in the other group (“control”). The productivity decreased in both groups, but the productivity was sustained better in the group with awarded Wikipedians, see Figure 9.262

Figure 9: Effect of barnstar awards on productivity of Wikipedians: Figure 1 from Restivo and van de Rijt (2012).262 “Experiment” is median productivity of 100 barnstar awarded Wikipedians, while “control” is matched Wikipedians that received no awards from the experimenters. Third-party awards are displayed with triangles. © Restivo and van de Rijt. CC-BY.

Benjamin Mako Hill and Aaron Shaw would also see an effect of barnstars, summarizing their findings with: “We show barnstars can encourage editors to edit more and for longer periods of time, but that this effect depends on the degree to which the recipient chooses to use the award as a way to show off to other editors”.276 In his classic work on FOSS development, The Cathedral and the Bazaar, Eric Raymond also discusses motivation for voluntary work and mentions the concept of “egoboo”, a short form of ego-boosting. He argues that science fiction “has long explicitly recognized ‘egoboo’ (ego-boosting, or the enhancement of one’s reputation among other fans) as the basic drive behind volunteer activity.”230 The existence of public edit counter toolsxxx as well as pages such as Wikipedia:List of Wikipedians by number of edits allow Wikipedians to ‘egoboost’ via their ‘edit count’.

xxx See Wikipedia:WikiProject edit counters on the English Wikipedia.

The notion of addiction in Wikipedia contribution (colloquially ‘wikipediholic’, ‘wikiholic’ or ‘wikipediholism’) has also been noted.235, 277, 278 New York-based Wikipedia researcher Dan Cosley even claimed “It’s clearly like crack for some people”.277 Finnish researchers proposed gateway theory, originating in the study of drug use, as a framework for explaining online participation in FLOSS development and Wikipedia contribution (cautioning that they do not claim that “online participation, at large, would qualify as addiction”).272 Contrasting with antecedent-based explanations, the gateway theory-based argument notes that the initial online participation occurs ‘due to chance’ without prior planning.

Evolutionary biological explanations of religion may involve the idea of religious rituals as reliable costly signals between adherents for social commitment.279 Should the edit count partially be explained as a hard-to-fake, costly signal between the adherents of the ‘Wikipedianistic’ religion?

Why do people leave?

Studies of why editors leave Wikipedia are much rarer than studies on why editors start out. In 2009 Wikimedia Foundation staff and volunteers developed an online survey and emailed it to 10’000 contributors. The included contributors should have had between 20 and 99 edits and no edits in the past three months. The survey, conducted in the beginning of 2010, had a response rate of 12% with 1’238 editors responding. Around half left due to personal reasons (e.g., other commitments), another half left due to “bad terms”. The researchers split the “bad terms” into reasons related to complexity (e.g., writing an encyclopedia is difficult) and community (e.g., other editors being rude). Among the high scoring statements for leaving were “my work kept being undone” and “several editors were too stubborn and/or difficult to work with”.280 Eric Goldman speculates that Wikipedia has a particular vulnerability to life changes among its contributors, because they tend to be young, unmarried and childless.268 The Wikimedia Foundation editor survey of April 2011 found ‘less time’ with 37% as the most reported reason for becoming less active on Wikipedia, compared with, e.g., 7% for ‘conflict with other editors’.213

Based on large-scale quantitative analysis of, e.g., the editing pattern and its temporal evolution the researchers concluded that Wikipedia had a slowing growth up to 2009. They explained this phenomenon with “limited opportunities in making novel contributions” (the low-hanging fruits are already taken) and “increased patterns of conflict and dominance due to the consequences of the increasingly limited opportunities”.281 An increasing ‘xenophobia’, where seasoned Wikipedia editors under constant spam and vandalism threat develop a ‘revert first’ mentality and move away from ‘assume good faith’, may make it harder for contributions from unsophisticated users to stick.268 Perhaps this

aspect affects the retention of new contributors. Contributors may also leave because of project forking,282 examples being and the so-called Spanish Fork of Wikipedia.25

Edit patterns

Several studies analyze and model the edit patterns of users and articles.278, 283, 284

One claim for the number of new edits to an article within a time interval comes from Wilkinson and Huberman, who hypothesized that the number of new edits in the time interval “∆n(t) is on average proportional to the total number of previous edits” with some amount of fluctuation283, 284

$$\Delta n(t) = [a + \xi(t)]\, n(t), \tag{1}$$

where $a$ is a constant representing the average rate of edits and $\xi(t)$ is the fluctuation. Via the central limit theorem they arrive at the lognormal distribution for the number of edits

$$P[n(t)] = \frac{1}{n \sqrt{2\pi} \sqrt{s^2 t}} \exp\left[ -\frac{(\log n - at)^2}{2 (s^2 t)} \right], \tag{2}$$

where $\mu = at$ and $\sigma^2 = s^2 t$, i.e., the parameters $\mu$ and $\sigma^2$ of this lognormal distribution (the mean and variance of $\log n$) are linearly related to the age $t$ of the article. They show on a robot-cleaned dataset from the English Wikipedia that the distribution of the number of edits follows the model well except for a few events, e.g., one is the edits of US town articles. It is fairly surprising that the number of edits can be modeled with such a simple model that only contains Wikipedia-intrinsic variables. No external forcing variables, e.g., number of editors, ‘importance’ or ‘notability’ of article topic, appear in the equation. Note that the data set only covers the period where Wikipedia had exponential growth, which leaves open how well the model fits newer data.

The overall temporal evolution of the number of articles and the number of editors are perhaps the most important statistics of Wikipedia. They are regularly computed by Wikimedia Foundation data analyst Erik Zachte and reported on the Wikimedia Statistics homepage http://stats.wikimedia.org/. Of constant worry for Wikimedia are the statistics on the number of active editors and the editor retention. The decaying number of active editors on the English Wikipedia has been apparent for a number of years and given rise to the so-called ‘oh shit’ graph, see Figure 10, which shows the number of active editors peaking in 2007 and then decaying. The number has continued to decrease and was in the beginning of 2014 around 30’000 according to the Erik Zachte data analysis.

Figure 10: The so-called ‘oh shit’ graph: English Wikipedia: Retention Rate vs. Active Editors 2004-2009 with a decaying number of active editors. From Wikimedia Commons by user Howief.

Wikipedians have reported mathematical models of growth of the number of articles on the English Wikipedia and made them available from Wikipedia:Modelling Wikipedia’s growth. Between 2003 and 2007 an exponential model fitted the data well.285 A logistic model estimated from data available in 2008 indicated that Wikipedia would reach a maximum around 3 to 3.5 million articles, a result reported by Suh et al. in 2009.281 As the logistic model does not model the data well, at least after 2010, other models have been suggested and a seasonal variable has been included in the model. A model involving the Gompertz function, $y(t) = a e^{b e^{ct}}$, was reported to project a maximum of about 4.4 million articles, a number which the English Wikipedia has surpassed. Suh et al. suggested a hypothetical Lotka-Volterra population growth where the growth was not bounded by a constant but limited by a linear function of time, arguing “there is a general sense that the stock of knowledge in the world is also growing.”281

Why does it work?

Why does Wikipedia not succumb to vandalism and opinionated content? There are other open fora on the Internet: The Usenet and emails. The Usenet is like Wikipedia a public space, but from a start among ‘geeks’ it has been used for and pirated content.286 Another collaborative knowledge building project, Open Directory Project, also relied on free labor, but was in 2006 declared to be ‘effectively worthless’ by Eric Goldman.183

In 2011 Benjamin Mako Hill presented his research about Wikipedia and other crowd-sourced

online encyclopedias: Interpedia, The Distributed Encyclopedia, Everything 2, h2g2, Aaron Swartz’s The Info Network, and GNUpedia. He sought to answer what distinguished the successful Wikipedia from the failed or less successful projects. Hill noted that Wikipedia offered low transaction cost in participation and an initial focus on substantive content rather than .287 Two of the earliest comments on the Wikipedia phenomenon pointed to the low transaction cost as a reason for the Wikipedia success.264, 288 Philippe Aigrain noted low transaction costs as an essential factor for success, not only for Wikipedia but for other “open information communities”, e.g., Slashdot.288 Aigrain would see transaction costs as having a number of aspects apart from (possible) monetary costs: cognitive, information, privacy, uncertainty and locking-in costs, with, e.g., the cost of “navigating the transaction management layers” as an information cost.288 Aigrain also argued for two other factors contributing to the Wikipedia success: a clear vision and mechanisms to fight hostile contributions. He sees the statement on neutrality of point of view as a clear vision and the revision control system as the key mechanism to counter hostile or noisy contributions,288 and indeed it seems very difficult to imagine how Wikipedia could work without its revision control system where contributors can restore ‘blanked’ articles. There are likely a large number of important technical elements in the MediaWiki software that lower transaction costs for ‘good’ contributors and heighten them for ‘bad’: CAPTCHAs, ‘autoconfirmed’ user level, user blocking, IP range blocking, blacklists, new pages patrolling and watchlists. The flagged revisions of the German Wikipedia might be the reason for its high quality. The ability of Wikipedians to communicate beyond the stigmergic article space collaboration, e.g., via discussion pages, is presumably important; at least such pages are almost always used for high quality ‘featured articles’.41

As a newer development of a Wikipedia challenger, Google launched the Knol service in 2008, and while contributors could monetize their content Google announced the closing of the service in 2011. Although a quantitative study could find differences in the content of articles of Google Knol and Wikipedia, e.g., with respect to the number of references,289, 290 it seems unclear if we can blame such differences for Google Knol’s demise. One commentator noted that “Google didn’t seem to allocate many resources to the project, and it certainly didn’t put much emphasis behind it from a press standpoint”.291 Before Knol, and likely with inspiration from Wikipedia, Microsoft unsuccessfully opened for user contribution to their encyclopedia Encarta. Andrew Lih noted that “it was not timely, open, social, or free” and that it was a “half-hearted implementation”.25

I have not found mentioned that Wikipedia is an excellent example of hypertext. Much quality text on the Web is not linked very much, e.g., a typical New York Times news article has typically no links in the body text. Web scientific articles rarely have links. Google Knol had typically no hypertext links besides table of content-type or external references. Even MediaWiki-based has few intrawiki links. URLs in the Danish encyclopedia Den Store Danske and Google Knol are/were not well-predictable. Links on the ordinary Web based on non-wiki technology do not display the availability of the linked page. The URLs of Wikipedia articles are well-predictable, likely reducing navigational costs.

The creation of Wikipedia is a decentralized process without economic incentives, where individual contributions get aggregated to a collective result. Certain Web-games have the same character: In the Web-based market game Hollywood Stock Exchange the goal is to predict the box office of a movie by buying and selling virtual shares. Four weeks after the opening the Hollywood shares get the toy money in accordance with the earnings of the movie. The price of the stock then becomes a prediction on the real earnings of the movie. Artificial market games perform surprisingly well in predictions, e.g., Hollywood Stock Exchange could predict Oscar winners better than individual experts.292, 293 Apparently the prediction markets attract well-informed and well-motivated players whose mutual trade creates information, even though the participants do not gain any money. There is no real trade in Wikipedia, but Wikipedia forms a decentralized knowledge aggregation, like the knowledge markets.

The ‘power’ of the collective has often been invoked, sometimes under the name Linus’ Law, which Eric Raymond coined in The Cathedral and the Bazaar to characterize an element in FOSS development: “Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone,”230 i.e., openness gives robust results. The concept has also been applied to Wikipedia to explain its success.294 Greenstein translates the ‘law’ to the Wikipedia context with: “Enough attention to such details [facts and commonly accepted knowledge presented on Wikipedia] elicits the relevant objective information and many viewers then correct errors as needed.” and “[g]iven enough cheap storage, all viewpoints can be represented.”294 He also notes its difficulties with costly

verification and subjective information where Linus’ Law might not work.

3 Using Wikipedia

Users do not meet Wikipedia only as readers of the Wikipedia site. Other web-sites utilize Wikipedia content to enhance their own site, and the rich structured annotations with links, tags and templates allow researchers and developers to use Wikipedia as a . Especially researchers in information retrieval and automatic benefit from Wikipedia. In this kind of research WordNet has stood as the foremost open computer-based lexical database for the English Language. Some research compares or combines WordNet and Wikipedia. The review Mining meaning from Wikipedia from 2009 gives an overview of studies using Wikipedia data in natural language processing, information retrieval and extraction as well as ontology building.9

Other wikis besides Wikipedia provide data for this kind of research. Wiktionary, the dictionary-like sister project, supplements Wikipedia, and several studies have used information from Wiktionary.295–298

Information retrieval

A program uses Wikipedia articles as references to detect so-called “content holes”. Content holes are neglected or missed subtopics in a text. The researchers regarded user generated content from discussion threads and compared that to the sections found in Wikipedia articles by extracted keywords such as proper nouns and numbers.299–301

Web search queries occurring with high frequency often directly relate to Wikipedia pages. Within a large sample of web queries 38% matched exactly a title of an article in Wikipedia.302 The content and context of the matched Wikipedia page can then be used to automatically expand the query, e.g., Wikipedia pages can form an intermediate between a user query and a collection of books being searched.302 Researchers have used the redirects in Wikipedia to form a thesaurus for use in document retrieval systems with automated query expansion.303, 304

Researchers also use images from Wikipedia in information retrieval systems research.305, 306 Researchers used part of the for the demonstration of a combined text and image browsing system with similarity computation.305

Thesaurus construction

The comprehensive Roget’s International Thesaurus307 lists both the word agency as well as Federal Bureau of Investigation. Neither of these two articles links to the other directly. Several groups attempt to make systems that will automatically determine the relatedness of such words and phrases based on mining data in Wikipedia or other sources, e.g., WordNet: “synonym search” or “semantic relatedness”.140, 298, 303, 308–310

Chesley et al. used Wiktionary to expand a labeled list of adjectives.296 For text sentiment analysis they needed the polarity and objectivity of adjectives. They already had a list with a limited number of words manually labeled in this respect. By lookup in Wiktionary they could assign polarity and objectivity to yet unlabeled words by examining the definition of the adjective. The approach also handled words with negated definition.296 Zesch and Gurevych used the Wikipedia article and category graph in their semantic relatedness method.140

Named entity recognition

One of the basic tasks of , named entity recognition (NER), deals with identifying named entities such as personal names, names of organizations or genes from freeform text. NER often relies on a machine learning algorithm and an annotated dictionary (gazetteer). Several researchers have used Wikipedia for NER or more general keyword/keyphrase/“concept” extraction.304, 310–315

French Exalead’s Wikifier is a Web service that will take a document (either as a URL or a text), serve it back as an HTML page and link terms within the document to relevant Wikipedia articles. With named entity recognition it detects people, places or organizations. The CC-BY-NC-SA-licensed Accurate Online Disambiguation of Named Entities (AIDA) software for online named entity extraction with disambiguation uses Wikipedia indirectly through YAGO2.315 An online demo version is available.

A game development system that starts with a newspaper article identifies country and notable person based on information in Wikipedia. The system furthermore gauges the sentiment towards the person by using posts from Twitter in sentiment analysis.316
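
The gazetteer lookup mentioned in the named entity recognition section can be illustrated with a minimal sketch. It assumes a tiny hand-written dictionary in place of entries mined from Wikipedia article titles and redirects; the names GAZETTEER and find_entities are hypothetical and do not come from any of the cited systems, which additionally use trained models and disambiguation.

```python
import re

# Hypothetical toy gazetteer. In practice the entries might be derived from
# Wikipedia article titles and redirects (an assumption of this sketch).
GAZETTEER = {
    "federal bureau of investigation": "ORGANIZATION",
    "fbi": "ORGANIZATION",
    "sarah palin": "PERSON",
    "india": "LOCATION",
}

# Longest gazetteer entry measured in words; bounds the lookup window.
MAX_WORDS = max(len(entry.split()) for entry in GAZETTEER)


def find_entities(text):
    """Return (surface form, label) pairs using greedy longest match."""
    tokens = re.findall(r"\w+", text)
    entities = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            label = GAZETTEER.get(candidate.lower())
            if label is not None:
                entities.append((candidate, label))
                matched = n
                break
        # Skip past a matched phrase, otherwise advance one token.
        i += matched if matched else 1
    return entities


if __name__ == "__main__":
    print(find_entities("The FBI investigated; Sarah Palin visited India."))
```

A pure dictionary lookup of this kind cannot resolve ambiguous names, which is one reason the systems described above pair the gazetteer with a machine learning algorithm.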

Translation

Wikipedia can act as a resource in machine translation and multilanguage systems by using the language links that connect different language versions of Wikipedia. Before the Wikidata project, inter-language links were part of the article page of the MediaWiki site through a special markup that usually appeared at the bottom of the page. Daniel Kinzler’s WikiWord system extracts lexical and semantic information from Wikipedia to form a multilingual thesaurus.306, 317 In an extension to the system called WikiPics he uses it in combination with images from Wikimedia Commons for multilingual image retrieval.

It seems obvious that the multilingual Wikidata will become a valuable and more easily accessible resource for future translation studies relying on Wikimedia data. Researchers will no longer need to crawl multiple Wikipedias to establish naming variations (through redirects) and extract interwiki language links. Instead all basic language links are readily available in Wikidata in machine readable format. Arun Ganesh (‘Planemad’), one of the first to explore Wikidata-based translation, built a prototype with a map of India with labels for the states of India retrieved from Wikidata: The Wiki Atlas. It would allow the user to switch between, e.g., Hindi, English and Zhongwen representations of state names.xxxi In 2014 Denny Vrandečić and Google would take this approach a step further by publishing the qLabel Javascript library, giving a website developer an easy method to provide Wikidata-based multilingual website content.xxxii

xxxi http://4thmain.github.io/projects/hacks/wiki-atlas.html.
xxxii http://googleknowledge.github.io/qlabel/.

The Danish company GrammarSoft provides online translation of Wikipedias from one language to another. It has translated the Swedish Wikipedia to Danish and made it available from http://dan.wikitrans.net/ complete with hyperlinks, while an Esperanto-English version is available from http://wikitrans.net/. These systems use Constraint Grammar extended to handle the ‘heavily layoutted’ text of Wikipedia.318, 319

Even though a Wikipedia article in one language version may be a translation of a Wikipedia article in another language, the wikitext is not readily sentence-aligned. Thereby a potentially valuable resource for training statistical machine translation systems is lost. Parts of the documentation for MediaWiki and some other Wikimedia resources are sentence aligned.

Ontology construction and categories

Social websites may allow users to put free-form tags on content. The relationship between the combined set of tags has been termed folksonomy.320 Users tag Wikipedia articles with categories and the users can organize the categories in a hierarchy (actually a directed acyclic graph, usually, but not necessarily a hierarchy). While editors may use free-form categories, the categories that survive are the ones which have support in consensus. Jakob Voß distinguishes between collaborative tagging (e.g., as used on the del.icio.us website), classification (e.g., Dewey Decimal Classification) and thesaurus indexing (e.g., Medical Subject Headings, MeSH, where records/categories may have multiple broader terms), and he regards the Wikipedia category system as a collaborative thesaurus.321 The categories, together with categorized articles, have been used for in a semi-automated system.322

Another way of using Wikipedia begins with an already established knowledgebase and then attempts to match and extract data from Wikipedia for further population of the knowledgebase. Friedlin and McDonald would start with records in the medical terminology database LOINC and with a semi-automated program attempt to find corresponding articles in Wikipedia, so the introductory Wikipedia text could be used as a helping description text for the LOINC records.86 Reagle and Rhue have described an application of a personal name matching method between multiple biographical works, including Wikipedia, as well as a gender determination method in a study of gender bias.88

Two Technical University of Denmark students, Magnús Sigurdsson and Søren Christian Halling, used the part of Wikipedia dealing with musical groups for the Internet-based search engine MuZeeker and could group search results according to Wikipedia categories.323 A further related application ran on mobile telephones.xxxiii

xxxiii MuZeeker

Studies on Wikipedia also use the Wikipedia category network as a form of ground truth to assign top categories to Wikipedia articles.94, 141 Although the category network is directed, the researchers consider the undirected network to avoid disconnected categories, though weighting the graph according to right/wrong direction helps categorization.141

Databasing the structured content

Several groups have extracted information from the categories and templates of Wikipedia and built

databases. The YAGO system extracted data from Wikipedia and combined it with WordNet.39, 324 Their aim is ontology building with facts such as “Elvis” “is a” “rock singer” and “Elvis” “has the nationality” “American”. They note that the semi-structured data of Wikipedia is not directly useful, e.g., the category tree has “Grammy Award” as a supercategory of “Elvis” as the rock singer had won that award, but it would be wrong to infer “Elvis” “is a” “Grammy Award”. This is in contrast to WordNet which has a cleaner graph. When reported in 2008 YAGO had 1.7 million entities and 15 million facts.39 The number of entities may be compared to the number of terms in Gene Ontology that I in 2013 counted to around 40’000.

The largest effort is probably DBpedia2 with extraction of Wikipedia templates and the information made available at http://dbpedia.org/. BBC uses, among others, DBpedia for linking documents across their web site.325

With the advent of Wikidata, direct extraction of data from the categories and infoboxes of Wikipedia may still be relevant, but it seems that Wikidata will be a much stronger base.

Trend spotting and prediction

Just prior to the announcement of the choice of the United States vice presidential candidate, Sarah Palin’s Wikipedia article saw a high editing activity.326 The activity was higher than for other candidates, thus giving an indication for the choice of Palin. In another case Paula Broadwell’s Wikipedia page contained the short statement “Petraeus is reportedly one of her many conquests” for several months before the secret relationship became public knowledge and led to an FBI investigation and the resignation of the CIA director.327 Cyber intelligence systems can use this kind of information for predicting events before they become public knowledge.

Public Web tools monitor the editing pattern and present analyzed results based on queries: Craig Wood’s Wikirage Web-site ranks Wikipedia articles based on edit activity measured not only as the total number of edits, but also, e.g., the number of undos and unique authors. This tool is useful to identify edit conflicts which may generate lots of edits. WikiChecker also generates statistics over users and individual articles as well as lists of highly edited pages. A similar Web-site by Dennis Yurichevxxxiv shows articles created within the last month, last week and last 24 hours ranked by the number of contributing editors. Such a Web service puts emphasis on trending topics that have not before been in the . One example is Atifete Jahjaga rising from relative obscurity to President of Kosovo. Thomas Steiner and his coworkers present yet another web service for realtime trend spotting: Wikipedia Live Monitor monitors Wikipedia edits across 42 different language versions and detects concurrent edit spikes to report breaking news, e.g., the 2013 Russian meteor event.328 Steiner has presented a related web service, the Wikipedia Natural Disaster Monitor, focusing on the natural disaster articles on Wikipedia for crisis response.329

xxxiv http://unit1.conus.info:8080/en.wikipedia.stats/

Sérgio Nunes’ WikiChanges creates an on-the-fly graph of the edit activity through time of one or two Wikipedia articles. One example pointed to compares the ‘Barack Obama’ and ‘Hillary Rodham Clinton’ articles and shows most edits for the winner of the United States primary election. Inspired by Maximillian Laumeister’s Listen to Bitcoin, Stephen LaPorte and Mahmoud Hashemi built the Listen to Wikipedia Web service which converted the Wikipedia recent change feed to an online dynamic visualization with sound. Circles would appear and bells would sound each time an edit was made.

The Wikimedia Foundation did for some time not make statistics on page views available. However, in December 2007 board member Domas Mituzas announced the availability of hourly statistics for each page from http://dammit.lt/wikistats/. Such data makes it easy to spot trends by simple counting. The user “Henrik” has presented a useful interactive Web service that renders the statistics in monthly histograms from http://stats.grok.se/. Showing topics of public interest, and disinterest, the user “Melancholie” pointed to an example: “Hurricane Gustav” and “Bihar flood”, both natural disasters of 2008 but one in rich USA, the other in an impoverished state of India. The former reached several hundred thousand page views in August and September while the latter reached only a few thousand.

Based on the viewing statistics Ed Summers made the Web service Wikitrends available in 2012 from http://inkdroid.org/wikitrends/ showing the top 25 most viewed Wikipedia articles for the past two hours. Since 2008 Johan Gunnarsson has run a similar Web service, also called Wikitrends, first from the Toolserver and later on the Wikimedia Tool Labs servers with up- and down-trends based on day, week, month or year and across many language versions of Wikipedia. Furthermore, Andrew West makes weekly reports of the 5000 most visited pages on the English Wikipedia available on one of his user subpages on Wikipedia. Weekly annotated

versions of the Top 25 articles from the West list are made available in the Wikipedia:TOP25 article. West together with user Milowent have also reported on page view spikes, see page 26.

Working from stats.grok.se June and January 2008 statistics and examining health-related topics with probable seasonal effect, such as frostbite, hypothermia, hyperthermia and sunburn, Laurent and Vickers found a clear effect in the page views:242 On average the ratio was around 3 between June and January page views for these topics. They also analyzed the page view statistics of three articles describing melamine, salmonella and ricin. These examples were associated with official health alerts in 2008, and page view statistics also show a marked increase correlating with the announcements. In 2014 a company would publish an analysis of English Wikipedia articles with health care information reporting correlation between article views and related prescriptions and unit sales of medication.330 Another interesting application would examine how well Wikipedia activity could be used to predict movie box office success.331 Not only the viewing statistics were used as a feature in the prediction, but also the number of edits, the number of users editing and “collaborative rigor”. Keegan did a related small longitudinal study with prediction of 2014 Academy Awards winners based on the number of edits and the number of unique editors in 2014. He predicted 3 out of 5 correctly.332 Yet another analysis of the utility of Wikipedia page view statistics for prediction of real-life events showed mixed results for the result of political elections in Germany, Iran and .333

Like Wikipedia, the Twitter Web service provides an API for returning content in structured format.334 The Wikipedia services for trend spotting may resemble some of the many Web services for Twitter postings, and programmers have presented systems combining Wikipedia and Twitter. Tom Scott implemented a Twitter bot with Wikiscanner-like functionality: A bot monitoring Wikipedia changes would track Wikipedia edits from United Kingdom Parliament IP addresses and output changes to a specific Twitter account (@parliamentedits). Inspired by this system Ed Summers implemented a system for the United States Congress (@congressedits) and released his code,xxxv enabling the monitoring of several other parliaments, e.g., Sweden with @RiksdagWikiEdit, Denmark with @FTingetWikiEdit and Canada with @gccaedits. Danish Wikipedian Ole Palnatoke Andersen would collect these Wikipedia monitoring Twitter accounts on his list at https://twitter.com/palnatoke/lists/wikiedit with the number of list members reaching 48 in October 2014.

xxxv https://github.com/edsu/anon

Figure 11: Growth of the number of structured journal citations. Plotted with data obtained for citation clustering.81

Altmetrics

Altmetrics is short for alternative metrics and is the study and use of measures for scholarly impact based on activity in social media. Mendeley and CiteULike are useful services, but also Twitter, blogs and Wikipedia have been examined as sources for altmetrics,335 and there have been a few studies on altmetrics with Wikipedia. In a survey among 71 bibliometrics researchers 33.8% stated that “mentions of or links to your work in Wikipedia” had the “potential for article or author evaluation”. For the statement “article about you on Wikipedia” 26.8% confirmed its potential.335

My analysis of outbound links to scientific journals from Wikipedia (using the cite journal template) found that the correlation with the de facto standard for scholarly impact measure was reasonable.80, 81 Bot-generated content can change the number of citations considerably. Whereas typical citations use the ref tag for in-text citations, the Protein Box Bot would add citations to the end of the article without using the ref tag. When the bot created and amended thousands of articles in 2008 there was a large increase in structured citations using the cite journal template, but without using the ref tag, see Figure 11. Indeed the results indicate that the bot in 2008 was the creator of the majority of the journal citations in the English Wikipedia. American researchers carried out a similar analysis with the 2010 English Wikipedia dump, but instead based the citation extraction on

the PubMed identifier (PMID) or the Digital Object Identifier (DOI). By analyzing the full revision history of the English Wikipedia they identified a trend for journal articles to be cited faster in later years compared to earlier years.336

The verbose journal clustering results from my journal citation data mining are available from http://neuro.compute.dtu.dk/services/wikipedia/citejournalminer.html. On the English Wikipedia itself the WikiProject Academic Journals keeps bot-updated detailed statistics of journal citations with the overview page at Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia.

Analyzing the matrix formed by intrawiki Wikipedia linking with the PageRank and CheiRank methods, Toulouse and Hungarian researchers would examine the ranking of the top 100 universities over time and compare it to university ranking. They found the PageRank of the top 10 universities to increase over time, with, e.g., Harvard and Columbia at the top.148

As researchers themselves can edit pages and insert mentions or links to their own work (as I have done), Wikipedia citations are susceptible to manipulation of the altmetrics measure. Notability criteria, deletionists and general collaboration among contributors may keep a bound on this problem. Another downside of using Wikipedia for altmetrics is the poor coverage of scientists on Wikipedia.89

The Altmetrics.com website features Wikipedia as one of its altmetrics sources for paper-level scientometrics. It tracks Wikipedia citations to journal articles by date and user.xxxvi

xxxvi See, e.g., 422759 on www.altmetrics.com.

Geotagged content

Some Wikipedia articles have markup with geographical coordinates. These coordinates can be extracted and used, e.g., with rendered maps as in Google Earth and Danish Findvej.dk. Wikidata with its structured content including geotagging can provide a plethora of information for enriching maps, e.g., OpenStreetMap user ubahnverleih combined Wikidata’s entities with geotags and coat of arms images to render coats of arms on a map, see http://osm.lyrk.de/wappen/.

The augmented reality platform Wikitude and its world browser use location-based Wikipedia content such that Wikipedia article excerpts are overlaid on the camera image. The Wikipedia article shown is selected based on GPS and compass information in the mobile phone. Another augmented reality browser, Layar, embeds Wikipedia landmarks.

Other uses

WikiTrivia, developed by Alex Ksikes and Philipp Lenssen, uses the introductory paragraph from Wikipedia for an online quiz. With the title words blacked out from the paragraph the task is to guess the title from the paragraph. The quiz participant chooses a category, such as politics or animals, before the game begins. The online quiz WhoKnows? also uses information from Wikipedia, but indirectly through DBpedia.337

Researchers have used the edit histories of Wikipedia articles in a study of the performance of a system for sharing mutable data.338

4 Extending Wikipedia

The original Web protocol of Tim Berners-Lee has always had the ability to both read and write on the Internet. However, in the first decade of the Web the writing capability was little used. Some Web-based applications enabled collaborative workspaces, e.g., the BSCW system, that let users upload files.339 Another effort, WebDAV, would standardize a protocol for distributed authoring on the Web.340 Wikis begin with the work of Ward Cunningham, with many of the early technical ideas described in the book The Wiki Way:29 The ability of users to edit and create new pages, the easy way to make links between pages, the free structure of wikis, the notion of a sandbox, page locking, camel-case (wikiwords), the basic edit conventions (the wiki markup) and revision control.

Researchers and developers have suggested many technical extensions for MediaWiki, the Wikipedia software: Software tools that automatically create new content or modify existing content, e.g., to fight vandalism, tools that help users (readers and editors), e.g., to get an overview or to provide the content in another form than the usual way, and extensions with additions to the data representation and interfaces to handle structured content.

Wikipedia editing by bots or assisted editing tools has increased over time. When Stuart Geiger examined the fraction of bot and assisted editing in 2009 he found 16.33% of all edits performed by a bot and assisted editing programs used in around 12.16% of all edits. On a particular page used for vandal fighting, Wikipedia: Administrator intervention against vandalism (AIV), edits by tools are predominant with around 75% of the edits.181, 210

First class   Other class               Features         Performance   Ref.
Featured      Clean-up                  Computed trust   82%/84%       341
Featured      Random                    Word count       96%           342
-             -                         30 features      98%           342
Featured      Nominated, but rejected   12 features      -             343

Table 8: Automated article classification. The performance column may indicate different types of performance, e.g., accuracy.
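
The word-count row of Table 8 can be illustrated with a minimal sketch of a single-threshold classifier. It assumes the cut at 2,000 words that Blumenstock suggested between featured and random articles; the function names, the whitespace tokenization and the strict "greater than" comparison are assumptions of this sketch, not details taken from the cited study.

```python
# Cut between featured and random articles as suggested by Blumenstock.
# Tokenization and the strict comparison below are assumptions of this sketch.
WORD_THRESHOLD = 2000


def word_count(text):
    """Count whitespace-separated tokens (a simplification of real wikitext)."""
    return len(text.split())


def predict_featured(text, threshold=WORD_THRESHOLD):
    """Predict 'featured' when the article is longer than the threshold."""
    return word_count(text) > threshold


def accuracy(texts, labels, threshold=WORD_THRESHOLD):
    """Fraction of articles whose predicted class equals the true label."""
    correct = sum(
        predict_featured(text, threshold) == label
        for text, label in zip(texts, labels)
    )
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical toy corpus: True marks a featured article.
    texts = ["word " * 2500, "word " * 300]
    labels = [True, False]
    print(accuracy(texts, labels))
```

On a real corpus the texts would be plain-text renderings of articles and the labels would come from the featured-article list; the toy corpus here only shows the calling convention.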

Automated quality tools

There have been several suggestions and implementations of tools for quality assurance in wikis, and in Wikipedia particularly, and a number of tools are in operation. The tools that operate automatically on Wikipedia are usually referred to as bots and may perform mundane tasks, such as maintaining between-language links (a task now taken over by Wikidata), adding dates to templates, including titles from cited external pages, etc.[344] The most important bots are probably the anti-vandalism tools (vandal fighter bots) on Wikipedia. These kinds of bots have been operating since user Tawker’s bot began in 2006.[345] This one and the initial ClueBot operated with simple text pattern matching.[176] Christopher Breneman’s and Cobi Carter’s ClueBot NG, now operating on the English Wikipedia, makes a major impact by rejecting 40,000 vandalisms a month (mid-2012 number).[345] It monitors the IRC stream of changes and uses a trained machine learning algorithm (an artificial neural network) to classify each edit, and it will then revert the edit if it is sufficiently probable that the edit is vandalism. Training data for ClueBot NG’s machine learning is obtained through a web-based interface at review.cluebot.cluenet.org where Wikipedians can label data.

Besides ClueBot NG, other anti-vandalism and article quality prediction systems using machine learning or other complex algorithms have been suggested.[178, 205, 341–343, 346–351] Some of these systems do not necessarily focus on vandalism detection, but on more general quality aspects, e.g., distinguishing “featured” articles from other kinds of articles, see Table 8. One system, reported in 2006, used dynamic Bayesian networks and the BUGS software to compute the “trust” of English Wikipedia articles based on the trust of the previous revision, the user role and the amount of insertions and deletions, with these variables linked up in a Markov chain.[341] The researchers used beta distributions, setting priors based on the status of the user, regarding administrators as the most trustworthy and then in decreasing order: registered users, anonymous users and blocked users. The researchers in this particular study restricted themselves to 50 featured articles, 50 “clean-up” articles and 768 normal articles in geography for the computation of trust. It is unclear whether the method using BUGS would scale to full Wikipedia application. Besides computing trust the researchers also used the trust values to predict the article class. On a test set of 200 new articles with 48,805 revisions they could report predicting 82% of the featured and 84% of the clean-up articles correctly.

An anti-vandalism approach from 2008 used bag-of-words and a naïve Bayes classifier as well as ‘probabilistic sequence modeling’, though it could not report better results than ClueBot.[178] Another approach used features such as ‘edits per user’ and ‘size ratio’ and obtained quite good results.[177] The UC Santa Cruz WikiTrust, presented on the Web at wikitrust.soe.ucsc.edu, has an associated API available, which may report the author of a word, its “trust” and the revision where the word was inserted. This API is, e.g., used by STiki. WikiTrust extracts a number of features and uses a classifier from the Weka toolkit.[350] The system enables individual words in Wikipedia articles to be colored according to computed ‘trust’.[349]

In contrast to the complex algorithm approaches, it has been found that the word count of an article performs surprisingly well as a predictor for article quality.[342, 352] Blumenstock suggested making a cut at 2,000 words between featured and random articles and found an accuracy of around 96% on a corpus of 1,554 featured and 9,513 randomly selected articles.[342] The researcher of this study also tried a number of other more complex features, e.g., readability indices, citation count, category count and sentence count, and a range of machine learning algorithms, increasing the accuracy to 98%. When featured articles are compared to other selections of articles rather than random ones, the quality-associated features may change. Using an analysis of the 2007 Virginia Tech Massacre article for hypothesis generation, Gerald Kane set out to examine which features predicted whether a nominated article would become a featured article or would be rejected as such. Using 12 different features from 188 Wikipedia articles and binary logistic regression analysis he found that reference count and depth of top contributor experience signaled quality, while higher word count and breadth of top contributor experience as well as, e.g., percentage of anonymous contributors would make it more likely that the nominated article would be rejected as a featured article.[343]

By considering good-quality “tokens” as those edits inserted by a user that are present after the intervention of an admin, Javanmardi, Lopes and Baldi reported in 2010 on the performance of models for user reputation.[205] They examined four different user roles: admins, “vandals”, “good users” and blocked users. Looking at the entire revision history of the English Wikipedia and utilizing a diff algorithm and MD5 checksums they were able to discriminate between admins and vandals with an area under the ROC curve of around 97.5%. In one of their models the speed of the deletion affected the reputation of the user negatively, and in their most elaborate model the reputation of the user that deleted the content also affected the reputation of the user inserting the content.

In 2010 a Wikipedia vandalism classification competition was held, where 9 systems competed. The competition was based on a data set with 32,452 edits on 28,468 different English Wikipedia articles annotated with crowd-sourcing through Amazon’s Mechanical Turk.[355] A plethora of different features were suggested,[354] see also Table 9 for a few. The system by Mola Velasco won with WikiTrust coming in 2nd or 3rd (depending on the measure used for evaluation).[354] Mola Velasco used a broad range of features and for the classification he used algorithms from the Weka framework.[351] However, the organizers of the competition could report (with some measures) better performance than any of the individual classifiers when the classifiers were combined into an ensemble classifier.[354] The competition the following year also included German and Spanish Wikipedia edits.[356] The winning entry used 65 features. Among the presently best performing algorithms is apparently a system using 66 features and Lasso for estimation.[357]

To avoid that good edits automatically get reverted (false positives) the bots usually operate with a high threshold, leaving false negatives. This problem has left room for the semi-automated vandal fighter tool of Andrew West called STiki. The original system hooked on to the IRC and the Wikipedia API. With feature extraction and machine learning applying support vector regression running on a server, edits would be scored for vandalism and high-scoring edits would be presented to a client where a Wikipedian could indicate whether the edit was vandalism or not.[179, 180] The Wikipedians using the system would need to download the Java client program. The newest version of STiki uses information from ClueBot NG and WikiTrust.

A different approach, Wikibu, seeks to indicate the reliability of a Wikipedia article by embedding the Wikipedia article on a page that also shows the number of visitors, number of editors, number of links and sources of the Wikipedia article. A version for the German Wikipedia is available from http://www.wikibu.ch.

Other efforts focus on approval of articles through peer review. There are already mechanisms for peer review on Wikipedia itself (e.g., the notion of ‘featured article’), but combinations with external systems have also been suggested.[358] One example was Veropedia, which had over 5000 checked Wikipedia-based articles. In July 2010 the site was unavailable.

Back in 2006 Bertrand Meyer called for a certification of an entry and a refereeing process and believed that a rating system for authors and entries is inevitable.[167]

Edit feature              Description/assumption
Edit time of day          [180]
Edit day-of-week          [180]
Registration time         Time since registration of editor [180]
Last edit time            Time since last edit of article [180]
Vandalized time           Time since editor last vandalized [180]
Article reputation        [180]
Editor reputation         [180]
Categorical reputation    [180]
Geographical reputation   [180]
Comment length            Length of the revision comment [180]

Article feature               Description/assumption
Number of edits/rigor         More edits are better.[283, 284] “[. . . ] more editing cycles on an
                              article provides for a deeper treatment of the subject or more
                              scrutiny of the content.”[111]
Number of editors/diversity   More editors are better.[283, 284] “With more editors, there are
                              more voices and different points of view for a given subject”[111]
Word count                    The length of an article. Long articles are better.[342, 352]
Semantic stability            Articles with semantically stable revisions are better.[353]

Table 9: Quality features of an edit or article. See also the many more features used in the Wikipedia vandalism detection competition.[354]

Automatic creation of content

Andrew Lih has called the start of the mass-creation of content via bots “the most controversial move in the history of Wikipedia”:[25] In the autumn of 2002 Derek Ramsey began to add articles on US counties to the English Wikipedia based on information extracted from public United States Census data. The automatically constructed text, corresponding to a couple of thousand articles, he manually added to Wikipedia. After this addition he started with automated creation of articles for 33,832 US cities, completing the task in less than a week in October 2002. As the English Wikipedia had just over 50,000 articles before Ramsey began, he quickly became responsible for the creation of 40% of the articles on the English Wikipedia.[25] Longitudinal studies show the effect of Ramsey’s work.[283, 284] Before Rambot’s activity a few other bots operated on Wikipedia creating content on a more limited scale: The ‘Wikipedia:History of Wikipedia bots’ English Wikipedia page lists, e.g., import of a glossary of telecommunication terms in February 2002 from Federal Standard 1037C. Bots can still make a major impact on the size of a Wikipedia: Sverker Johansson’s bot operating on the Swedish, Waray-Waray and Cebuano Wikipedias has constructed 2.7 million articles[xxxvii] as of February 2014. The large number of articles for species has made the Swedish Wikipedia one of the largest Wikipedias.[131]

[xxxvii] Wikipedia Statistics. Bot article creations only: http://stats.wikimedia.org/EN/BotActivityMatrixCreates.htm

Scientists have also automatically added information to Wikipedia by formatting data in already existing scientific databases for inclusion in Wikipedia. In the Gene Wiki project, the Protein Box Bot has since August 2007 added several thousand articles on genes,[240, 359] with automated construction of an infobox, free-text summary and relevant publications aggregated from Entrez Gene and a gene atlas database.[360] In a similar project, the WikiProject RNA started out with the Rfam database containing information about RNA and added the information to Wikipedia, creating over 600 new articles. After the addition of content to Wikipedia the information flow was turned around and the (possibly extended) content of Wikipedia was used for the annotation of the Rfam database by daily monitoring Wikipedia for changes.[361] Since November 2008 the Rfambot has been working with updates and conversion of infoboxes in Wikipedia articles related to the project.

Tools may automatically create on-the-fly text from the structured information in Wikidata. Magnus Manske’s Reasonator Web service[xxxviii] shows one of the first examples, see Figure 12 for the generated page of Johann Sebastian Bach. A related on-the-fly construction of Wikidata semantic information is provided by Metaphacts with https://wikidata.metaphacts.com.

[xxxviii] http://tools.wmflabs.org/reasonator/

Stuart Geiger has made a longitudinal overview of overall bot and assisted editing activity on the English Wikipedia for a period of around two months in the beginning of 2009. Assisted editing amounted to approximately 12% of all edits.[181, 210] Most of these edits seem not to be related to creation of content but rather to vandal fighting, but bots can have a major impact on some areas of Wikipedia. With a longitudinal examination of the number of scientific citations in Wikipedia I found a big jump from 2007 to 2008 due to content with citations added by the Protein Box Bot.[81]

The WikiOpener MediaWiki extension creates a framework for querying external databases such that their material can be merged in the wiki.[362]

In May 2014 Magnus Manske introduced gamification of Wikidata entry, letting Wikidata editors set values of properties (e.g., the gender of a human or whether an item ‘is a’ human) based on the introductory paragraphs of Wikipedia pages.[xxxix] The interface would present buttons for the user to press and the Web service would perform the actual Wikidata editing behind the scenes.[363] Google developers constructed another semi-automated Wikidata entry tool, the Primary Sources Tool, initially used to move data from Google’s Freebase to Wikidata. Via a gadget the tool would suggest new claims by modifying the Wikidata edit interface and giving the user a choice to either accept or reject the suggested claim. In January 2016 users had performed about 90,000 approval or rejection actions.[364]

[xxxix] http://tools.wmflabs.org/wikidata-game/

Figure 12: Magnus Manske’s Reasonator Web service with on-the-fly generation from data in Wikidata. The text in the grey field is template text filled with data from Wikidata.

Automatic creation of links

In a procedure that may be called automated link discovery, tools suggest intrawiki links from a word in a Wikipedia article to an appropriate Wikipedia article.[365–367]

At one point Wikipedia user Nickj ran the Link Suggester and Can We Link It. Another user ran a link discovery bot from the account Anchor Link Bot. The DPL bot[xl] does not suggest new links but notifies (human) editors about links they have added that point to disambiguation articles and may need to be resolved.

[xl] https://en.wikipedia.org/wiki/User:DPL_bot

Automatic links to external Web pages can also be suggested. Kaptein, Serdyukov and Kamps reported that 45% of all Wikipedia articles have an “External link” section.[368] They built a system that would predict these external links. By using the ClueWeb category B data consisting of 50 million English Web pages, an anchor text index, document priors (length, anchor text length and URL class priors), document smoothing and the Krovetz stemmer they could reach 0.68 in performance measured with the so-called Mean Reciprocal Rank (MRR). By furthermore using data from a social bookmarking site they could improve MRR to 0.71.

A more elaborate wiki content linking has been demonstrated with the IntelliGenWiki system, which incorporates natural language processing for extraction of, e.g., enzyme and organism terms from wiki text. The extracted data is added to a Semantic Web system, allowing the wiki to show embedded results of semantic queries with a Semantic MediaWiki extension.[369] Their system has also been used to generate an index from wiki content.[370]

Work suggestion

A standard installation of the MediaWiki software can generate lists for work suggestions. MediaWiki users find these links on the page ‘Special:SpecialPages’ and examples are ‘dead-end pages’, ‘wanted categories’ and ‘wanted pages’.

The Wikidata-based service Wiki ShootMe! lists geographical items with missing images on Wikidata based on a query coordinate given by the user. A Wikipedian can use it to identify photo opportunities in his/her nearby area. The MediaWiki software embeds similar functionality with the ‘Special:Nearby’ page that lists nearby pages and associated images if the user shares his/her estimated browser geolocation.

Semantic wikis

The framework referred to as the ‘Semantic Web’ structures data.[371] Essentially, a subject-verb-object triple data structure represents data, where the triple data may be represented in RDF. In the mid-2000s several research groups proposed systems that merged the Semantic Web idea with wiki technology,[372–376] with some of the systems based on extensions around the MediaWiki software.[377–379] Researchers compared 10 different semantic wikis in a 2008 paper and ended up selecting the Semantic MediaWiki system for their knowledgebase.[380]
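The subject-verb-object triple structure can be illustrated with a small in-memory store. This is a minimal sketch under stated assumptions, not an RDF implementation: the class and method names (`Triple`, `TripleStore`, `match`) are hypothetical, terms are plain strings rather than URIs, and the toy facts are only there for illustration.

```python
from typing import List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    """One subject-verb-object statement."""
    subject: str
    verb: str
    obj: str


class TripleStore:
    """Hypothetical in-memory store with wildcard pattern matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, verb: str, obj: str) -> None:
        self._triples.add(Triple(subject, verb, obj))

    def match(self, subject: Optional[str] = None,
              verb: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return all triples agreeing with the given parts; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (verb is None or t.verb == verb)
            and (obj is None or t.obj == obj)
        )


# Toy data for illustration only.
store = TripleStore()
store.add("Germany", "has capital", "Berlin")
store.add("France", "has capital", "Paris")
store.add("Berlin", "is a", "city")

capitals = store.match(verb="has capital")
about_berlin = store.match(obj="Berlin")
```

A query is simply a triple with some positions left open, which is also the basic pattern behind query languages for such data.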

In the German Semantic MediaWiki (SMW) the wiki page determines the subject of the triple structure, while the wikilink sets the object. The verb of the triple structure comes as an element in a double-colon extension to the MediaWiki link syntax, so that, e.g., [[has capital::Berlin]] on the Germany page will result in the triple structure (Germany, has capital, Berlin). In the Japanese Semantic MediaWiki[377] the suggested syntax would be [[Term:Berlin|has capital]]. The German SMW comes with a query facility, so users may formulate semantic on-the-fly queries across all the semantic data of the wiki.

Though not implemented in Wikipedia nor its sister projects, the SMW extension has proven valuable in many other contexts, and a large number of extensions on top of SMW have come into existence,[28] e.g., the Semantic Result Formats extension allows users to let the query return the result in a large number of formats, both text-oriented and in plots of different kinds. The SemanticHistory extension can enable semantic queries on the wiki edit history.[381]

The Semantic Web does not directly allow for specification of n-ary relations beyond the triplet to capture a structure such as (Germany, has capital, Bonn, in year, 1990). However, an n-ary relation can be broken down into multiple triplets.[382] The BOWiki extends Semantic MediaWiki so n-ary relations can be represented with the wiki syntax.[383] Extensions for Semantic MediaWiki also allow for representation of n-ary objects.

Wikidata

Within the MediaWiki framework the 2012 Wikidata proposal[xli] discussed the issues and presented a mockup with field/value editing.[384] This system became operational in October 2012, and major bot activity soon made it the largest Wikimedia wiki in terms of pages (or ‘items’ in the parlance of Wikidata) and in terms of editing activity, with about 90% of the edits made by bots.[385] Inspired by the Semantic Web and semantic wikis, the page of Wikidata acts as the subject, while the verb is referred to as a ‘property’ and the object will refer to another Wikidata page or another type of value. Wikidata allows n-ary relations through so-called qualifiers. Limited support for units, such as heights and mass, came in September 2015.

[xli] http://www.mediawiki.org/wiki/Wikidata

Querying on the base installation of Wikidata was initially limited, with no means to build advanced queries like the ones seen in SPARQL or Freebase’s MQL. However, a few third-party services soon appeared with Magnus Manske as the primary driver behind the tools. His Wikidata Query editor at http://wdq.wmflabs.org/wdq/ allowed for formulation of queries such as “Places in the U.S. that are named after Francis of Assisi”, and his autolist tool at https://tools.wmflabs.org/autolist/ would display a list of retrieved pages. Soon Kingsley Idehen’s OpenLink Software company also provided a Virtuoso-based SPARQL endpoint for querying Wikidata together with other Semantic Web data from the Linked Open Data cloud.[xlii] Metaphacts and University of also made early public SPARQL endpoints with Wikidata data available.[386] Finally, in September 2015 the Wikimedia Foundation announced its SPARQL endpoint, the Wikidata Query Service, running from http://query.wikidata.org. “List of countries ordered by the number of their cities with female mayor” was an example of possible queries.

[xlii] http://lod.openlinksw.com/sparql.

With its qualifiers and references the Wikidata data structure does not map directly to the triple structure of the Semantic Web and SPARQL tools. However, various approaches (reifications) can convert Wikidata data to the triple format. Benchmark profiling with four different approaches and five different SPARQL engines found that the so-called n-ary relation model enabled the expression of property paths.[387]

When Wikidata is coupled with the handling of natural language questions, powerful open-domain question-answering (QA) systems can be constructed. Students from École Normale Supérieure de Lyon constructed one of the first online QA systems with Wikidata data, the Platypus system running from http://askplatyp.us.[388] It was capable of answering questions such as “Who is the prime minister of France?” and “Who was member of the Beatles?”.

Other structured content

Several systems exist that extend wikis with table and database-like functionality beyond the Semantic Web framework, e.g., DynaTable,[xliii] which defines structured data in a separate name space and can display the structured data in tables on wiki pages with the use of special tags.[389] Other similar extensions with varying degrees of maturity are the WikiDB and DataTable extensions. The relatively simple TemplateTable extension extracts keys and values from templates used across the wiki and generates a table with columns according to key and each row corresponding to a template instantiation. No filtering is possible. The Brede Wiki keeps its MediaWiki template data in a sufficiently simple format for complete databasing of the template content by extraction from XML dumps, enabling complex SQL data mining queries to be made off-wiki.[390] The Brede Wiki also keeps table-like data in simple comma-separated values format on wikipages for meta-analysis.[391] The SimpleTable extension formats the CSV data.

[xliii] http://sourceforge.net/projects/wikidynatable/

None of these systems enjoy as large a success as the semantic wikis and Wikidata.

Form-based editing

Standard wikis have only one input field for the content of the wiki. Semantic Web-oriented wiki engines, such as OntoWiki, present edit interfaces with one input field for each property of a page, thus each page will typically have many forms. The Swiki system can associate forms to wiki pages,[392] and the template mechanism in the CURE wiki engine allows the editor to define the interface for editing a multiple input form as well as the presentation of the content.[393]

An extension for the MediaWiki software brings support for editor-defined forms, used in conjunction with Semantic MediaWiki and the template functionality. As with Semantic MediaWiki the form extension is not enabled in Wikipedia, but is used on a number of SMW-enabled wikis, ensuring a consistent input of data.

With Wikidata Wikipedians got the opportunity to make form-based edits on language links and structured content as usually found in infoboxes. Wikidata form-based input helps the user by suggestions for autocompletion.

Markup

Wikis have a simple markup language. Unfortunately the markup has diverged slightly, so the markup is not entirely compatible between wiki engines, e.g., MediaWiki does not support CamelCase linking. WikiCreole (or just Creole) attempts to standardize markup and some wikis have adopted it,[394] see http://www.wikicreole.org/wiki/Creole1.0. The WikiCreole language has also been formalized in EBNF.[395, 396]

The difficulties with representing MediaWiki wikitext so that standard context-free parsers may parse it mean that creating tools for WYSIWYG editing is difficult. Software developers within the Wikimedia Foundation project Parsoid implement a node.js program that may represent MediaWiki wikitext as HTML5, and convert the HTML5 back to wikitext. The project should not only enable good visual editing in MediaWiki-based wikis, but also increase rendering performance by avoiding conversion overhead.[397]

Text-oriented formatting languages, such as LaTeX and HTML, can have fairly dense markup obscuring the text itself. Documentation languages focus on readability while maintaining formatting capabilities, e.g., AsciiDoc, Markdown and reStructuredText. These languages typically differ from wiki markup, though, e.g., GitHub Flavored Markdown has URL autolinking like typical wiki markup.

Extended authoring

The interface and the wiki markup may not appeal to potential contributors with a low level of technical ability. If they are knowledgeable contributors, Wikipedia has lost the benefit of their knowledge. In 2008 the Wikimedia Foundation received a grant from the U.S.-based Stanton Foundation for a project to make the editing facility easier.[xliv] Years later in 2013 the Wikimedia Foundation rolled out the “WYSIWYG-like” editor VisualEditor on Wikipedia, giving contributors the choice between the usual raw wiki markup and WYSIWYG-like editing. Editing defaulting to VisualEditor generated controversy among Wikipedians, and English Wikipedia administrators managed to revert the decision, making the VisualEditor opt-in.[398]

[xliv] Wikipedia to become more user-friendly for new volunteer writers

Some wikifarm companies, such as wetpaint and PBwiki, have WYSIWYG functionality. The MediaWiki-based Wikia wikifarm has WYSIWYG support. On Wikipedia user Cacycle’s wikEd Javascript in-browser editor adds enhanced text processing. The web-browser plugin formats bibliographic references in the style of Wikipedia citation templates. ProveIt from Georgia Tech (http://proveit.cc.gatech.edu/) provides a structured interface for editing references in MediaWikis via a popup window.

Digitizing publications

The Wikimedia Foundation project Wikisource collects electronic versions of publications. Project Gutenberg (http://www.gutenberg.org/), started by Michael Hart, has a longer history going back to 1971 and had by 1992 digitized many works.[399] Lars Aronsson’s Project Runeberg is a Scandinavian effort inspired by Project Gutenberg.

To solve conflicts about differences between editions, Aronsson started in 1998 to incorporate facsimiles with optical character recognition (OCR) and collaborative proof-reading. Aronsson’s Runeberg approach has now been adopted by Wikisource.[400] MediaWiki still forms the basis for Wikisource, but Wikisource enables the Proofread Page MediaWiki extension and has a ‘Page’ namespace where a facsimile page is presented next to the wiki edit field containing OCR’ed text from PDF or DjVu files for collaborative proof-reading. The approach splits the content and presentation on different wikipages and a full book may be split across a number of pages, so tools using Wikisource as a corpus may need to implement a not straightforward procedure for assembling the actual texts and discarding wiki markup.[401]

Facsimiles of paper encyclopedias may have multiple articles on one page or one article spanning multiple pages, so facsimiles present two structures: the page-oriented structure and the chapter/article-oriented structure. In Wikisource and Runeberg proof-reading is page-oriented, while the presentation of proof-read books is chapter-oriented in Wikisource.

Geographical extension

Wikipedia lets users mark up a page with geographical coordinates via the use of MediaWiki templates, a simple approach to so-called volunteered geographic information (VGI). Other wiki-oriented systems present a map that users draw on while also providing descriptions of the geographical coordinates.[402] One such system is WikiMapia (http://wikimapia.org) where users can annotate with rectangles and polygons on a background of satellite images from Google Maps. The CC-BY-NC-SA licensed data are exposed through an API.

One of the most prominent (if not the most prominent) VGI efforts is OpenStreetMap (OSM), www.openstreetmap.org, which also provides complete maps under a share-alike license. Founded by Steve Coast in July 2004, it lets users add, e.g., roads and nodes with key-value attributes with the help of a background satellite image and GPS tracks. Rendering may be through the Mapnik open source library.[403] OSM content may be integrated with Wikipedia, so articles with geographical coordinates display OSM maps, e.g., through the OpenStreetMap SimpleMap MediaWiki extension.

Another geographically oriented wiki is WikiCrimes (http://www.wikicrimes.org) that seeks to map crimes collaboratively.

Extending browsing

A number of tools exist for reformatting Wikipedia content for specialized devices. Jesse David Hollington has reviewed applications for the iPhone, including the Wikipanion with features such as cache and offline reading.[xlv]

[xlv] iPhone Gems: Wikipedia Apps.

Wikipedia Diver is a Firefox plugin. It logs clicks between Wikipedia pages to a database and displays the browsing path graphically. The Wikipedia Beautifier Chrome extension presents a cleaner view of the text with automatic hyphenation and various adjustments to the Wikipedia interface.[xlvi] The Googlepedia Firefox add-on shows relevant Wikipedia articles next to Google search results.[xlvii] A number of other browser plugins change or enhance the presentation of Wikipedia in the browser.

[xlvi] https://github.com/scotchi/wikipedia-beautifier/wiki/Wikipedia-Beautifier.
[xlvii] https://addons.mozilla.org/en-US/firefox/addon/googlepedia/.

Graphic extensions

The image-oriented browsing program Indywiki[xlviii] enables a user to search Wikipedia independently of an ordinary Web browser. The program displays the ten most related images prominently in its browser interface while also displaying the text and links of the article. A similar system is the Qwiki website that displays images from Wikipedia and other sources in a multimedia environment.

[xlviii] http://indywiki.sourceforge.net/

The Java program WikiStory constructs interactive time lines based on Wikipedia material. HistoryViz displays events related to a queried person on a timeline. Apart from this visualization the system also features a Java applet graph visualization of Wikipedia pages.[404] The system relied on algorithms for categorization of Wikipedia articles into persons, places or organizations.[405] Yet another timeline visualizer is Navino Evans’ histropedia served from histropedia.com.[406] Timeline visualization also features as part of the Reasonator Wikidata presentation Web service.

Wikis can be extended with on-the-fly creation of graphs,[407] and there exists an extension for MediaWiki that uses the GraphViz software,[408, 409] and, e.g., the Brede Wiki uses the tool for automatic creation of visualizations of intrawiki links between brain regions, topics and organizations from information defined in MediaWiki templates. The intrawiki links between Wikipedia articles form a directed graph. Together with a path finding algorithm Erika Arnold’s wikiGraph Web service can find and visualize a path between two user-specified Wikipedia articles using the dump and the Neo4j graph database.[xlix]

[xlix] http://wikigraph.erikaarnold.com/

The ‘Copernicus’ system makes an attempt at a 3D wiki:[410–412] A two-layer interface presents the Wikipedia article in a transparent foreground, while the background presents a 3D model related to the Wikipedia article. The user can trigger predefined camera movements and adjust the transparency.

Video extensions

Multimedia authors have added animated images and small video clips in the Ogg Theora format in Wikipedia. Other editors can in principle edit these clips with external video editing software, but with far less ease of use than when editing wikitext. The company Kaltura has worked on an Open Source technology to better support peer production of video content. In the beginning of 2008 the company announced experiments with this collaborative video platform together with the Wikimedia Foundation. As with other material Wikipedia already enables collaborative tagging and discussion of videos through category markup and discussion pages on Wikimedia Commons.

Real-time editing

When two editors edit a MediaWiki article at the same time and each one saves his/her edits, then the editor with the last save will experience an edit conflict that needs to be resolved manually. In this respect MediaWiki does not differ from standard revision control systems used in software development. Several systems enable real-time browser-based collaborative editing, and an extended Etherpad will allow for some wiki functionality.[413] Igor Kofman’s Hackpad script for live-editing MediaWiki pages requires only editing of a user page for installation.[l]

[l] http://hackpad.posterous.com/live-editing-mediawiki-with-hackpad

Distributed and disconnected Wikipedia

Wikipedia can be downloaded and read from the local copy. Groups of editors have checked a selected number of articles and distributed them on DVD, see www.wikipediaondvd.com and Schools-wikipedia.org. The Kiwix offline multimedia reader makes Wikipedia available offline. The simple small AAA battery-powered WikiReader comes with a touchscreen and 3 million topics from Wikipedia. An even simpler device, Humane Reader, connects to a TV and could include an offline version of Wikipedia.[414] The One Laptop per Child (OLPC) project features the Wikibrowser with a selection from the Spanish and English Wikipedia, while the XOWA Java tool enables the setup of local offline copies of Wikipedias and their sister projects as well as their images.[415]

Offline Wikipedia deployment may be important in areas with no Internet, and researchers have studied the use of Wikipedia in offline OLPC in this context. In a field study Ethiopian students indicated that “browsing schools books and offline Wikipedia page” was the second most favored activity on the OLPC.[416]

Versioning in Wikipedia is centralized as in the client-server software revision control systems Concurrent Versions System (CVS) and Subversion (SVN). Newer software revision control systems, such as Git, may be distributed. There have been several suggestions and developments for distributed wikis that merged the wiki idea with the distributed revision control system idea. Levitation would convert a Wikipedia database dump to Git repositories. In Ward Cunningham’s Smallest Federated Wiki a user will fork a wiki when he/she edits. The owner of the original edit may then merge that edit back to the original wiki.[417] Some of his thoughts on federating wikis go all the way back to a 1997 memorandum.[418] Another distributed wiki system is the research prototype Concerto (http://concerto.xwiki.com), which was built for the XWiki wiki engine as a peer-to-peer replicated wiki with an aim for data consistency between the replicated wikis.[419]

Distributed wikis may store content that diverges from neutral point of view: What has been called “every point of view”.[420] The question of whether distributed wikis (Wikipedias) might also help for an inclusionist approach to specialized knowledge has also been posed.[421]

Wiki and programming

There are different ways to combine programming and wikis. The extension framework of MediaWiki allows developers to make additional programs that use resources either on the wiki server or on remote servers. Other approaches enable end-user programming for editors, e.g., one system allows editors to write database queries for a SAP system.[422] A MediaWiki extension integrates output from the R statistical computing environment.

Wikis that allow for inclusion of different applications, e.g., calendars, may be termed ‘application

wikis’.

The MediaWiki software has a simple ‘programming language’ for templates, which features, e.g., an ‘if’ statement. However, the capabilities are limited and the syntax is quite obscure, with an excessive number of curly braces. In 2012 Wikimedia began serious experiments with enabling the Lua programming language on test MediaWiki installations. A successful application would fundamentally change the way that MediaWiki users write templates and make way for more advanced programming. In the first part of 2013 Lua was enabled on the Wikipedias.

A programming language embedded in a wiki could provide means to format data, e.g., comma-separated values into tables, and could be used to make small-scale numerical computations: functions that now need to be programmed either in a MediaWiki extension or in an external web script.391

The PyPedia website www.pypedia.com combines MediaWiki and the Python programming language. With each wikipage containing a Python function or a Python class, PyPedia allows Internet users to write and execute code in the browser or to execute the code on the wiki in a local installation of Python.423

5 Using Wikipedia and other wikis in research and education

As a Web-based collaborative environment Wikipedia and other wikis offer researchers and students on all levels a means for communicating, i.e., reading and writing about the topic of interest. For example, wikis can offer Ph.D. students an environment for literature research where a wiki page contains information about a specific research paper: its summary, research questions, methods, results and comments.393

In Ten simple rules for editing Wikipedia Logan et al. guide scientists on how to write on Wikipedia.424 For example, they suggest that scientists register an account for privacy, security and reputation building as well as to gain access to the “watchlist” feature. They suggest that scientists should “avoid shameless self-promotion” by not writing their own biography page on Wikipedia (let others do that).

Attitudes towards Wikipedia

A Wikimedia Foundation survey has found researchers to be generally quite positive towards Wikipedia: Over 90% of 1743 self-selected respondents were ‘very favorable’ or ‘somewhat favorable’.425 Among Public Library of Science (PLoS) authors the result was 96%. Other results showed that 68% answered ‘yes, on a large scale’ to the question ‘would you be in favor of effects to invite scientist to add or improve Wikipedia articles’. Such results are very positive for Wikipedia, but may be biased due to the self-selection of respondents and to the particular publisher web site that carried the initial reference to the survey.

In 2010 Dooley surveyed 105 university faculty members and found that 54.4% would rank Wikipedia to be moderately to very credible, 26.6% said it had some credibility, although not much, and 20% that it had “no credibility”.426

Individual researchers have called for colleagues to contribute to Wikipedia, e.g., “Scientists who receive public or charitable funding should [...] seize the opportunity to make sure that Wikipedia articles are understandable, scientifically accurate, well sourced and up-to-date.”427 In 2011 Alexander Bond called for ornithologists to appropriate Wikipedia “as a teaching and outreach tool”, e.g., “[P]rofessors can replace essays and reports assigned to students with the creation or improvement of a taxonomic Wikipedia entry”, and recommended the WikiProject Birds page as a starting point for ornithologists.428 Writing for students and lawyers that would read Wikipedia, Diane Murley concluded that “Wikipedia can be a great research tool if used appropriately” and that “it is a good tool to use to teach researchers about the necessity of evaluating sources”.120

Some scientists contributing to Wikipedia have voiced concern about the priorities of Wikipedia. Antony Williams asked in a blog post in 2011 “Why are pornstars more notable than scientists on Wikipedia?”, noting that scientist biographies he had written had been flagged for notability, even though one of the biographed scientists had over 300 publications and an h-index of 27. Williams’ own Conflict of Interest in the case had been questioned. Direct Wikipedia contributions from scientists, as well as other experts, are not automatically regarded as appropriate. Wikipedia is not interested in what experts know, but rather in where the experts have their knowledge from, and in whether the source has “institutional approval”.

Rosenzweig argued that professional historians can even learn from the open and democratic production and distribution model of Wikipedia, and he asked why scholarly journals are behind and sponsored American National Biography Online is available only to libraries at a high cost.20 He thought that historians should join in writing

history in Wikipedia, but also noted issues with the ban on original research and “difficult people” that drive experts away as hindrances for professional historians writing on Wikipedia. He called for systems for digitizing historical documents and for open-source textbooks, puzzlingly not mentioning the Wikisource (with oldest mainpage edits in November 2003) and (with oldest mainpage edits in May 2005) efforts.

Echoing Rosenzweig in another domain, Wikipedian and healthcare professional James Heilman encouraged healthcare professionals to get involved in Wikipedia, stating among his reasons that “Wikipedia has taught me critical reading, which has made me better equipped to deal with less reliable sources of information such as pharmaceutical representatives.”429 The positive attitude towards Wikipedia in the healthcare domain has led to a closer collaboration between Wikipedians and members of the Cochrane Collaboration for strengthening the dissemination of synthesized research.430

Use of Wikipedia

Several studies have used surveys to examine the use of Wikipedia among university faculty426 and students.93, 254, 431, 432 These studies may aim to answer how much and for what purpose they use Wikipedia. Apart from the direct academic and cognitive purposes of obtaining knowledge and satisfying curiosity, other motivations for using Wikipedia may be hypothesized:431 emotional, personal integrity, social and tension-release needs.

The Wedemeyer study asked students to evaluate Wikipedia biochemistry articles. One third responded that they never use Wikipedia. Among the remaining two thirds 12% used Wikipedia as their primary source and 31% used their textbook and Wikipedia equally. The remaining 57% used Wikipedia only as a supplement. The majority of the students preferred Wikipedia to the textbook.93

Sook Lim performed a survey among undergraduate students, where 134 out of 409 responded.254, 431 Reporting in 2009 and 2010 she found that all students had used Wikipedia, but that the use was not necessarily for academic purposes alone as students would also use Wikipedia for “entertainment and idle reading” and nonacademic personal information interests. That was especially the case for the male students, who also had a higher perception of information quality and belief in the Wikipedia project. Another study reported in 2011 found that 47% of 186 medical students who recently had completed psychiatric clinical clerkship used Wikipedia as one of the primary sources for preparing for exams. Question books (88%) and the peer-reviewed website Up-to-Date (59%) were more frequently used, but textbooks (10%) less used. Among the students using Wikipedia 84% also used question books.432

In Dooley’s 2010 study among university faculty 45 of 105 respondents said they used Wikipedia in their teaching and/or research, 40 occasionally, and 20 respondents said they never used Wikipedia for teaching/research.426

In a paper discussing how students could use Wikipedia Jeff Meahre suggests that students peruse the discussion pages to “see the process of knowledge creation” and to see why citations are important.433

In a study on high school students’ writing assignment on a wiki the researchers found that they would use Wikipedia even though they were aware of its limitation as a source.434

Citing Wikipedia

Many users would say that Wikipedia works well for background reading and doubt that you can use Wikipedia as a source. In the beginning of 2007 a department at the Middlebury College would hold students responsible for using Wikipedia as a source after a batch of students had used erroneous information on Wikipedia about topics in the history of Japan (Shimabara Rebellion and Ogyu Sorai).435, 436 Media reports implied that the department of Neil Waters, the teacher of the class, ‘was at war with Wikipedia itself’. However, Waters himself actually told students ‘that Wikipedia is a fine place to search for a paper topic or begin the research process’. The policy adopted by the department was:

1. “Students are responsible for the accuracy of information they provide, and they cannot point to Wikipedia or any similar source that may appear in the future to escape the consequences of errors,

2. Wikipedia is not an acceptable citation, even though it may lead one to a citable source.”

At Lycoming College a teacher in history would outright ban Wikipedia from students’ bibliographies, giving a grade of zero if any such citation appeared.437

Waters’ policy is in line with the opinion of the Wikimedia Foundation. However, in the end of 2007 Jimmy Wales said that he saw no problem in younger students using Wikipedia as a reference, and that it should be used as a stepping stone to other sources.438
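If Wikipedia is cited or linked at all, the reference can be pinned to one specific revision rather than to the ever-changing current page. The following Python sketch is only an illustration and is not taken from any of the surveyed studies: it builds such a permanent link from a page title and a revision identifier through the oldid parameter of MediaWiki's index.php. The function name permanent_link and the default host are assumptions chosen for the example; the revision number is the one quoted for “Wikipedia:Conflict of interest” elsewhere in this text.

```python
from urllib.parse import urlencode


def permanent_link(title: str, revision_id: int,
                   host: str = "en.wikipedia.org") -> str:
    """Return a URL pointing at one fixed revision of a wiki page.

    `permanent_link` and the default `host` are hypothetical names for
    this sketch; the `title`/`oldid` query parameters are the ones
    MediaWiki's index.php accepts.
    """
    if revision_id <= 0:
        raise ValueError("revision_id must be a positive integer")
    # MediaWiki treats underscores and spaces in titles alike, so
    # underscores are used to keep the URL free of encoded spaces.
    query = urlencode({"title": title.replace(" ", "_"),
                       "oldid": revision_id})
    return f"https://{host}/w/index.php?{query}"


if __name__ == "__main__":
    print(permanent_link("Wikipedia:Conflict of interest", 388727443))
```

A link of this kind keeps pointing at the reviewed text even after later edits, although it still stops working if administrators delete the page.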

It is fairly easy for a teacher to point to correct information on Wikipedia by the use of permanent links to a specific version of an article. The teacher may link to the version s/he has reviewed, so that students may feel comfortable in using the information on Wikipedia and use it as a supplement to textbooks.

Citing Wikipedia will be a problem if administrators delete the page, e.g., due to notability. Both the present and the older versions of the article will be inaccessible to the editor and reader.

Regardless of the problems with citing Wikipedia, research papers do indeed cite Wikipedia.426, 439–441 Dooley examined 250 research reports published in 2009 and in the beginning of 2010 that were returned on a query on Wikipedia from a search on the Academic OneFile electronic database: 249 of the papers used Wikipedia as a source of scholarly information. In 27 of the papers Wikipedia was the main topic and 62 had brief mentions of Wikipedia.426 In a similar citation study Brazzeal would find 370 chemistry articles from 2005 to 2009 citing Wikipedia.441 The author had searched among 312 chemistry journals from the websites of the three publishers Elsevier, Springer and the American Chemical Society. Of these 312 journals 147 had at least one citation to Wikipedia. 9% of the articles had a citation to numerical values of physical and chemical properties in Wikipedia. In a 2008 study in the area of humanities Lisa Spiro retrieved 167 results between 2002 and 2008 citing Wikipedia, while a corresponding number for Encyclopædia Britannica was 152. Well over half of the Wikipedia citations were “straight citations” without commentary about it.439 Using the ISI Web of Science service Noruzi would find 263 citations to Wikipedia over a 6 year period, far more than a corresponding 10 citations to Encyclopædia Britannica.440 The English, French and Chinese Wikipedias themselves maintain (presumably incomplete) lists of research papers citing Wikipedia as a source: Wikipedia:Wikipedia as an academic source.

The number of citations to Wikipedia from research papers is not large and perhaps authors check information in Wikipedia before citing it, only using Wikipedia as a proxy, so one may ask: Has any research paper actually cited information that was wrong, as was, e.g., the case in news media with the hoax for the Maurice Jarre biography?

Publishing research on wikis and Wikipedia

The “no original research” (NOR) policy on Wikipedia means that researchers cannot directly use Wikipedia as primary venue to publish their findings: In “Ten Simple Rules for Editing Wikipedia” Wikipedian scientists would write: “Perhaps most important for scientists is the appreciation that Wikipedia is not a publisher of original thought or research”,424 or as historian Roy Rosenzweig expresses it, NOR “means that you cannot offer a startling new interpretation of Warren Harding based on newly uncovered sources.”20 Researchers are of course welcomed, but should cite the relevant work with pointers to institutionally approved sources, e.g., scientific journal articles.

From a scientist’s point of view the problem with writing in Wikipedia is that of “academic reward”. Writing a Wikipedia article has not traditionally given the same amount of academic reward as writing a peer-reviewed scientific article.

For better integration of Wikipedia and scientific writing (handling both the problem of NOR and that of academic reward) journals have come up with the scheme of joint publication in a scientific journal and Wikipedia. In 2008 RNA Biology became probably the first journal to venture into this approach, requiring authors that submitted a paper to a section in the journal to also submit a Wikipedia page summarizing the work. The first article became A Survey of Nematode SmY RNAs.442 The journal article would undergo full peer-review and thus act as a source for the Wikipedia content. Another case came in 2012 where the scientific journal PLoS Computational Biology allowed researchers to submit educational “topic articles” that were meant to be both published in the journal and included in Wikipedia.443 The first such article was “Circular Permutations in Proteins”.444

It is interesting to see whether Wikipedia articles can compete with (and even outcompete) the traditional scientific review article.

Special science wikis

Wikipedia has inspired numerous other science wikis. Bioinformatics alone already has quite a large number of wikis. Several biomedical researchers have suggested dedicated wikis for their field since around 2006.390, 391, 445–449

The reason why scientists choose to operate their own wiki instead of adding information to Wikipedia is probably due to one or more of four issues:

• The control and ownership of data (an issue particularly important for companies),

• the addition of specialized interfaces and data,

• the problem of notability in Wikipedia, i.e., certain parts of the research information are viewed to be too specialized for a general encyclopedia,

• the control of authorship

Several wikis are up and running handling research information: A Leipzig group extended the MediaWiki software, incorporating semantic wiki functionality for gene function (http://bowiki.net).383, 450 Other groups have created wikis for laboratory protocols (OpenWetWare),451, 452 single nucleotide polymorphisms (the SNPedia)453 and biological pathways (WikiPathways).454 Other Internet-based biomedical databases have added community annotation, e.g., CutDB.455 CHDwiki for congenital heart defects and YTPdb for classification of the yeast membrane transporter use the WikiOpener MediaWiki extension to include material from bioinformatics Web databases in the wiki.362 Other bioinformatics wikis are ArrayWiki,456 BioData Mining, EcoliWiki, Open Notebook Science, PDBWiki,457 PiQSi, Proteopedia,458 TOPSAN, WikiGenes,459 and WikiProteins.446, 460, 461 A large share of the scientific wikis use the MediaWiki software, often with extensions for handling structured data, e.g., OmegaWiki for WikiProteins460 or Semantic MediaWiki for SNPedia.453

The Rfam database for RNA families is an example of a combination of a scientific expert curated database and a wiki. The summary information of Rfam was moved to Wikipedia and now the free-form summary information in Rfam is included from Wikipedia with the result that “[a]ll of the family annotations are now greatly improved”.361 The Rfam database edit button leads to the Wikipedia edit page. Rfam-specialized data in Rfam “remain curated by Rfam’s specialist annotators”. With the added information in Wikipedia around 15% of web traffic to the Rfam database was in 2010 driven by Wikipedia.173

The problem of notability was the reason I started the Brede Wiki. Wiki pages on this wiki may each describe an individual research article, and usually such articles will not be deemed notable enough for Wikipedia as a general encyclopedia. A specific idea of Brede Wiki was also to maintain structured information in a machine readable format for data mining.390, 391, 449 Related wikis summarizing academic work are Acawiki, WikiPapers462 and WikiLit,14 all using Semantic MediaWiki.

Scholarpedia is a MediaWiki-based science wiki especially strong on computational neuroscience and related areas and has strong control over authorship. Operated with Eugene M. Izhikevich as Editor-in-Chief it lets invited experts write specific articles and lets other experts review the article. He has had the ability to attract Nobel prize winners as authors. At least one of the articles is archived in PubMed Central and referenced in PubMed under Scholarpedia J.463 With such strong authorship control Scholarpedia has been low on hyperlinks and seems to have little collaboration, thus it lacks some of the fundamental characteristics of a wiki. In the end of year 2011 Izhikevich changed the editing system so that “any user can nominate himself to write an article on any topic of his expertise” although the user had to gain “sponsorship” from existing Scholarpedia curators.li

li Scholarpedia change announced on the Connectionist mailing list.

Other wikis also control the authorship: AskDrWiki, among others, limits the editing to registered medical doctors,452 and the CC-BY-SA MediaWiki-based Mediapedia similarly controls the authorship by requiring editors to be either approved physicians or scientists with a doctoral degree in biomedical specialities.464 Finally the general encyclopedia does not allow anonymity.

Censorship

In 2008 Bryan Derksen uploaded images of inkblots to the Commons, and in June 2009 the complete set of images was added to the Rorschach test article on the English Wikipedia. Along with the images came the most common responses to each of the inkblots. Psychologists administering the test angrily held that publishing such information on Wikipedia would jeopardize this psychological test, and American Psychological Association executive director of science Steve J. Breckler said “We wouldn’t be in favor of putting the plates out where anyone can get hold of them”.465

Some amount of censoring may occur for certain ‘dangerous’ information, such as the manufacturing of acetone peroxide or details of the Zippe-type centrifuge and gas centrifuge.

Personality rights may vary between countries, weighing differently the border between free speech and the right to private life. The Wikimedia blog reported that a German university professor brought suit against the Wikimedia Foundation. He felt that the German Wikipedia violated his right of personality, particularly with respect to the mentioning of his membership of student associations. The report did not mention why the professor felt this way. To me the membership seems

entirely innocuous. However, the mentioning of it could be disputed. Denmark treats personal information, e.g., concerning race, ethnicity, religious, philosophical or union affiliations as well as health, sex and gender in a special way. Data cannot usually be registered and distributed without consent, and the mentioning of his membership could expose his religious affiliation. However, in the case of the German professor the Wikimedia blog stated that “the information at issue was both accurate and freely available on several other websites under Asche’s authorization”, and the Wikimedia Foundation won the case.466

Scientists on Wikipedia

Elvebakk in her analysis of online philosophy resources speculated that philosophers have added Wikipedia entries for themselves as a form of self-promotion.83

Carl Hewitt, a retired professor from MIT, came into the spotlight at the end of 2007 after edits on the Wikipedia.467 He has a long career within programming, but though he has made significant contributions within the area, several other Wikipedians have regarded his edits as self-promotion. Among other things, he has added descriptions of his research (the so-called actor model) in physics articles, where others thought they did not belong. The disagreement led to long discussions and a case that ended with a decision to ban him.

The case shows an example of how difficult it may be to write from a neutral point of view if there is a conflict of interest and one is deeply engaged in a field of research. However, the guidelines of Wikipedia allow specialists to add their own research, if it has background in a reliable publication and follows other rules set by Wikipedia.lii

lii See “Wikipedia:Conflict of interest” (388727443) and “Wikipedia:No original research” (388842869).

I have myself added a few references to articles I have authored. Universities expect researchers to make their work more widely known, and extending Wikipedia is one way to spread research to both fellow researchers and ordinary information seekers.

Researchers may be quite attentive to what is written on Wikipedia about themselves: In a 2011 Nature poll, recruiting 840 researchers through email and social web sites, 19% responded that they check Wikipedia once per week or more often for citations to themselves or their own work. The corresponding numbers for citation-counting sites (such as ISI Web of Science) and search engines were 30% and 38%, respectively. The same poll reported that 9% of the researchers had edited or inserted a reference to their work on Wikipedia within the last 12 months and around 3% had edited their own Wikipedia biography.468, 469

Wikipedia and wikis as a teaching tool

Although educators generally warn against using Wikipedia as an authoritative resource, some promote Wikipedia as an example of a digital environment with rich opportunities for collaborative writing and for considering “new roles and relationships for writers”.470 In a widespread Wikipedia engagement teachers give writing assignments to students in Wikipedia editing,471 e.g., at the University of East Anglia students of International Relations should edit and write articles about the Middle East.472 They were rated based on their improvements of eight articles and should write one of their own. The often very controversial topics in this area demanded that the students were able to balance the different opinions and support them with sources. Similar assignments were given in history at Lycoming College,437 in the “Exploring the Past” course at the University of Baltimore473 and in environmental history at the University of Washington-Bothell.474 Some teachers choose to use dedicated wikis, such as CoWeb (a technology that goes back to 1998)211, 212 or an installation of MediaWiki.434

Apart from Wikipedia writing assignments, teachers may also give assignments in Wikipedia article review, adding references to an already existing article, translating or copy editing.471, 475, 476 Yet another exercise has students monitor the changes made to the article they have contributed to and post their findings.437

Students benefit from the Wikipedia writing assignments by having their work exposed not only to the teacher but to the entire world.471 Students also mention gaining the skill of doing thorough research as a benefit.475 A stronger research base is also mentioned by teachers.437 In wikis, interaction between peer students may help to improve their writing, and in an open wiki the students may become aware of the audience212 and contemplate the value of authority.437 Wikis may also make the writing process more visible, though students revise extensively offline.212 Furthermore, students using wikis may get a better attitude towards collaboration and higher grades.211 Teachers mention other advantages such as student improvements in computer literacy, Internet resource critique and collaborative work preparation.476 Student reaction to the assignment may initially be worry and anxiety, and later reactions may be irritation, pride and indignation.437

In a graduate seminar on plant–animal interactions participants assessed the quality and content of ecology articles on Wikipedia. They found Wikipedia generally limited in depth and breadth and with too few citations. Then they proceeded to edit Wikipedia in their domain and found the process “straightforward and efficient, particularly once we learned the protocol for proposing and implementing changes”.477

Apart from having the students learn the topic and the teacher having a means to evaluate the students, Wikipedia assignments may also help to construct learning material when no good textbook fully covers the area.476

For smaller Wikipedias the work of the students on Wikipedia may not catch sufficient attention from other Wikipedians. In a small study on the Danish Wikipedia historian Bekker-Nielsen monitored 17 Wikipedia articles for 380 days after his students had created the initial articles, e.g., for Demetrios Poliorketes and Carausius. The highest number of edits an article received in the 380 days was 26. Of the total number of edits only 6% were content-related. Bekker-Nielsen also pointed to one worsening edit by a Wikipedian as well as wrong references in one article created by a student and not removed by a Wikipedian before Bekker-Nielsen’s study was made public.107

Among the problems faced in Wikipedia assignments are, e.g., student usernames being censored and students banned because of repeated insertion of copyrighted material.437 If students work on controversial and highly visible topics the article may suffer vandalism and it may be difficult to “disentangle” the contribution of the student. A combination of little notability and poor writing may result in the student article getting deleted. In a writing assignment on history a teacher found that the vast majority of articles created by students were deleted.473 Modern educational software systems often use automated plagiarism detection software. If the students submit directly to Wikipedia then the teacher may have difficulty in evaluating the extent (if any) of plagiarism, and as Wikipedia articles get copied to numerous sites (and possibly into the plagiarism detection software) it may quickly become difficult to track down the origin of a text. Teachers can get around the plagiarism detection problem by having the students submit their Wikipedia assignment to the teacher before any Wikipedia edits are made.437

Descriptions of courses that used Wikipedia as a teaching tool appear on the page Wikipedia:School and university projects.471 In 2011 Klaus Wannemacher would review these projects.478 He found 132 projects on the English Wikipedia and a number of others on other language versions of Wikipedia. The earliest dated all the way back to 2002. The field associated with most projects was humanities followed by social science, engineering, medical and natural sciences. He noted teacher issues: making students understand the implication of the license under which they would publish their work, choosing Wikipedia text and the construction of warm-up tasks (creating an account, basic editing, wiki syntax experimentation and categorization). Aims mentioned were increase in student motivation and knowledge, learning to do collaborative writing and gaining the “rigour and balance in writing encyclopeaedic articles” with neutral point of view, theoretical analysis of Wikipedia and propaedeutic: making research, editing and bibliographic processing. Some projects attempted to reach “Good Article” Wikipedia status or better, while failures also occurred: In one case only 7 of 70 produced articles survived.

Robert Cummings’ 2009 book Lazy Virtues: Teaching Writing in the Age of Wikipedia33 explores teaching writing in relation to Wikipedia. The book was reviewed by 3 Wikipedians in the 27 April 2009 Wikipedia Signpost and its issues were discussed in the podcast Wikivoices episode 41.

Using wikis for course communication

Teachers may use wikis for asynchronous communications in a course.479 As an encyclopedia Wikipedia should not contain information about specific courses or discussion between individual teachers and students. Another Wikimedia Foundation project, Wikiversity, serves the purpose of building a resource for teaching and learning. A quotation from the page Wikiversity:Scope regarding teaching reads:

“The Wikiversity should become a premier source for accurate teaching materials — bibliographies, lesson plans, syllabi, effective animations, well written papers, ideas for in class and at home exercises, discussion topics, course work, etc. — as well as offer a clearing house for new and emerging teaching methodologies and technologies of learning. Teachers making use of Wikiversity materials are encouraged to comment on and adapt those materials in a continual effort to refine and improve Wikiversity’s resources.”

Researchers have reported experiences of developing and delivering a course through Wikiversity. The 10-week course Composing Free and Open Online Educational Resources communicated weekly

readings, quasi-formal assignments and feedback through Wikiversity. The course also used video conferencing and blogs and included some 70 participants from all over the world, though “towards the end of the course less than two dozen individuals were participating with any level of consistency”.480

Wikiversity falls within the open education movement. Outside Wikimedia are other open education wikis, such as the WikiEducator (wikieducator.org). The ideas behind these efforts have been summed up in the Cape Town Open Education Declaration (www.capetowndeclaration.org).

Textbooks

Wikimedia Foundation project Wikibooks provides a site where editors may write free books collaboratively, e.g., open textbooks. Though a number of books have been declared ‘featured books’ the success has been mixed. Commentators have pointed to the difficulty in modularization of the text as compared to an encyclopedia as a reason for the lack of major success.481

A 2008 survey among 80 Wikibook contributors (based on 1,500 contacted) found that the respondents were mostly younger than 25 years (58%) and male (97.5%).482 Development of books is usually associated with high educational level, but the survey found that 39% only had a high school degree or lower. About one in four reported frustration with their wikibook project. 42% indicated that the book could not be completed.482

Israeli researchers have reported on the development of an academic textbook on information systems using wiki technology.483 Faculty, undergraduate and graduate students participated during the course of two years using the MediaWiki software. The authors argue that the wiki textbook leads to empowerment of both teachers and students over commercial publishers. About 1,200 students from 20 different classes and three different universities have used the textbook. The wikitextbook was developed from an original outdated e-textbook consisting of 225 subsections that were converted to wikipages. The students read and augmented the pages and created new ones, so the number of pages would approximately double. The researchers found that for some classes the wiki contribution of students was associated with higher grades on average. They also found a long-tailed distribution of edit activity among the students.

Interestingly, one of the successful Wikibooks, the one on the LaTeX markup language, was also seeded from an established work. Computer scientist Andrew Roberts had written a well-structured online tutorial called Getting to Grips with LaTeX available from http://www.andy-roberts.net/writing/latex. Wikibookians contacted Roberts, who gave permission to the use of his content on Wikibooks. This book is now one of the prime resources on LaTeX, reaching top 5 in my search on Google and Bing. So is the initial seeding the problem with Wikibooks? Would a good initial seed establish a modularization that Wikibooks contributors can use as a base for a successful work?

6 Open questions

What are the open questions in wiki and Wikipedia research?

The big equation?

Is it possible to set up an equation for when a wiki will evolve successfully? Or how successful it is? What variables are part of such an equation? Are money and gender part of the equation?

The Wilkinson-Huberman equation (see page 31) is surprising. Why would the number of edits on an ensemble of articles only depend on the age of the articles and a “random” fluctuation? What is this “random” fluctuation? Why are there no explicit external variables in the equation? Surely, an edit from a user may result in that edit showing up on the “my watchlist” of another user and possibly motivate that other user to make a new edit. In this case the edits are driven by a Wikipedia-intrinsic mechanism, but intuitively should edits not primarily be driven by notability of the subject? The Wilkinson-Huberman equation models the collective of an ensemble of articles of similar age, meaning that the model may have room for the notability of the subjects within the lognormal distribution. The modeling rested on Wikipedia data up to November 2006, in the period of Wikipedia’s large growth. Given the decline in the number of editors, does the model still hold? In the same work283, 284 the two researchers show that quality articles are associated with more edits, so do quality articles attract more edits or do many edits result in a quality article?

Why are some Wikipedias bigger than others? The usual initial answer is that the size relates to the number of speakers of the language. This does not explain all variation, and Morten Rask introduced further features to explain the size variations: Human Development Index (HDI), Internet penetration and days in operation.125 These extensions can apparently not explain some size variations, e.g., the Norwegian bokmål Wikipedia had as of

January 2014 over 400,000 articles, while the Danish had well below 200,000 and the Swedish over 1.6 million. The differences in sizes of Nordic Wikipedias are mirrored in the size of Wikimedia chapters: Wikimedia has had almost 100 paying members of the organization, while the Danish chapter has had around 20 members. While differences in days of operation, language speakers, HDI and Internet penetration can to some extent explain the Wikipedia numbers, they do not explain everything. The bot activity with automated creation of articles on the Swedish Wikipedia explains part of its large size, but why is the Norwegian bigger than the Danish? The Faroe Islands and Greenland have almost the same population, yet the Greenlandic Wikipedia shows very little activity with some 1,600 articles, while the Faroese Wikipedia shows more activity with over 7,000 articles. The activity of the Faroese Wikipedia seems to be related to the dedicated activity of a single individual.

What can explain the large difference in the number of edits contributors make on Wikipedia? For employed knowledge workers, such as scientists and computer programmers, studies have reported individual variability in productivity with ratios in the tens [484–486], as well as heavy-tailed distributions of productivity [484]. Although these numbers are large, the individual differences in Wikipedia contributor output dwarf them by orders of magnitude. The usual Wikipedia studies focusing on contributor output do not measure productivity directly (as work per time unit), but rather the overall output, so the time used for Wikipedia contribution could possibly explain the huge differences. A report on Justin Knapp, the first Wikipedian to reach one million edits, said “he has spent 10 hours a day reading and adding to posts” [487]. Is there a difference in productivity among Wikipedians? Studies on programmer productivity suggest experience, motivation and intelligence and “a general factor of programmer proficiency” as explanations for the large differences [485, 486], while William Shockley in his study on scientists considers an ‘organizational hypothesis’, mental combinatorial ability and a hurdle model with interacting mental factors [484]. The organizational hypothesis (dismissed by Shockley for scientists), where heads of laboratories get coauthorships, is not relevant for Wikipedians, as contributors cannot become coauthors on edits. Wikimedia’s suggestion for an “Attrition Pipeline” with “impediments to editing at each stage of editor lifecycle” would resemble the Shockley hurdle model if there are individual differences in contributor persistency in overcoming each impediment, e.g., a struggle with wiki syntax or dealing with policy complexities. Other explanations for the observed variation build on the idea of ‘preferential attachment’, ‘success-breeds-success’ or ‘cumulative advantage’. If this explanation holds, then what is the mechanism of feedback? The experiment with barnstars (see page 29) produced only a relatively small variability in productivity between groups that did and did not receive a social reward. The Kaggle prediction competition Wikipedia’s Participation Challenge in 2011 sought “to understand what factors determine editing behavior”. In this competition participants were to predict the future number of edits a contributor would make. One of the successful competitors used only the edit history for prediction, based on the features: number of edits in recent time periods, number of edited articles in recent time periods and time between first and last edit [488]. Another system used over 40 features but did not yield better results [489]. A deeper explanation for these results is lacking: “You edit because you have edited” does not seem a satisfactory explanation.

Can Wikipedia withstand contributions from public relations professionals and search engine optimization companies? Can the evolution of conflict-of-interest edits be quantified? In 2005 and again in 2006 Eric Goldman predicted that Wikipedia would fail within five years [182, 183]. Yet again in 2010 he would point to the problems faced by the Open Directory Project as a user-generated content site becoming overwhelmed by relentless spam, providing a cautionary tale for Wikipedia [268]. Seth Finkelstein would in 2008 write that he “sometimes remind people that ideological communes tend to end badly, too often with a few manipulative leaders extensively profiting at the expense of a mass of followers who lose everything” [224]. Yet in 2015 Wikipedia still stands. Still, in 2013, reporting from a major sockpuppet investigation, journalist Simon Owen gave a pessimistic outlook: “But while reporting this article I couldn’t help comparing the sockpuppet discovery to a large drug bust – perhaps it might take out a major kingpin, but at the end of the day it’s a relatively minor victory in what is an otherwise losing war on drugs.” [197] Furthermore, in the spring of 2015 Wikipedian and researcher Piotr Konieczny voiced in an op-ed his growing concerns about “advertisements masquerading as articles”, arguing that “Wikipedia has been gravitating towards a vehicle for business and product promotion for too long”, and called for a major cleanup drive of promotional articles, which he with a napkin estimate put at 300,000 on the English Wikipedia, i.e., 3% of the total number of articles [490]. The email leaks from November 2014 cast some light

on companies’ Wikipedia practices, showing a Sony employee apparently updating Wikipedia articles on the CEO and the company encouraging creation of articles about their movies [491].

In 2013 Jimmy Wales pointed to what he saw as two threats to Wikipedia’s ongoing success: authoritarian governments censoring the Internet and “clumsy legislation driven by special interests”, such as the [492]. Are these threats serious?

What relation should there be between academics and Wikipedia?

What relation should there be between academics and Wikipedia? Should academics be encouraged to add their work to Wikipedia and be rewarded for it, or would a large-scale involvement of academics deteriorate Wikipedia, because academics in search of employment and grants would tend to make biased conflict-of-interest contributions overstating the importance of their own work? Do formal academic contributions short-circuit the open collaborative hobby-oriented nature of Wikipedia editing? Should Wikipedia develop a method whereby experts can claim an article version as expert verified?

Since 2012 Wikiversity has incorporated a few peer-reviewed articles, e.g., “Diagram of the pathways of human steroidogenesis”, “Reference ranges for estradiol, progesterone, luteinizing hormone and follicle-stimulating hormone during the menstrual cycle” and the initial study “Average weight of a conventional teaspoon made of metal” [493], all written by Mikael Häggström. On Wikipedia’s WikiProject Medicine talk page editors have discussed whether such articles can act as reliable sources for Wikipedia (“Nomination” of steroidogenesis article in Wikiversity to be used as reference, section on “Wikipedia talk:WikiProject Medicine”, 22:58, 15 April 2014, oldid 604356183). The discussants would not fully embrace and recognize Wikiversity, with its present form of peer review, as a reliable source. It is a question whether wiki-based publishing can gain traction and sufficient recognition, so that such a Wikiversity-based approach is a viable option for career-oriented scientists. Should such an approach be supported?

Why are not all scientific articles cited or mentioned on Wikipedia? Is it because the topics that the scientific articles cover are not yet dealt with in depth by Wikipedia, or is it because the topics of the majority of scientific articles are not notable enough and should be left out? Is much of science simply not encyclopedia-worthy? The Open Access Reader is a project initiated in 2014 aiming to systematically ensure coverage of ‘significant’ published open access research in Wikipedia (https://meta.wikimedia.org/wiki/OpenAccessReader). How feasible is such a project given the large number of scientific articles published each year?

Writing about scientific topics in Wikipedia, one comes to question the validity of the claims in many scientific articles. Science itself has had discussions about the issue, sometimes referred to as the ‘reproducibility crisis’ in experimental sciences [494]. Ioannidis claims from theoretical considerations and a number of assumptions that “most published research findings are false” [495], ironically a claim itself criticized as false [496–498]. However, he has done empirical research finding that a considerable part of highly-cited biomedical research does not replicate in subsequent studies [499]. Wikipedia already has a warning in the content guideline Wikipedia:Identifying reliable sources against relying on individual studies: “Isolated studies are usually considered tentative and may change in the light of further academic research.” (Wikipedia:Identifying reliable sources, oldid 596070076, English Wikipedia.) Is Wikipedia more accurate than the average scientific article? If so, can the Wikipedia process be transferred to the scientific process? Can Wikipedia and similar tools do anything about the ‘reproducibility crisis’ in experimental sciences?

Although citations and universities may be ranked to some degree based on citations in links in Wikipedia [80, 148], scientists have poor coverage on Wikipedia [89]. Should we caution against using Wikipedia for altmetrics? And if Wikipedia altmetrics becomes established, would that have a deteriorating effect on the quality of Wikipedia? How do we ensure that Wikipedia continuously reflects the knowledge in the academic literature?

Small scientific wikis, independent of large organizations such as the Wikimedia Foundation, may run into sustainability problems like other online scientific databases: the wikis are not seldom created and operated by research groups on time-limited research funding, and if the funding stops the continued operation of the wiki is jeopardized. Open wikis may require daily attention, which is not available from researchers who have moved on to new projects. A few scientific wikis have stopped operation, e.g., in January 2014 PDBWiki displayed the message “PDBWiki is offline as of January 14th, 2014” (http://pdbwiki.org/). WikiProtein also seems no longer available. It is an open question whether Wikipedia and its sister projects, particularly now with Wikidata, can provide a better framework for sustainable scientific wiki work compared to separate wikis. Surely, if applying Wikidata instead of a locally

controlled wiki the scientist’s control of authorship is mostly lost, but the task of system and wiki administration is considerably reduced. It is unclear whether Wikidata sufficiently supports the many varied data representations found in scientific wikis and whether external tools can sufficiently present the Wikidata data in the way it is presently done on specialized scientific wikis. Furthermore, will notability issues hinder the representation of scientific data on Wikimedia sites?

What can Wikipedia say about the world and can it affect the world?

To what extent can Wikipedia be used for prediction of world events such as movie box office, election results, etc.? Does Wikipedia have a positive societal value (whatever that is) or is it merely a Friday argument settling site? How appropriate is the two-digit (or even four-digit) value in billions of US dollars mentioned by Band and Gerafi [252] for the consumer value or positive externalities?

Does Wikipedia affect the world and to what extent? The Chinese censorship of the Chinese Wikipedia [500] and the Iranian censorship of some of the articles [501] could indicate that some authorities are afraid of the impact of Wikipedia information. To what extent do political Wikipedia articles affect voting?

Noam Cohen attributed the demise of Microsoft Encarta to the competition from the free Wikipedia [502]. The introduction of Encarta had earlier, in the 1990s, impacted the sale of the competing Encyclopædia Britannica: Encyclopædia Britannica hard copy book sales fell by 50% over a couple of years, affecting an annual revenue of several hundred million US dollars [503, 504].

Do students get better by using Wikipedia? Researchers have carried out a field study with a controlled experiment using OLPC computers that among the applications had an offline copy of Wikipedia. Among several hundred Ethiopian children the researchers found that children with no laptop got higher grades in mathematics. They also found that school engagement was no different between the two groups with and without an OLPC device. Could this indicate that children are put at a disadvantage by reading Wikipedia? This is not what the study says, because the researchers also examined abstract reasoning in the children and reported that the results indicated that children with laptops outperformed children without [416]. Other technologies have been promoted in learning, but although heralded as a new age for learning, they may not fulfill the promises to revolutionize education. A study on iPad reading in Year 6 pupils concluded “[a]lthough the students participating in the study reported an increase in engagement when using the iPad, there was not a corresponding rise in achievement.” [505] Comparisons of the scores from the Programme for International Student Assessment (PISA) in Denmark, with pupils grouped according to whether they had access to tablet computers in school or not, showed a lower average score for those with tablet access [506]. Such studies call into question whether Wikipedia furthers education, and whether teachers should encourage students to use Wikipedia at all. Given that at least the spikes in Wikipedia page views are related to pop culture phenomena, such as celebrity deaths and Super Bowl halftime entertainers [250], are students diverted away from learning ‘useful knowledge’ to reading pop culture when they engage with Wikipedia?

Structured data?

In January 2013 the Wikidata project was launched and it grew quickly. In terms of pages it quickly surpassed the English Wikipedia. A few papers have described the system in overview [384, 385], but otherwise very little research has been done on this young project. What is the authorship and readership of Wikidata? Are bots the dominating force on Wikidata or do humans have a say? What types of data can humans contribute? Do we see a highly skewed editing pattern where a few prolific bots do the majority of the work?

Will we see ‘ontology wars’ on Wikidata where editors cannot agree on properties? One example of property discussions is how humans/persons should be described. Initially, the property ‘GND-type’ (German National Library) stating ‘person’ was used, but later this property was deprecated for labeling persons as persons.

What about the multilingual nature of the Wikidata project? Is the regimented description of items across languages ‘dangerous’? Does an ontology uniform across languages prohibit cultural diversity? Are the predominantly English discussions on Wikidata a problem for non-English users? Before the launch of Wikidata Mark Graham raised concerns that the “highly significant and hugely important” changes brought by Wikidata “have worrying connotations for the diversity of knowledge” on Wikipedia. He believed that “[i]t is important that different communities are able to create and reproduce different truths and worldviews,” exemplifying the problem with the population of Israel: Should it include occupied and contested territories? [507] Wikidata at least partly counters Graham’s worry: each property (e.g., ‘population’)

may have multiple values, and qualifiers associated with Wikidata values can distinguish between the scope of each claim. Denny Vrandečić, project director of Wikidata, explained: “We do not expect the editors to agree on the population of Israel, but we do expect them to agree on what specific sources claim about the population of Israel.” (See comments to Mark Graham’s article.) However, the question remains whether the Wikidata system provides sufficient flexibility to capture, e.g., the slight interlanguage differences in some concepts, and when there are differences, does the definition fall back to one centered in the anglocentric world-view? Take the Germanic concept of ‘Hochschule’/‘højskole’/‘högskola’. The English Wikipedia has a separate article about the German(ic) concept of ‘Hochschule’, and Wikidata has an item for the concept. However, that concept dangles in Wikidata, since the German Wikipedia ‘Hochschule’ article leads to another Wikidata item associated with the concept of a ‘higher education organization’. Another potential problem could be whether certain values for a property should exist or not. Take the borders of Israel as an example: the property ‘shares border with’ presently lists Syria, Lebanon, Egypt and Jordan, and not the State of Palestine. The international recognition of the State of Palestine as a state varies between countries, so according to one view the State of Palestine should be listed, while the opposing view would hold it should not. A suitable qualifier (‘as recognized by’?) could possibly resolve it. Another issue arises for almost similar concepts which can be linked by the informal interwiki Wikipedia links with no problem, but where the semantic description will be difficult in a multilingual environment. Take the classic scientific article The magical number seven plus or minus two: some limits on our capacity for processing information. In the English and French Wikipedias it has its own article, while the German Wikipedia chooses to make an article on the psychological concept described in the article (“Millersche Zahl”): whether this item has an author or not and a date of publication depends on whether one regards it as a publication or a concept. A pure semantic approach would split such cases, increasing babelization.

I have experienced a few instances of vandalism on Wikidata: French footballer Anthony Martial called ‘Rien’ (French: nothing, actually a Norwegian lake) and American basketball player Michael Jordan called ‘insect’. ‘Called’ here means attributed the GND type of ‘Rien’ or ‘insect’, neither of which is a standard GND type. Will vandalism be a problem on Wikidata? On Wikipedia a bot operates with machine learning-based detection built on vandalism features. Would automatic vandalism detection be possible on Wikidata?

Can Wikidata describe everything? What kinds of data can Wikidata not conveniently describe? Wikisource and Wikimedia Commons got their language links represented in Wikidata relatively unhindered, and according to the Wikidata development plan for 2014+ (https://www.wikidata.org/wiki/Wikidata:Development plan), Wikinews, Wikibooks and Wikiversity are scheduled for inclusion in Wikidata. However, Wiktionary was as of January 2014 not scheduled for Wikidata inclusion. It is unclear if the item/property system of Wikidata is an appropriate representation for words and how lexemes, lemmas, forms and senses should most easily be represented with the Wikidata data model.

How advanced can queries on Wikidata data be? Magnus Manske’s AutoList tool can already now carry out on-the-fly queries like “all poets who lived in 1982”. But can such queries continue to be carried out effectively, and will Wikidata generally be able to scale? For example, how will Wikidata cope with several thousand claims per item? It may be worth remembering that Wikipedia articles seldom reach past 200–300 kB, because articles get split into subarticles, e.g., “Barack Obama” splits into “Family of Barack Obama” and “Senate career of Barack Obama” etc. This sharding technique does not seem to be readily possible with Wikidata. Dynamic Wikidata-based translation in its present form, e.g., through the qLabel JavaScript library, can result in multiple requests to Wikidata servers for just a single page view on a third-party website. If successful, will Wikidata-based translation result in unmanageable load on Wikidata?

Acknowledgment

Thanks to Daniel Kinzler, Torsten Zesch, Felipe Ortega, Piotr Konieczny, Claudia Koltzenburg and James Heilman for pointing to references and tools. Thanks also to Chitu Okoli, Mohamad Mehdi, Mostafa Mesgari and Arto Lanamäki, with whom I am writing systematic reviews about Wikipedia research [14].

References

[1] Jakob Voß. Measuring Wikipedia. In Proceedings of the 10th International Conference of the International Society for Scientometrics and Informetrics, 2005.

[2] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. DBpedia: A nucleus for a web of open data. In The Semantic Web, volume 4825 of Lecture Notes in Computer Science, pages 722–735, Heidelberg/Berlin, 2007. Springer.
Annotation: Description of a system that extracts information from the templates in Wikipedia, processes and presents them in various ways. Some of the methods and services they use are MySQL, Virtuoso, OpenCyc, GeoNames, Freebase, SPARQL and SNORQL. The system is available from http://DBpedia.org

[3] Lee Rainie and Bill Tancer. Data memo. Report, Pew Research Center’s Internet & American Life Project, April 2007.
Annotation: Reports the result of a survey on Wikipedia use among American adults.

[4] Kathryn Zickuhr and Lee Rainie. Wikipedia, past and present. Report, Pew Research Center’s Internet & American Life Project, Washington, D.C., January 2011.
Annotation: Presents the results of a survey on Americans’ use of Wikipedia. It is based on telephone interviews conducted in spring 2010. It is found that 53% of Internet users used Wikipedia, corresponding to 42% of adult Americans.

[5] Seth Schaafsma and Selina Kroesemeijer. Getting to know the grassroots. Technical report, Vereniging Wikimedia Nederland, July 2013.

[6] Han-Teng Liao. Growth of academic interest in Wikipedia from major Chinese-speaking regions: Has it peaked? Internet, February 2012.
Annotation: Short blog post plotting the number of theses from major Chinese-speaking regions as a function of year.

[7] Chitu Okoli and Kira Schabram. Protocol for a systematic literature review of research on Wikipedia. In Proceedings of the International ACM Conference on Management of Emergent Digital EcoSystems, New York, NY, USA, 2009. Association for Computing Machinery.
Annotation: Short article that sets up a framework for systematic review of Wikipedia research identifying roughly 1,000 articles.

[8] Chitu Okoli, Kira Schabram, and Bilal Abdul Kader. From the academy to the wiki: practical applications of scholarly research on Wikipedia. Wikimania, 2009.
Annotation: Short review of peer-reviewed Wikipedia research. The researchers identified over 400 academic papers. A few of these 400 are briefly summarized.

[9] Olena Medelyan, David Milne, Catherine Legg, and Ian H. Witten. Mining meaning from Wikipedia. International Journal of Human-Computer Studies, 67(9):716–754, September 2009.

[10] Nicolas Jullien. What we know about Wikipedia: A review of the literature analyzing the project(s). ArXiv, May 2012.
Annotation: A review of Wikipedia research.

[11] Finn Årup Nielsen. Wikipedia research and tools: Review and comments. SSRN, February 2012.

[12] Chitu Okoli. A brief review of studies of Wikipedia in peer-reviewed journals. In Digital Society, 2009. ICDS ’09. Third International Conference on, pages 155–160. IEEE, 2009.

[13] Arto Lanamäki, Chitu Okoli, Mohamad Mehdi, and Mostafa Mesgari. Protocol for systematic mapping of Wikipedia studies. In Timo Leino, editor, Proceedings of IRIS 2011, number 15 in TUCS Lecture Notes, pages 420–433. University of Turku, October 2011.

[14] Chitu Okoli, Mohamad Mehdi, Mostafa Mesgari, Finn Årup Nielsen, and Arto Lanamäki. The people’s encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia. Social Science Research Network, October 2012.
Annotation: Systematic review of research on Wikipedia focusing on peer-reviewed journal articles and theses published up until June 2011, also including some important conference papers. The review covers all aspects of Wikipedia research, including uses of Wikipedia-derived data.

[15] Mostafa Mesgari, Chitu Okoli, Mohamad Mehdi, Finn Årup Nielsen, and Arto Lanamäki. “The sum of all human knowledge”: A systematic review of scholarly research on the content of Wikipedia. Journal of the Association for Information Science and Technology, 66(2):219–245, 2015.

[16] Chitu Okoli, Mohamad Mehdi, Mostafa Mesgari, Finn Årup Nielsen, and Arto Lanamäki. Wikipedia in the eyes of its beholders: a systematic review of scholarly research on Wikipedia

readers and readership. Journal of the Association for Information Science and Technology, 65(12):2381–2403, 2014.
Annotation: Systematic review of scientific research on Wikipedia and its readership.

[17] Oliver Keyes. English Wikipedia pageviews by second. figshare, April 2015.
Annotation: Data set with page view statistics from the English Wikipedia collected in March and April 2015.

[18] Ludovic Denoyer and Patrick Gallinari. The Wikipedia XML corpus. ACM SIGIR Forum, 40(1):64–69, June 2006.
Annotation: Describes a dataset with Wikipedia text represented in XML.

[19] Gordon Müller-Seitz and Guido Reger. ‘Wikipedia, the free encyclopedia’ as a role model? Lessons for open innovation from an exploratory examination of the supposedly democratic-anarchic nature of Wikipedia. International Journal of Technology Management, 32(1):73–88, 2010.

[20] Roy Rosenzweig. Can history be open source? Wikipedia and the future of the past. Journal of American History, 93(1):117–146, June 2006.
Annotation: Discusses several aspects of history on the English Wikipedia and how professional historians should regard that wiki. The author also makes a quality assessment of American history articles on Wikipedia and compares them against Encarta and American National Biography Online.

[21] The ed17 and Tony1. Wikipedia’s traffic statistics understated by nearly one-third. Wikipedia Signpost, September 2014.
Annotation: Blog post on Wikipedia page view statistics discovered to be wrong.

[22] Sameer Singh, Amarnag Subramanya, Fernando Pereira, and Andrew McCallum. Wikilinks: A large-scale cross-document coreference corpus labeled via links to Wikipedia. Technical report, University of Massachusetts, October 2012.

[23] Sameer Singh, Amarnag Subramanya, Fernando Pereira, and Andrew McCallum. Large-scale cross-document coreference using distributed inference and hierarchical models. In Human Language Technologies. Association for Computational Linguistics, 2011.

[24] Heather Ford. Onymous, pseudonymous, neither or both? Ethnography Matters, June 2013.
Annotation: Blog post that ‘explores the complications of attribution and identification in online research’.

[25] Andrew Lih. The Wikipedia Revolution: How a bunch of nobodies created the world’s greatest encyclopedia. Hyperion, March 2009.
Annotation: Book on various aspects of Wikipedia and related phenomena: Usenet, Nupedia, wikis, bots, Seigenthaler incident, Essjay controversy, Microsoft Encarta, etc.

[26] Phoebe Ayers, Charles Matthews, and Ben Yates. How Wikipedia works: And how you can be a part of it. No Starch Press, September 2008.

[27] Daniel J. Barrett. MediaWiki. O’Reilly, 2008.

[28] Yaron Koren. Working with MediaWiki. WikiWorks Press, November 2012.
Annotation: Book about MediaWiki and Semantic MediaWiki.

[29] Bo Leuf and Ward Cunningham. The wiki way: Quick collaboration on the web. Addison-Wesley, April 2001.

[30] Geert Lovink and Nathaniel Tkacz, editors. Critical point of view: A Wikipedia reader, volume 7 of INC Reader. Institute of Network Cultures, Amsterdam, The Netherlands, 2011.

[31] John Broughton. Wikipedia: The missing manual. O’Reilly Media, 2008.

[32] Andrew Dalby. The world and Wikipedia: How we are editing reality. Siduri Books, 2009.

[33] Robert E. Cummings. Lazy virtues: Teaching writing in the age of Wikipedia. Vanderbilt University Press, 2009.

[34] Aaron Shaw, Amir E. Aharoni, Angelika Adam, Bence Damokos, Benjamin Mako Hill, Daniel Mietchen, Dario Taraborelli, Diederik van Liere, Evan Rosen, Heather Ford, Jodi Schneider, Giovanni Luca Ciampaglia, Lambiam, Nicolas Jullien, Oren Bochman, Phoebe Ayers, Piotr Konieczny, Adam Hyland, Sage Ross, Steven Walling, and Tilman Bayer. Wikimedia Research Newsletter, volume 2. 2012.
Annotation: Aggregation of the Wikimedia Research Newsletter for the year 2012 written by various researchers.

[35] Rikke Frank Jørgensen. Making sense of the German Wikipedia community. MedieKultur, 28(53):101–117, 2012.
Annotation: Describes the German Wikipedia based on qualitative interviews with seven members of the Berlin Wikipedia community.

[36] Patrick M. Archambault, Tom H. van de Belt, Francisco J. Grajales III, Marjan J. Faber, Craig E. Kuziemsky, Susie Gagnon, Andrea Bilodeau, Simon Rioux, Willianne L.D.M. Nelen, Marie-Pierre Gagnon, Alexis F. Turgeon, Karine Aubin, Irving Gold, Julien Poitras, Gunther Eysenbach, Jan A.M. Kremer, and France Légaré. Wikis and collaborative writing applications in health care: A scoping review. Journal of Medical Internet Research, 15(10):e210, 2013.

[37] and Helen Nissenbaum. Commons-based peer production and virtue. The Journal of Political Philosophy, 14(4):394–419, 2006.
Annotation: Discusses commons-based peer production exemplified with free and open source software, SETI@home, NASA Clickworkers, Wikipedia and Slashdot.

[38] Piotr Konieczny. Wikipedia: community or social movement. Interface, 1(2):212–232, November 2009.

[39] Fabian M. Suchanek and Gerhard Weikum. YAGO: a large ontology from Wikipedia and WordNet. Journal of Web Semantics, 6(3):203–217, 2008.

[40] Andrew Lih. How Wikipedia solved the knowledge gap. TEDx-American University, April 2014.
Annotation: Talk on Wikipedia at TEDx-American University.

[41] Besiki Stvilia, Michael B. Twidale, Les Gasser, and Linda C. Smith. Information quality discussions in Wikipedia. In Suliman Hawamdeh, editor, . Nurturing Culture, Innovation, and Technology. Proceedings of the 2005 International Conference on Knowledge Management, pages 101–113, , October 2005. World Scientific.
Annotation: An analysis with respect to information quality of a sample of the discussion pages on Wikipedia. The analysis is based on their own information quality assessment model, and they provide example quotations from the pages. They conclude that ‘the Wikipedia community takes issues of quality very seriously’.

[42] Jim Giles. Internet encyclopaedias go head to head. Nature, 438(7070):900–901, December 2005.
Annotation: Report of a comparison of the accuracy in Wikipedia and Encyclopedia Britannica.

[43] Barry X. Miller, Karl Helicher, and Teresa Berry. I want my Wikipedia. Library Journal, 6, April 2006.
Annotation: Report on three experts informally reviewing popular culture, current affairs and science articles in Wikipedia. Popular culture reviewer calls it “the people’s encyclopedia”; current affairs reviewer “was pleased by Wikipedia’s objective presentation of controversial subjects”, but cautioned “a healthy degree of skepticism and skill at winnowing fact from opinion is required”. Lastly, the science reviewer found it difficult to characterize because of the variation, noting “flaws” and “the writing is not exceptional”, but “good content abounds”.

[44] Lara Devgan, Neil Powe, Brittony Blakey, and Martin Makary. Wiki-surgery? internal validity of Wikipedia as a medical and surgical reference. Journal of the American College of Surgeons, 205(3, supplement):S76–S77, September 2007.

[45] John Gever. Wikipedia information on surgical procedures generally accurate. DocGuide.com, October 2007.
Annotation: Summary of the research by Lara Devgan et al. “Wiki-Surgery? Internal Validity of Wikipedia as a Medical and Surgical Reference”.

[46] Wikipedia schlägt Brockhaus. Stern, December 2007.

[47] K. C. Jones. German Wikipedia outranks traditional encyclopedia’s online version. InformationWeek, December 2007.

[48] Kevin A. Clauson, Hyla H. Polen, Maged N. Kamel, and Joan H. Dzenowagis. Scope, completeness, and accuracy of drug information in Wikipedia. The Annals of Pharmacotherapy, 42, December 2008.
Annotation: Examines Wikipedia as a drug reference with a comparison against Medscape Drug Reference. Wikipedia has more omissions but no factual errors among 80 drug-related questions/answers, e.g., for administration, contraindications and issues around pregnancy/lactation. They also find that Wikipedia has no dosage information, but this is not surprising given that Wikipedia explicitly encourages authors not to add this information. Four factual errors were found in Medscape. Two were conflicting information in different

parts of the text while the remaining two were due to lack of timely update.

[49] Michael P. Pender, Kaye E. Lasserre, Lisa M. Kruesi, Christopher Del Mar, and Satyamurthy Anuradha. Putting Wikipedia to the test: a case study. In The Special Libraries Association Annual Conference, June 2008.
Annotation: A small blinded comparison of 3 Wikipedia articles for medical student information against AccessMedicine, eMedicine and UpToDate online resources. Wikipedia was found to be unsuitable for medical students.

[50] Michael P. Pender, Kaye E. Lasserre, Christopher Del Mar, Lisa Kruesi, and Satyamurthy Anuradha. Is Wikipedia unsuitable as a clinical information resource for medical students? Medical Teacher, 31:1094–1098, 2009.
Annotation: The same study as “Putting Wikipedia to the test: a case study”.

[51] Malolan S. Rajagopalan, Vineet K. Khanna, Yaacov Leiter, Meghan Stott, Timothy N. Showalter, Adam P. Dicker, and Yaacov R. Lawrence. Patient-oriented cancer information on the internet: a comparison of Wikipedia and a professionally maintained database. Journal of Practice, 7(5):319–323, September 2011.

[52] M. S. Rajagopalan, V. Khanna, M. Scott, Y. Leiter, T. N. Showalter, A. Dicker, and Y. R. Lawrence. Accuracy of cancer information on the internet: A comparison of a wiki with a professionally maintained database. Journal of Clinical Oncology, 28:7s, 2010. Supplement abstract 6058.
Annotation: A short abstract reporting that Wikipedia had similar accuracy and depth compared to the professionally edited information in the National Cancer Institute’s Physician Data Query (PDQ), but Wikipedia was less readable as evaluated with the Flesch–Kincaid readability test.

[53] Andreas Leithner, Werner Maurer-Ertl, Mathias Glehr, Joerg Friesenbichler, Katharina Leithner, and Reinhard Windhager. Wikipedia and osteosarcoma: a trustworthy patients’ information. Journal of the American Medical Informatics Association, 17(4):373–374, July-August 2010.

[54] N. J. Reavley, A. J. Mackinnon, A. J. Morgan, M. Alvarez-Jimenez, S. E. Hetrick, E. Killackey, B. Nelson, R. Purcell, M. B. Yap, and A. F. Jorm. Quality of information sources about mental disorders: a comparison of Wikipedia with centrally controlled web and printed sources. Psychological medicine, 42(8):1753–1762, December 2011.
Annotation: Study on the quality of information on Wikipedia about mental disorders in terms of accuracy, up-to-dateness, coverage, references and readability with comparison against 13 other websites. Ten topics were rated by three psychologists with relevant expertise. Wikipedia was generally rated higher than the other websites.

[55] Stacey M. Lavsa, Shelby L. Corman, Colleen M. Culley, and Tara L. Pummer. Reliability of Wikipedia as a medication information source for pharmacy students. Currents in Pharmacy Teaching and Learning, 3(2):154–158, April 2011.

[56] T. Aldairy, S. Laverick, and G. T. McIntyre. Orthognathic surgery: is patient information on the Internet valid. European Journal of Orthodontics, 34(4):466–469, August 2012.

[57] Garry R. Thomas, Lawson Eng, Jacob F. de Wolff, and Samir C. Grover. An evaluation of Wikipedia as a resource for patient education in nephrology. Seminars in Dialysis, 2013.
Annotation: An analysis of the quality of Wikipedia articles in the nephrology area. They looked at comprehensiveness, reliability and readability.

[58] Olavi Koistinen. World’s largest study on Wikipedia: Better than its reputation. Helsinki Times, December 2013.
Annotation: News article about an expert evaluation of 134 articles on the Finnish Wikipedia on the dimensions lack of errors, coverage and balance, sourcing, topicality, neutrality and clarity.

[59] Robert T. Hasty, Ryan C. Garbalosa, Vincenzo A. Barbato, Pedro J. Jr Valdes, David W. Powers, Emmanuel Hernandez, Jones S. John, Gabriel Suciu, Farheen Qureshi, Matei Popa-Radu, Sergio San Jose, Nathaniel Drexler, Rohan Patankar, Jose R. Paz, Christopher W. King, Hilary N. Gerber, Michael G. Valladares, and Alyaz A. Somji. Wikipedia vs peer-reviewed medical literature for information about the 10 most costly medical conditions. The Journal of the American Osteopathic Association, 114(5):368–373, May 2014.
Annotation: A study comparing assertions in Wikipedia articles on medical conditions with assertions in the peer-reviewed literature.

[60] Lori Fitterling. Wikipedia: Proceed with caution. The Journal of the American Osteopathic Association, 114(5):334–335, 2014.

Annotation: Editorial about study on Wikipedia quality for medical conditions.

[61] Joy H. Fraser and Norman Temple. Wikipedia vs peer-reviewed medical literature for information about the 10 most costly medical conditions-I. The Journal of the American Osteopathic Association, 114(10):761, October 2014.

[62] Jonathan Leo and Jeffrey R. Lacasse. Wikipedia vs peer-reviewed medical literature for information about the 10 most costly medical conditions-II. The Journal of the American Osteopathic Association, 114(10):761–764, October 2014.

[63] George S. Chen and Yi Xiong. Wikipedia vs peer-reviewed medical literature for information about the 10 most costly medical conditions-III. The Journal of the American Osteopathic Association, 114(10):764–765, October 2014.

[64] Thomas J. Hwang, Florence T. Bourgeois, and John D. Seeger. Drug safety in the digital age. The New England Journal of Medicine, 370:2460–2462, 2014.
Annotation: Analysis of 22 Wikipedia articles on prescription drugs and whether they are updated in relation to new drug-safety communications from the U.S. Food and Drug Administration. 36% of the articles remained unchanged after more than a year.

[65] Jona Kräenbring, Tika Monzon Penza, Joanna Gutmann, Susanne Muehlich, Oliver Zolk, Leszek Wojnowski, Renke Maas, Stefan Engelhardt, and Antonio Sarikas. Accuracy and completeness of drug information in Wikipedia: a comparison with standard textbooks of pharmacology. PLOS ONE, September 2014.
Annotation: Analysis of the quality of drug information in the German and English Wikipedias. Accuracy, completeness, readability, number of references, revisions and editors were considered.

[66] Behdin Nowrouzi, Basem Gohar, Camille Smith, Behnam Nowrouzi-Kia, Rubaiya Khan, Alicia McDougall, Martyna Garbaczewska, Shalini Sivathasan, Keith Brewster, and Lorraine Carter. An examination of health, medical and nutritional information on the internet: a comparative study of wikipedia, webmd and the mayo clinic websites. The International Journal of Communication and Health, (6), 2015.

[67] Christina R. Vargas, Neelesh A. Kantak, Danielle J. Chuang, Pieter G. Koolen, and Bernard T. Lee. Assessment of online patient materials for breast reconstruction. Journal of Surgical Research, May 2015.

[68] Thomas Chesney. An empirical examination of Wikipedia’s credibility. First Monday, 11(11), 2006.
Annotation: A study where 258 researchers were asked to evaluate a Wikipedia article. 69 researchers responded and evaluated either an article within their own field of research or a random article. The evaluation focused on credibility.

[69] Wikipedia schlägt die Profis. Spiegel Online, December 2007.

[70] Torsten Kleinz. Reputation und Wikipedia: von Einfluss, Anerkennung und Vergleichen. heise online, December 2007.

[71] Lars Henrik Aagaard. Hvem er bedst? Berlingske Tidende, July 2009.
Annotation: Danish newspaper article comparing eight articles in the Danish Wikipedia with eight articles in Den Store Danske encyclopedia, with Wikipedia performing slightly better.

[72] Christina R. Vargas, Joseph A. Ricci, Danielle J. Chuang, and Bernard T. Lee. Online patient resources for liposuction: A comparative analysis of readability. Annals of Plastic Surgery, February 2015.

[73] Henning Bergenholtz. Hvilke krav skal en e-ordbog opfylde for at kunne anvendes som brugbart værktøj? DF Revy, 31(5):12–14, 2008.
Annotation: Discussion of electronic reference works.

[74] Christian M. Meyer and Iryna Gurevych. Worth its weight in gold or yet another resource – a comparative study of Wiktionary, OpenThesaurus and GermaNet. In A. Gelbukh, editor, CICLing, volume 6008 of Lecture Notes in Computer Science, pages 38–49, Berlin-Heidelberg, 2010. Springer-Verlag.
Annotation: A comparative study of German Wiktionary, OpenThesaurus and GermaNet.

[75] Patrick O’Connor. Maurice Jarre. Guardian, March 2009.

[76] Siobhain Butterworth. The readers’ editor on ... web hoaxes and the pitfalls of quick journalism. , May 2009. Accessed 4 May 2009.

[77] ‘Monkey’. ‘Wikipedia vandals’ strike again in Norman Wisdom obits. Media Monkey, Guardian, October 2010.

[78] Stephen Caddick. Wiki and other ways to share learning online. Nature, 442:744, August 2006.

[79] Udo Altmann. Representation of medical informatics in the Wikipedia and its perspectives. In R. Engelbrecht et al., editors, Connecting Medical Informatics and Bio-Informatics, volume 116 of Studies in Health Technology and Informatics, pages 755–760. IOS Press, 2005.

[80] Finn Årup Nielsen. Scientific citations in Wikipedia. First Monday, 12(8), August 2007.
Annotation: Statistics on the outbound scientific citations from Wikipedia with good correlation to the Journal Citation Reports from Thomson Scientific.

[81] Finn Årup Nielsen. Clustering of scientific citations in Wikipedia. In Wikimania, 2008.
Annotation: Study of the citations in Wikipedia to scientific journals. Wikipedia articles and journals are biclustered and a longitudinal analysis of the number of citations is performed.

[82] Alexander Halavais and Derek Lackaff. An analysis of topical coverage of Wikipedia. Journal of Computer-Mediated Communication, 13(2):429–440, January 2008.
Annotation: Examination of the coverage of Wikipedia by two methods: The first method entails sampling 3000 Wikipedia articles and manually categorizing them against the Library of Congress category system, then comparing them with the number of published books within the categories. This method shows that, e.g., the literature category is underrepresented in Wikipedia, explained by the lack of fiction in Wikipedia. Other underrepresented categories are, e.g., medicine, philosophy and law, whereas science and some areas with systematic mass-inserted articles, such as naval and geography, are well represented. Also fans tend to add, e.g., popular music groups and literature description articles. The medicine and law articles are long compared to the rest of the articles. The second method matched Wikipedia articles with specialized encyclopedias in physics, linguistics and poetry and found missing articles in Wikipedia.

[83] Beate Elvebakk. Philosophy democratized? a comparison between Wikipedia and two other Web-based philosophy resources. First Monday, 13(2), November 2008.
Annotation: A comparison of Stanford Encyclopedia of Philosophy, Internet Encyclopedia of Philosophy and Wikipedia of twentieth century philosophers. Wikipedia had by far the most philosophers, and Wikipedia also had a large number of more recently born philosophers. Besides this the three encyclopedias did not differ greatly in coverage with respect to gender, disciplines and nationality, though Wikipedia had a smaller fraction of German and French but a higher fraction of other Europeans, and Wikipedia also categorizes some in the spiritual disciplines as philosophers.

[84] Kasia Czarnecka-Kujawa, Rupert Abdalian, and Samir C. Grover. The quality of open access and open source internet material in gastroenterology: is Wikipedia appropriate for knowledge transfer to patients. Gastroenterology, 134(4, Supplement 1):A-325–A-326, April 2008.
Annotation: Study on the comprehensiveness of Wikipedia in ICD-9 and ICD-10 diagnostic codes in gastroenterology. Over 80% were covered. Readability is also considered.

[85] George Bragues. Wiki-philosophizing in a marketplace of ideas: evaluating Wikipedia’s entries on seven great minds. MediaTropes eJournal, 2(1):117–158, 2009.

[86] Jeff Friedlin and Clement J. McDonald. An evaluation of medical knowledge contained in Wikipedia and its use in the LOINC database. Journal of the American Medical Informatics Association, 17(3):283–287, May 2010.
Annotation: Describes a system for finding articles on Wikipedia that map to parts in the LOINC database.

[87] Adam R. Brown. Wikipedia as a data source for political scientists: Accuracy and completeness of coverage. PS: Political Science & Politics, 44(2):339–343, April 2011.

[88] Joseph Michael Reagle, Jr. and Lauren Rhue. Gender bias in Wikipedia and Britannica. International Journal of Communication, 5:1138–1158, 2011.
Annotation: Study on the possible gender bias in the content of Wikipedia with examination of several thousand biographic articles in comparison to Britannica and other resources.

[89] Anna Samoilenko and Taha Yasseri. The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics. EPJ Data Science, 3:1, 2013.

Annotation: Analysis of the coverage of scientists on Wikipedia. Wikipedia presence is compared with data from .

[90] PBS. 10 questions for Wikipedia founder Jimmy Wales. PBS NewsHour: The Rundown, July 2012.
Annotation: News article with an interview of Jimmy Wales.

[91] Pamela T. Johnson, Jennifer K. Chen, John Eng, Martin A. Makary, and Elliot K. Fishman. A comparison of World Wide Web resources for identifying medical information. Academic Radiology, 15(9):1165–1172, September 2008.
Annotation: A study with 135 medical students answering clinical questions with the help of Web resources. Google and other search engines were most effective. Sites such as Wikipedia, eMedicine and MDConsult were less effective, though they sometimes provided the ultimate answer, and Wikipedia did that the most.

[92] Cindy Royal and Deepina Kapila. What’s on Wikipedia, and what’s not...? assessing completeness of information. Social Science Computer Review, 27(1):138–148, February 2009.
Annotation: Examines the length of sets of Wikipedia articles and compares them with a number of other variables: year, Encyclopedia Britannica, country population and company revenue.

[93] Bill Wedemeyer. The quality of scientific articles on the English Wikipedia. Video, 2008. Presentation at Wikimania in , Egypt.
Annotation: A study on the quality of a range of scientific articles on the English Wikipedia along a number of dimensions, e.g., coverage, referencing, length, user perception.

[94] Aniket Kittur, Ed H. Chi, and Bongwon Suh. What’s in Wikipedia? mapping topics and conflicts using socially annotated category structure. In CHI 2009, pages 1509–1512, New York, NY, USA, 2009. ACM.
Annotation: Describes an algorithm for determining the topic distribution of a Wikipedia article over top-level categories based on the Wikipedia category graph, and evaluates it against human-labeled data. By applying the algorithm to the 2006 and 2008 data sets of the English Wikipedia the topic coverage over time is described, and by combining the method with a degree of conflict page labeling the most contentious topics are found.

[95] Aniket Kittur, Bongwon Suh, Bryan A. Pendleton, and Ed H. Chi. He says, she says: Conflict and coordination in Wikipedia. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 453–462, New York, NY, USA, April 2007. ACM.

[96] Torie Bosch. How Kate Middleton’s wedding gown demonstrates Wikipedia’s woman problem. Slate, July 2012.

[97] Stine Eckert and Linda Steiner. (re)triggering backlash: responses to news of wikipedia’s gender gap. Journal of Communication Inquiry, 37(4):284–303, 2013.
Annotation: Analysis of how the gender gap of Wikipedia contributors was represented and discussed in media.

[98] Gregory Kohs. Wikipedia biographies favor men. examiner.com, January 2011.
Annotation: Small report on the proportion of males in Wikipedia biographies about living people.

[99] Max Klein. Sex ratios in Wikidata, Wikipedias, and VIAF. hangingtogether.org, May 2013.
Annotation: Report of an analysis of the gender ratio on Wikipedias by using Wikidata material.

[100] Max Klein. Preliminary results from WIGI, the Wikipedia Gender Inequality Index. Notconfusing.com, January 2015.
Annotation: Blog post reporting a study with Piotr Konieczny on gender representation in Wikipedia.

[101] Claudia Wagner, David Garcia, Mohsen Jadidi, and Markus Strohmaier. It’s a man’s Wikipedia? assessing gender inequality in an online encyclopedia. In International AAAI Conference on Weblogs and Social Media, North America, 2015.

[102] Magnus Manske. Red vs. blue. The Whelming, January 2015.
Annotation: Blog post with an analysis of the representation of gender in Wikidata.

[103] Magnus Manske. Sex and artists. The Whelming, March 2015.
Annotation: Blog post with a Wikidata-based analysis of gender bias in Wikipedias.

[104] Han-Teng Liao. preelminary results from the Wikipedia Gender Inequality Index project - comments welcome. Wiki-research-l mailing list, January 2015.

[105] Camilo Mora, Derek P. Tittensor, Sina Adl, Alastair G. B. Simpson, and Boris Worm. How many species are there on earth and in the ocean? PLoS Biology, 9(8):e1001127, August 2011.
Annotation: Estimates the number of species.

[106] American Chemical Society. CAS REGISTRY(sm) keeps pace with rapid growth of chemical research, registers 60 millionth substance. Chemical Abstracts Service Media Releases, May 2011.
Annotation: Press release stating that the chemical database has reached its 60 millionth entry.

[107] Tønnes Bekker-Nielsen. Historie på Wikipedia. Noter, 188:48–52, March 2011.
Annotation: Describes the quality of history articles on the Danish Wikipedia. A university teacher monitored articles created by his students as they were edited by other Wikipedians.

[108] Jon Fine. Sad news about Tim Russert broken by Wikipedia? Fine on Media, June 2008. Blog on BusinessWeek.
Annotation: Short comment on Wikipedia publishing news before it becomes published by general media.

[109] Geoffrey Bilder. Many metrics. such data. wow. CrossTech, 2014.

[110] Donna Shaw. Wikipedia in the newsroom. American Journalism Review, 30(1):40–45, February/March 2008.
Annotation: Describes by a few examples how Wikipedia is used as a source in news media and other domains. The article notes that an editor at the Philadelphia Inquirer warned journalists never to use Wikipedia “to verify facts or to augment information in a story”. It mentions the WikiScanner and the case of Seigenthaler. A number of different people in the news business are interviewed expressing different opinions on the use of Wikipedia in news media, and studies by Nature and Roy Rosenzweig are described briefly.

[111] Andrew Lih. Wikipedia as participatory journalism: Reliable sources? metrics for evaluating collaborative media as a news resource. In 5th International Symposium on Online Journalism. The University of Texas at Austin, April 2004.
Annotation: A study examining the inbound citations to Wikipedia from press articles published in 2003 and parts of 2004. It displays “diversity” (number of individual editors of an article) against “rigor” (number of edits for an article) in some Wikipedia articles and the study regards these as indicators of quality. World War II, Islam and Astronomy scored high on these dimensions. The indicators of article quality increased after press citation for some Wikipedia articles.

[112] William Emigh and Susan C. Herring. Collaborative authoring on the web: A genre analysis of online encyclopedias. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS’05), page 99.1, Washington, DC, USA, 2005. IEEE Computer Society.
Annotation: Reports on a genre analysis of Wikipedia compared with Everything2 and Columbia Encyclopedia. A quantitative analysis of the formality of the text is performed by counting words related to formality or informality. They find that the style of Wikipedia articles is close to that of Columbia Encyclopedia.

[113] Malcolm Clark, Ian Ruthven, and Patrik O’Brian Holt. The evolution of genre in Wikipedia. Journal for Language Technology and Computational Linguistics, 24(1):1–22, 2009.
Annotation: A non-quantitative, descriptive study on a few Wikipedia articles in terms of structural form (i.e., genre) and its evolution as the articles are extended.

[114] Rui Lopes and Luís Carriço. On the credibility of Wikipedia: an accessibility perspective. In Second Workshop on Information Credibility on the Web (WICOW 2008), New York, 2008. ACM.
Annotation: A quantitative test of conformity with the W3C’s Web Content Accessibility Guidelines of 100 Wikipedia articles and 265 articles on the Web referenced by Wikipedia. The 100 Wikipedia articles had on average a better score of accessibility.

[115] Wendy Chisholm, Gregg Vanderheiden, and Ian Jacobs. Web content accessibility guidelines 1.0. W3C recommendation, The World Wide Web Consortium, March 1999.

[116] S. B. Sinha and Harjit Singh Bedi. Commr. Of Customs, Bangalore vs. m/s ACER India Pvt. Ltd. The Judgement Information System, December 2007. Appeal (civil) 2321 of 2007.

Annotation: A case in the Supreme Court of India where Wikipedia is used for the definition of ‘laptop’.

[117] Noam Cohen. Courts turn to Wikipedia, but selectively. , January 2007.

[118] Isaac Arnsdorf. Judge uses wiki in case. Yale Daily News, October 2008.

[119] Cass R. Sunstein. A brave new wikiworld. Washington Post, February 2007.

[120] Diane Murley. In defense of Wikipedia. Law Library Journal, 100(3):593–599, 2008.
Annotation: Describes Wikipedia as a research tool for students and lawyers.

[121] Jeff Breinholt. The wikipediazation of the American judiciary. The NEFA Foundation, January 2008.

[122] Morgan Michelle Stoddard. Judicial citation to Wikipedia in published federal court opinions. Master’s , School of Information and Library Science of the University of North Carolina at Chapel Hill, April 2009.

[123] Lee F. Peoples. The citation of Wikipedia in judicial opinions. SSRN, 2009–2010.

[124] Jens Ramskov. Wikipedia-grundlægger: Wiki-opdateringer må ikke blive en gammelmands-hobby. Ingeniøren, December 2013.
Annotation: Danish article with an interview of Jimmy Wales as the founder of Wikipedia.

[125] Morten Rask. The reach and richness of Wikipedia: Is wikinomics only for rich countries. First Monday, 13(6), June 2008.

[126] Brooke Gladstone. 10 years of Wikipedia. On The Media, January 2011.
Annotation: Interview with Executive Director of the Wikimedia Foundation Sue Gardner.

[127] J. P. Shim and Jeongwon Yang. Why is Wikipedia not more widely accepted in and ? factors affecting knowledge-sharing adoption. Decision Line, 40(2):12–15, March 2009.

[128] Tamim Elyan. Arabic content in Wikipedia very weak, users to blame. Daily News Egypt, September 2008. Accessed 2010-07-28.

[129] Noam Cohen. A Wikipedian challenge: Convincing Arabic speakers to write in Arabic. The New York Times, July 2008.
Annotation: A news article on the issue of why the Arabic Wikipedia is relatively small.

[130] Xan Rice. Internet: Last piece of fibre-optic jigsaw falls into place as cable links east Africa to grid. guardian.co.uk, August 2008.
Annotation: News article about a new submarine cable for communication for the poorly connected east Africa.

[131] Lennart Guldbrandsson. Swedish Wikipedia surpasses 1 million articles with aid of article creation bot. Wikimedia blog, June 2013.
Annotation: Blog article reporting about the work of a computer program, a bot, creating many articles on species on the Swedish Wikipedia.

[132] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440–442, June 1998.
Annotation: Describes so-called ‘small-world’ networks which can be constructed as interpolations between regular (lattice) and random networks. They exhibit short average paths while still being clustered. Examples are given by film actors, power grid and the neural network of a worm together with dynamic networks, e.g., multiplayer prisoner’s dilemma.

[133] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286:509–512, October 1999.

[134] Gabriel Pinski and Francis Narin. Citation influence for journal aggregates of scientific publications: Theory with application to literature of physics. Information Processing & Management, 12:297–312, 1976.

[135] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, November 1999.

[136] Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, September 1999.

[137] Francesco Bellomi and Roberto Bonato. Network analysis of Wikipedia. In Proceedings of Wikimania 2005 – The First International Wikimedia Conference, 2005.
Annotation: Describes results of application of PageRank and Kleinberg’s HITS algorithm on the English Wikipedia corpus. “United States” scored highest in both. Entries related to religion scored high on PageRank.

[138] David Laniado and Riccardo Tasso. Co-authorship 2.0: patterns of collaboration in Wikipedia. In Proceedings of the 22nd ACM conference on Hypertext and Hypermedia. ACM, June 2011.

Annotation: Network analysis of the coauthorship network in English Wikipedia, where the main contributors to a Wikipedia article are identified with a method of Adler et al. and various network metrics are extracted across time and topics.

[139] V. Zlatić, M. Božičević, H. Štefančić, and M. Domazet. Wikipedias: Collaborative web-based encyclopedias as complex networks. Physical Review E, 74(1):016115, July 2006.
Annotation: Network analysis of intrawiki links on a number of different language versions of Wikipedia, reporting a range of network characteristics.

[140] Torsten Zesch and Iryna Gurevych. Analysis of the Wikipedia category graph for NLP applications. In TextGraphs-2: Graph-Based Algorithms for Natural Language Processing. Proceedings of the Workshop, pages 1–8, New Brunswick, New Jersey, USA, April 2007. The Association for Computational Linguistics.
Annotation: Description of the use of the Wikipedia category graph for determining semantic relatedness. Several different similarity and distance measures on the graph are examined on several human-labeled datasets from the German Wikipedia. Graph characteristics (average shortest path, cluster coefficient and power law exponent) are also shown.

[141] David Laniado, Riccardo Tasso, Yana Volkovich, and Andreas Kaltenbrunner. When the Wikipedians talk: network and tree structure of wikipedia discussion pages. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. AAAI, July 2011.
Annotation: Network analysis of the user interaction on the article and user talk pages of the English Wikipedia.

[142] A. Capocci, V. D. P. Servedio, F. Colaiori, L. S. Buriol, D. Donato, S. Leonardi, and G. Caldarelli. Preferential attachment in the growth of social networks: The internet encyclopedia Wikipedia. Physical Review E, 74:036116, September 2006.
Annotation: Network analysis of the intrawiki links on the English Wikipedia.

[143] Todd Holloway, Miran Božičević, and Katy Börner. Analyzing and visualizing the semantic coverage of Wikipedia and its authors. Complexity, 12(3):30–40, January 2007.
Annotation: Analysis of various aspects of early Wikipedia, especially the number of articles and categories.

[144] A. Capocci, F. Rao, and G. Caldarelli. Taxonomy and clustering in collaborative systems: The case of the on-line encyclopedia Wikipedia. EPL, 81(2):28006, January 2008.

[145] David M. D. Smith, Jukka-Pekka Onnela, and Neil F. Johnson. Accelerating networks. New Journal of Physics, 9(181), 2007.
Annotation: Discusses accelerating networks with network analysis of the number of nodes and links, exemplifying it on three different language versions of Wikipedia.

[146] Wray Buntine. Static ranking of web pages, and related ideas. In Michel Beigbeder and Wai Gen Yee, editors, Open Source Web Information Retrieval, pages 23–26, October 2005.
Annotation: A multivariate analysis of the wikilink graph of the English 2005 Wikipedia with around 500,000 pages with a discrete version of the hubs and authorities algorithm.

[147] A. O. Zhirov, O. V. Zhirov, and D. L. Shepelyansky. Two-dimensional ranking of Wikipedia articles. arXiv, September 2010.
Annotation: Examines the PageRank and the ‘reverse’ PageRank (CheiRank) for links within Wikipedia as well as a combined rank they call 2DRank.

[148] Young-Ho Eom, Klaus M. Frahm, András Benczúr, and Dima L. Shepelyansky. Time evolution of wikipedia network ranking. The European Physical Journal B, 86(12):492, December 2013.
Annotation: Analysis of intrawiki links on Wikipedia across several years.

[149] Pablo Aragon, David Laniado, Andreas Kaltenbrunner, and Yana Volkovich. Biographical social networks on Wikipedia: a cross-cultural study of links that made history. In 8th International Symposium on Wikis and Open Collaboration, 2012.
Annotation: Network analysis of the biographical pages across the 15 largest Wikipedias.

[150] Sofia J. Athenikos and Xia Lin. The WikiPhil Portal: visualizing meaningful philosophical connections. Journal of the Colloquium on and Computer Science, 1(1), 2009.

Annotation: Network analysis and visualization of the link network among Western philosophers described with on Wikipedia.

[151] Radim Řehůřek. Fast and faster: a comparison of two streamed matrix decomposition algorithms. In NIPS Workshop on Low-Rank Methods for Large-Scale Machine Learning, 2010.
Annotation: Describes algorithms for large-scale eigen decomposition and applies them on a large document-term matrix constructed from the English Wikipedia.

[152] Choochart Haruechaiyasak and Chaianun Damrongrat. Article recommendation based on topic model for Wikipedia selection for schools. In Digital Libraries: Universal and Ubiquitous Access to Information, volume 5362 of Lecture Notes in Computer Science, pages 339–342, Berlin, 2008. Springer.
Annotation: Brief article describing topic mining with latent Dirichlet allocation using LingPipe on Wikipedia text as available in the Wikipedia for Schools dataset.

[153] Robert P. Biuk-Aghai. Visualizing co-authorship networks in online Wikipedia. In International Symposium on Communications and Information Technologies, 2006. ISCIT ’06, pages 737–742, 2006.
Annotation: Visualization of relationships between Wikipedia pages based on co-authorship patterns.

[154] Finn Årup Nielsen. Wiki(pedia) and neuroinformatics. Workshop on Wikipedia Research 2006, August 2006.
Annotation: Slides about neuroinformatics and data clustering of Wikipedia content.

[155] Rut Jesus, Martin Schwartz, and Sune Lehmann. Bipartite networks of Wikipedia’s articles and authors: a meso-level approach. In Proceedings of the 5th International Symposium on Wikis and Open Collaboration, page 5. Association for Computing Machinery, 2009.
Annotation: Study with an application of an algorithm for identification of maximal bicliques on the network of articles and editors on the English Wikipedia.

[156] Bernie Hogan, Mark Graham, and Ahmed Medhat Mohamed. The vocal minority: local self-representation and co-editing on Wikipedia in the Middle East and North Africa. 2012.
Annotation: Analysis of geotagged Wikipedia articles about topics in the Middle East and North Africa (MENA) region, establishing the co-editing social network. By examining the user page they tried to determine if the user was from a specific country. They found that most editors were from the West rather than MENA countries.

[157] Takashi Iba, Keiichi Nemoto, Bernd Peters, and Peter A. Gloor. Analyzing the creative editing behavior of Wikipedia editors: through dynamic social network analysis. Procedia - Social and Behavioral Sciences, 2(4):6441–6456, 2010.
Annotation: Describes a social network analysis by looking at the coauthor graph in Wikipedia from sequential editing.

[158] Brian Keegan, Darren Gergle, and Noshir Contractor. Staying in the loop: structure and dynamics of Wikipedia’s breaking news collaborations. In WikiSym 2012, 2012.

[159] Finn Årup Nielsen. Sequential collaboration network with sentiment coloring. In NetSci, 2013. Abstract.
Annotation: Short description of a network visualization of edits in Wikipedia including sentiment analysis of the edits.

[160] Finn Årup Nielsen, Michael Etter, and Lars Kai Hansen. Real-time monitoring of sentiment in business related Wikipedia articles. April 2013. Submitted.
Annotation: Description of an online service with sentiment analysis of Wikipedia edits for monitoring business-related Wikipedia articles and display of the results with network visualizations.

[161] Alex Halavais. The Isuzu experiment. a thaumaturgical compendium, August 2004. Blog.

[162] How authoritative is Wikipedia. Dispatches from the Frozen North, September 2004. Blog.

[163] P. D. Magnus. Fibs in the Wikipedia, December 2007.

[164] P. D. Magnus. Early response to false claims in Wikipedia. First Monday, 13(9), September 2008.
Annotation: Report on a study that injected false facts in Wikipedia. Of 36 errors only 15 were removed within 48 hours.

[165] Jens Lenler. Verdens største leksikon: ... og største fejlrisiko. Politiken, page 1, December 2006. Kultur section.

67 Annotation: Danish newspaper ar- [174] Michael Lorenzen. Vandals, administrators, and ticle about the Æblerød Wikipedia sockpuppets, oh my! an ethnographic study of hoax. Wikipedia’s handling of problem behavior. MLA forum, 5(2), December 2006. [166] . A false Wikipedia ‘biography’. USATODAY.com, 2005. [175] Luciana S. Buriol, Carlos Castillo, Debora Do- [167] Bertrand Meyer. Defense and illustration of nato, Stefano Leonardi, and Stefano Millozzi. Wikipedia. EiffelWorld Column, January 2006. Temporal analysis of the wikigraph. In Pro- ceedings of the 2006 IEEE/WIC/ACM Interna- Annotation: A generally positive tional Conference on Web Intelligence, pages 45– comment on Wikipedia with some 51, Washington, DC, USA, 2006. IEEE Computer anecdotal evidence, especially about Society. the vandalism on the authors bibliog- raphy. [176] Jacobi Carter. ClueBot and , 2007. [168] Kevin Morris. After a half-decade, massive Wikipedia hoax finally exposed. Daily Dot, Jan- [177] Martin Potthast, Benno Stein, and Robert uary 2013. Gerling. Automatic vandalism detection in Wikipedia. In Advances in Information Retrieval, Annotation: News report on a volume 4956 of Lecture Notes in Computer Sci- Wikipedia hoax. ence, pages 663–668, Berlin/Heidelberg, 2008. [169] Yoni Appelbaum. How the professor who fooled Springer. wikipedia got caught by reddit. The Atlantic, Annotation: Reports a machine May 2012. learning approach for vandalism de- Annotation: Reports on a teacher at tection in Wikipedia where features that lets undergraduates fabricate his- such as ‘upper case ratio’, ‘size ra- tory as part of a course and add the in- tio’ and ‘edits per user’ are used. The formation to Wikipedia, e.g., the Ed- system performed well compared to ward Owens hoax. rule-based vandalism detection bots on Wikipedia. [170] Kevin Morris. How vandals are destroying Wikipedia from the inside. The Daily Dot, Jan- [178] Koen Smets, Bart Goethals, and Brigitte Ver- uary 2013. donk. 
Automatic vandalism detection in Wikipedia: Towards a machine learning ap- Annotation: News report on the Legolas2186 hoax case in Wikipedia. proach. Benelearn08, The Annual Belgian-Dutch Machine Learning Conference, July 2008. [171] Fernanda B. Vi´egas, Marting Wattenberg, and Kushal Dave. Studying cooperation and con- Annotation: Short abstract that re- flict between authors with history flow visualiza- ports an machine learning approach tions. In Elizabeth Dykstra-Erickson and Man- to vandalism detection in Wikipedia. fred Tscheligi, editors, Proceedings of the SIGCHI With naive Bayes, bag-of-words and conference on Human factors in computing sys- probabilistic sequence modeling re- tems, pages 575–582, New York, NY, USA, 2004. sults better than the rule-based Clue- ACM. Bot was not obtained. [172] Reid Priedhorsky, Jilin Chen, Shyong (Tony) K. [179] Andrew G. West, Sampath Kannan, and Insup Lam, Katherine Panciera, Loren Terveen, and Lee. Spatio-temporal analysis of Wikipedia meta- John Riedl. Creating, destroying, and restor- data and the STiki anti-vandalism tool. In Pro- ing value in Wikipedia. In Proceedings of the ceedings of the 6th International Symposium on 2007 International ACM Conference on Support- Wikis and Open Collaboration, page 18, New ing Group Work, pages 259–268, New York, NY, York, NY, USA, 2010. ACM. USA, 2007. ACM. [173] Paul P. Gardner, Jennifer Daub, John Tate, Ben- Annotation: Describes the STiki sys- jamin L. Moore, Isabelle H. Osuch, Sam Griffiths- tem for semi-automated vandalism de- Jones, Robert D. Finn, Eric P. Nawrocki, Di- tection on Wikipedia using metadata ana L. Kolbe, Sean R. Eddy, and Alex Bate- from the revisions. man. Rfam: Wikipedia, clans and the ”decimal” [180] Andrew G. West, Sampath Kannan, and Insup release. Nucleic Acids Research, 39:D141–D145, Lee. STiki: an anti-vandalism tool for Wikipedia November 2010. using spatio-temporal analysis of revision meta- Annotation: Update on the Rfam data. 
In Proceedings of the 6th International Sym- database including its connection to posium on Wikis and Open Collaboration, New Wikipedia. York, NY, USA, 2010. ACM.

68 Annotation: More or less the same [189] Maryam Omidi. PR firm accused of whitewash- paper as ‘Spatio-temporal analysis of ing Maldives entry on Wikipedia. Minivan News, Wikipedia metadata and the STiki September 2009. anti-vandalism tool’ Annotation: News story about a bi- [181] R. Stuart Geiger and David Ribes. The work of ased Wikipedia edits originating from sustaining order in Wikipedia: the banning of a a PR firm employed by the Maldives. vandal. In Proceedings of the 2010 ACM con- [190] . Ministry bans Wikipedia edit- ference on Computer supported cooperative work, ing. Guardian Unlimited, November 2007. pages 117–126, New York, NY, USA, 2010. Asso- ciation for Computing Machinery. [191] Asger Westh. Wikipedia: Den elektroniske slag- mark. epn.dk, January 2008. Annotation: Describes the technical systems around Wikipedia that helps Annotation: A news article about in vandal fighting, describes the Hug- a version of the WikiScanner for the gle assisted editing tools and gives a Danish Wikipedia. narrative about reverting and blocking [192] Daniel Erenrich. Wikiscanner: Automated con- a vandal. flict of interest detection of anonymous Wikipedia edits. In Student-Faculty Programs: 2008 Ab- [182] Eric Goldman. Wikipedia will fail within 5 years. stract Book, page 22. California Institute of Tech- Technology & Marketing Law Blog, December nology, 2008. 2005. [183] Thomas Claburn. Law professor predicts Annotation: Student abstract about Wikipedia’s demise. InformationWeek, December the Wikiscanner, that detects conflict 2006. of interest edits in Wikipedia also us- ing Ip2location and USPTO trade- Annotation: News story about a blog mark databases and computing link post from Professor Eric Goldman, distance between pages and categories. that predicted the failure of Wikipedia [193] . Official City of Melbourne IP ad- to counter a massive attack by marke- dress used for biased edits to Wikipedia page for teers. Occupy Melbourne prior to local election. 
Boing [184] Brian Bergstein. Microsoft offers cash for Boing, February 2013. Wikipedia edit. Washingtonpost.com, January [194] Andrew Lih. Long live Wikipedia? Sustain- 2007. able volunteerism and the future of crowd-sourced Annotation: News article about Mi- knowledge. In John Hartley, Jean Burgess, and crosoft offering to pay blogger Rick Axel Bruns, editors, A Companion to New Me- Jelliffe to change Wikipedia articles dia Dynamics, pages 185+. Blackwell publishing, as the company was sure that it con- first edition, 2013. tained inaccuracies about an open- [195] Figaro. Condamn´e pour une correction source document standard and the Mi- Wikipedia. Figaro, July 2011. crosoft rival format. [196] Ryan McGrady. Gaming against the greater [185] Roger Cadenhead. Adam Curry caught in sticky good. First Monday, 14(2), February 2009. wiki. Internet, December 2005. Annotation: A discussion of author- Annotation: Blog on how an a per- ity, rhetoric on Wikipedia and wiki- son from an IP address associated lawyering where Wikipedia policy is with podcasting entrepreneur Adam misused by editors. The report of an Curry has edited in the “Podcasting” interview with a single editor shows Wikipedia article. the problem with abuse of policy in a case of an . [186] Emily Biuso. Wikiscanning. New York Times, [197] Simon Owen. The battle to destroy Wikipedia’s December 2007. biggest sockpuppet army. The Daily Dot, Octo- Annotation: News article about Vir- ber 2013. gil Griffith’s Wikipedia mining tool Annotation: Technical news article Wikiscanner. about a sockpuppet investigation and [187] Jonathan Fildes. Wikipedia ‘shows CIA page ed- paid editing on Wikipedia. its’. BBC NEWS, August 2007. [198] Rishi Chandy. Wikiganda: Identifying propa- [188] Brian Brady. BBC staff rewrote Wikipedia pages ganda through text analysis. Caltech Undergrad- to water down criticism. , De- uate Research Journal, 9(1):6–11, Winter 2008– cember 2007. 2009.

69 Annotation: Description of a opinion [207] B. Thomas Adler, Luca de Alfaro, Ian Pye, and mining system for Wikipedia edits. Vishwanath Raman. Measuring author contribu- tion to the Wikipedia. In Proceedings of WikiSym [199] Violet Blue. Corruption in wikiland? Paid 2008. ACM, 2008. PR scandal erupts at Wikipedia. CNET News, September 2012. [208] Felipe Ortega, Jes´usM. Gonz´alez-Barahona,and Gregorio Robles. On the inequality of contri- [200] Marcia W. DiStaso. Measuring public relations butions to Wikipedia. In Proceedings of the Wikipedia engagement: how bright is the rule?. 41st Hawaiian International Conference on Sys- Public Relations Journal, 6(2), 2012. tem Sciences (HICSS-2008), 2008. Annotation: Reports on a sur- vey among 1’284 public relations/- Annotation: Computation of the communications professionals about Gini coefficient for author contri- their perceptions and practice with butions for the top ten largest Wikipedia. Wikipedias. They find the Gini coef- ficients to be from 0.92 to 0.97. [201] David King. Ethical Wikipedia strategies for . CW Bulletin, 2012. [209] Aaron Swartz. Who writes Wikipedia. Raw Thought, September 2006. Blog. [202] Amanda Filipacchi. Wikipedia’s sexism toward female novelists. The New York Times, April Annotation: A report on a small 2013. investigation on the different amount of contribution authors make to Annotation: Op-ed on Wikipedia Wikipedia. Contrary to previous subcategorizing female but not male thought the examination of a few American Novelists. Wikipedia articles shows that essen- [203] Emma Gray. Women novelists Wikipedia: Female tial content was mostly written by authors absent from site’s ‘American Novelists’ occasional contributors. page?. The Huffington Post, April 2013. [210] R. Stuart Geiger. The social roles of bots and Annotation: News article on assisted editing programs. 
In Proceedings of the Wikipedia subcategorizing female but 5th International Symposium on Wikis and Open not male American Novelists. Collaboration, page 30, New York, NY, USA, [204] Bongwon Suh, Ed H. Chi, Aniket Kittur, and 2009. Association for Computing Machinery. Bryan A. Pendleton. Lifting the veil: Improv- Annotation: A study on the amount ing accountability and social transparency in of edits on the English Wikipedia in Wikipedia with WikiDashboard. In CHI ’08: 2009 made by bots or through assisted Proceeding of the twenty-sixth annual SIGCHI editing tools. Combined they make conference on Human factors in computing sys- up 28.49% of the edits, — a number tems, pages 1037–1040, New York, NY, USA, slightly larger than the number of ed- 2008. ACM. its performed by anonymous users. Annotation: Description of the [211] Jochen Rick, Mark Guzdial, Karen Carroll, Lissa WikiDashboard web-site that makes a Holloway-Attaway, and Brandy Walker. Collabo- visualization of author contributions rative learning at low cost: CoWeb use in English of a Wikipedia article and embeds composition. In Proceedings of the Conference it in a proxy copy of the Wikipedia on Computer Support for Collaborative Learning: article. Visualization may also be Foundations for a CSCL Community, pages 435– generated with respect to a user. 442. International Society of the Learning Sci- Social transparency in relation to ences, 2002. Wikipedia is also discussed. Annotation: Describes experience [205] Sara Javanmardi, Cristina Lopes, and Pierre with the CoWeb wiki used for teach- Baldi. Modeling user reputation in wikis. Sta- ing English. Students using the wiki tistical Analysis and Data Mining, 3(2):126–139, had better attitude towards collabora- April 2010. tion and got higher grades compared Annotation: Describes a model for to another online environment. time-depedent user reputation on [212] Andrea Forte and Amy Bruckman. 
From wikis and applies it on the entire revi- Wikipedia to the classroom: exploring online sion history of the English Wikipedia. publication and learning. In Proceedings of the [206] Jimmy Wales. Wikipedia, emergence, and the 7th international conference on Learning sci- wisdom of crowds. wikipedia-l mailing list, May ences, pages 182–188. International Society of the 2005. Accessed 2009 February 8. Learning Sciences, 2006.

70 Annotation: Describes experience [222] Yair Amichai-Hamburger, Naama Lamdan, Rinat with having fresman-level college stu- Madiel, and Tsahi Hayat. Personality character- dents write and interact in a wiki istics of wikipedia members. CyberPsychology & where the students can comment of Behavior, 11(6):679–681, December 2008. each others articles. Annotation: Application of the Big [213] Wikimedia Foundation. Wikipedia editors study: Five Inventory and Real-Me person- Results from the editor survey, april 2011. Wiki- ality questionnaires to 139 Wikipedia media Commons, August 2011. and non-Wikipedia users. The recruit- Annotation: Report on a survey ment was based on targeting posting among several thousand Wikipedia ed- of links. Wikipedians scored lower on itors. agreeableness and higher on openness. Differences in extroversion and consci- [214] Nuˇsa Fariˇcand Henry W. W. Potts. Motiva- entiousness depended on the sex of the tions for contributing to health-related articles subject. on Wikipedia: An interview study. In Medicine 2.0’13, September, 2013. [223] Ralph Straumann and Mark Graham. The geo- Annotation: Report on an interview graphically uneven coverage of Wikipedia. Infor- study on 32 Wikipedians contributing mation Geographies at the Oxford Internet Insti- to Wikipedia health-related articles. tute, 2014. [215] Tilman Bayer. How many women edit wikipedia?. Annotation: Analysis of the geo- Wikimedia Blog, April 2015. tagged articles in 44 different language [216] Ruediger Glott, Philipp Schmidt, and Rishab versions of Wikipedia. Ghosh. Wikipedia survey — overview of results. [224] Seth Finkelstein. Wikipedia’s school for scandal Report, UNU-MERIT, Univer- has plenty more secrets to reveal. The Guardian, sity, Maastricht, Netherlands, March 2010. March 2008. [217] Shyong (Tony) K. Lam, Anuradha Uduwage, [225] Andrew George. Avoiding tragedy in the wiki- Zhenhua Dong, Shilad Sen, David R. Musicant, commons. 
Virginia Journal of Law and Technol- Loren Terveen, and John Riedl. Wp:clubhouse? ogy, 12(8):1–42, 2007. an exploration of Wikipedia’s gender imbalance. In Proceedings of the 7th International Sympo- [226] Reijo Kupiainen, Juha Suoranta, and Tere Vaden. sium on Wikis and Open Collaboration, New Fire next time: or revisioning higher education in York, NY, USA, 2011. Association for Computing the context of digital social creativity. E-Learning Machinery. and Digital Media, 4(2):128–137, 2007. [218] Benjamin Mako Hill and Aaron Shaw. The Annotation: Discuss some aspects of Wikipedia gender gap revisited: Characterizing digial media and higher eductions. survey response bias with propensity score esti- mation. PloS one, 8(6):e65782, 2013. [227] Pattarawan Prasarnphanich and Christian Wag- [219] Noam Cohen. Define gender gap? look up ner. Explaining the sustainability of digital Wikipedia’s contributor list. The New York ecosystems based on the wiki model through criti- Times, January 2011. cal mass theory. IEEE Transactions on Industrial Electronics, 58(6):2065–2072, June 2011. Annotation: Reports on the gen- der gap among Wikipedia contributors [228] Christian Wagner. Breaking the knowledge acqui- and discusses its influence on the topic sition bottleneck through conversational knowl- emphasis on Wikipedia. edge management. Information Resources Man- agement Journal, 19(1):70–83, January 2005. [220] Rishab A. Ghosh, Ruediger Glott, Bernhard Krieger, and Gregorio Robles. FLOSS. deliver- [229] Paolo Magrassi. Free and open-source software is able D18: Final report. part 4: Survey of de- not an emerging property but rather the result of velopers. International Institute of Infonomics, studied design. arXiv, November 2010. University of Maastricht, The Netherlands, June 2002. Annotation: Discusses free and open-source software and argues for a Annotation: Reports on a survey of role of top-down coordination. FLOSS developers. [230] Eric S. Raymond. 
The cathedral and the bazaar. [221] Vijay A. Mittal, Kevin D. Tessner, and Elaine First Monday, 3(3), March 1998. f. Walker. Elevated social internet use and schizotypal personality disorder in adolescents. [231] Kevin Crowston and James Howison. The social Research, 94(1-3):50–57, August structure of free and open source software devel- 2007. opment. First Monday, 10(2), February 2005.

71 Annotation: An analysis of 140 Free [242] Micha¨elR. Laurent and Tim J. Vickers. Seek- and Open Source Software project ing health information online: does Wikipedia hosted on SourceForge with a study of matter? Journal of the American Medical Infor- contributor interaction network. matics Association, 16(4):471–479, July-August 2009. [232] Peter Miller. Smart swarm: Using animal be- haviour to change our world. Collins, , Annotation: Describes an investi- 2010. gation of search engine ranking for health topics in Wikipedia with search Annotation: Book about mass col- engine optimization methods. With laboration among animals and hu- queries selected from MedlinePlus, mans. NHS and NORD they show that the English Wikipedia is more often on [233] Joseph Reagle. Four short stories about the ref- the first place for the selected queries erence work, 2005. on Google as compared to .gov do- Annotation: Discussion of history of main, MedilinePlus, Medscape, NHS reference work production. Direct Online and a number of other domains. The also investigated health- [234] Mark Elliott. Stigmergic collaboration: the evo- related topics with seasonal effects. lution of group work. M/C Journal, 9(2), May [243] Marcia W. DiStaso and Marcus Messner. Forced 2006. transparency: corporate image on Wikipedia and Annotation: Discuss the idea of stig- what it means for public relations. Public Rela- mergy in relation to Wikipedia. tions Journal, 4(2), Spring 2010.

[235] Jeff Loveland and Joseph Reagle. Wikipedia and Annotation: of encyclopedic production. New Media & Society, Wikipedia articles on companies as January 2013. well as examination of Internet search engine ranking. 10 Fortune 500 com- [236] Dan Wielsch. Governance of massive multiau- panies were selected in the years 2006, thor collaboration - Linux, Wikipedia, and other 2008 and 2010 and examined for topic, networks: governed by bilateral contracts, part- tonality and length, number of edits, nerships, or something in between. JIPITEC, etc. 1(2):96–108, July 2010. [244] Nathan Safran. Wikipedia is the SERPs: Ap- [237] Martin Pek´arekand Stefanie P¨otzsch. A compar- pears on page 1 for 60% of information, 34% ison of privacy issues in collaborative workspaces trransactional queries. Conductor Blog, March and social networks. Identity in the Information 2012. Society, 2(1):81–93, December 2009. Annotation: Reports a non-peer- [238] comScore, Reston. Media Alert: comScore Re- reviewed study on the the position of leases Worldwide Ranking of Top Web Properties, Wikipedia in Internet search engine October 2006. results. Using 2’000 keywords from representative search engine queries Annotation: Statistics on the worlds it was found that the appearance of most visited Web sites during Septem- Wikipedia on page one depended on ber 2006. whether the query was information or [239] comScore, Reston. Ghosts and Goblins Drive transactional and on how many words Traffic to Greetings, Gift and Toys Sites in Octo- were in the query. ber, November 2006. [245] Sam Silverwood-Cope. Wikipedia: Page one of [240] Jon W. Huss, III, Pierre Lindenbaum, Michael Google UK for 99% of searches. Intelligent Posi- Martone, Donabel Roberts, Angel Pizarro, Fara- tioning, February 2012. marz Valafar, John B. Hogenesch, and Andrew I. Annotation: Non-peer-reviewed Su. The Gene Wiki: community intelligence ap- study on the search engine ranking plied to human gene annotation. 
Nucleic Acids of Wikipedia. Wikipedia appear on Research, 38:D633–D639, 2010. Google’s page one based on 1’000 [241] Bruce Etling, John Kelly, Robert Faris, and John random nouns queried. Palfrey. Mapping the Arabic blogosphere: Pol- [246] American Customer Satisfaction Index. July 2011 itics, culture and dissent. Research Publication and historical ACSI scores, July 2011. 2009-06, Berkman Center, June 2009. [247] American Customer Satisfaction Index. Bench- Annotation: A study of 35’000 active marks by industry: Internet social media, July Arabic language blogs. 2013.

72 Annotation: ASCI benchmarks from [257] Lynn Wasnak. How much should I charge?. In 2010 to 2013 for websites such as 2006 Writer’s Market, Writer’s Market, pages 68– Wikipedia, Pinterest, YouTube and 78. Writers Digest Books, 2006. Myspace. Annotation: Overview of freelance [248] Klaus Larsen. Læger er glade for Wikipedia. writer’s word, page and project rates Ugeskrift for Læger, May 2012. for different kinds of work. Annotation: News report on a survey [258] Susan L. Bryant, Andrea Forte, and Amy Bruck- conducted by the Mannov company on man. Becoming Wikipedian: transformation the use of social media in professional of participation in a collaborative online ency- contexts of Danish medical doctors. clopedia. In Proceedings of the 2005 interna- tional ACM SIGGROUP conference on Support- [249] Mannov. Danske læger er vilde med sociale me- ing group work, pages 1–10, New York, NY, USA, dier. Internet, May 2012. 2005. ACM. Report of Danish medi- Annotation: Annotation: Describes interviews cal doctors use of social media for pro- with 9 Wikipedia contributors and fessional activities. some of their characteristics: Most of [250] Andrew G. West and Milowent. Examining the contributors tell that their initial the popularity of Wikipedia articles: catalysts, edit was for correcting a problem or trends, and applications. The Signpost, February extending a weak article. As novices 2013. they were not aware of the Wikipedia community. As more experienced con- Annotation: Analysis of Wikipedia tributors they get a sense of commu- page view statistics. nity and decrease article writing and [251] Andrew G. West, Jian Chang, Krishna Venkata- increase administration. subramanian, Oleg Sokolsky, and Insup Lee. Link [259] Timme Bisgaard Munk. Why Wikipedia: self- spamming Wikipedia for profit. In Proc. of the efficacy and self-esteem in a knowledge-political 8th Annual Collaboration, Electronic Messaging, battle for an egalitarian epistemology. 
Observa- Anti-Abuse, and Spam Conference, pages 152– torio, 3(4), 2009. 161, New York, NY, USA, 2011. Association for Computing Machinery. Annotation: Semi-structured qual- itative interview with six Danish [252] Jonathan Band and Jonathan Gerafi. Wikipedia’s Wikipedia volunteers for investigating economic value. SSRN, October 2013. the movitation for contribution. The Annotation: Review of studies on interviews were transcribed ‘catego- the economic value of Wikipedia and rized and analyzed thematically’. some further estimates on replacement [260] Xiaoquan Zhang and Feng Zhu. Intrinsic moti- cost and consumer value. vation of open content contributors: the case of [253] Anselm Spoerri. What is popular on Wikipedia Wikipedia. In Workshop on Information Systems and why?. First Monday, 12(4), April 2007. and Economics, 2006. [254] Sook Lim and Nahyun Kwon. Gender differences [261] Heng-Li Yang and Cheng-Yu Lai. Motivations in information behavior concerning Wikipedia, an of Wikipedia content contributors. Computers unorthodox information source. Library & In- in Human Behavior, 26(6):1377–1383, November formation Science Research, 32(3):212–220, July 2010. 2010. Annotation: Reports on a survey Annotation: Studies gender differ- among Wikipedia contributors about ence in the use of Wikipedia. their motivation for sharing knowl- edge. [255] R. Stuart Geiger and . Using edit sessions to measure participation in Wikipedia. In [262] Michael Restivo and Arnout van de Rijt. Experi- Proceedings of the 2013 conference on Computer mental study of informal rewards in peer produc- supported cooperative work, New York, NY, USA, tion. PLoS ONE, 7(3):e34358, 2012. 2013. Associations for Computing Machinery. Annotation: An experiment where [256] The ed17 and Tony1. WMF employee forced out Wikipedia editors were given informal over “paid advocacy editing”. Wikipedia Sign- awards to see how it affected their pro- post, January 2014. ductivity. 
Annotation: Blog post on paid edit- [263] Haiyi Zhu, Amy Zhang, Jiping He, Robert E. ing on Wikipedia by a Wikimedia Kraut, and Aniket Kittur. Effects of peer feed- Foundation employee. back on contribution: a field experiment in

73 Wikipedia. In Wendy E. Mackay, Stephen Brew- Annotation: A suggestion for a com- ster, and Susanne Bødker, editors, CHI 2013, putation of reputation of the individ- pages 2253–2262, New York, NY, USA, 2013. As- ual authors in a wiki-system. Formulas sociation for Computing Machinery. are shown but it is not clear if the sug- [264] Andrea Ciffolilli. Phantom authority, self- gestion is tested on a real wiki. selective recruitment and retention of members [272] Arto Lanam¨aki, Mikko Rajanen, Anssi O¨orni,¨ in virtual communities: The case of Wikipedia. and Netta Iivari. Once you step over the first First Monday, 8(12), December 2003. line, you become sensitized to the next: towards [265] Guillaume Paumier. VisualEditor. MediaWiki a gateway theory of online participation. In Pro- wiki, July 2011. ceedings of the International Conference on In- formation Systems - Exploring the Information Annotation: Early notes about the Frontier, ICIS 2015, Fort Worth, Texas, USA, VisualEditor project for a WYSIWYG December 13-16, 2015, 2015. editor in the MediaWiki software. Annotation: Propose to view on- [266] Martin Robbins. Is the PR industry buying in- line participation under the framework fluence over Wikipedia?. VICE, October 2013. of gateway theory as in contrast to Annotation: News report on the con- antecedents-based explanations. troversial Wikipedia editing of the company Wiki-PR. [273] Andrea Forte and Amy Bruckman. Why do peo- ple write for Wikipedia? incentives to contribute [267] Jonathan Corbet, Greg Kroah-Hartman, and to open-content publishing. In Group 2005 work- Amanda McPherson. Linux kernel development: shop: Sustaining community: The role and design How fast it is going, who is doing it, what they of incentive mechanisms in online systems. Sani- are doing, and who is sponsoring it. Linux Foun- bel Island, FL., November 2005. dation, September 2013. 
Annotation: From interviews of 22 Annotation: Statistics on Linux Ker- Wikipedia contributors to understand nel development. their motivation the study argues that [268] Eric Goldman. Wikipedia’s labor squeeze and the incentive system in Wikipedia re- its consequences. Journal on Telecommunications sembles that of science. and High Technology Law, 8(1):157–183, Winter [274] Seth Finkelstein. What arti- 2010. cle fraud tells us about Wikipedia. Infothought, Annotation: Discussion of (possi- March 2007. ble future) sustainability problems for Annotation: Blogpost which re- Wikipedia and what can be done to flects on the Essjay controversy in counter them. Wikipedia. [269] Magnus Cedergren. Open content and value cre- ation. First Monday, 8(8), August 2003. [275] Travis Kriplean, Ivan Beschastnikh, and David W. McDonald. Articulations of wiki- Annotation: Study on three open work: uncovering valued work in Wikipedia content projects: Wikipedia, Open through barnstars. In Proceedings of the 2008 Directory Project and the Prelinger ACM conference on Computer supported cooper- with a discussion of the mo- ative work, pages 47–56, New York, NY, USA, tivation of contributors. 2008. ACM. [270] Oded Nov. What motivates Wikipedians?. Com- [276] Aaron Shaw and Benjamin Mako Hill. Can so- munications of the ACM, 50(11):60–64, Novem- cial awards create better wikis?. Wikimania 2012, ber 2007. March 2012.

Annotation: Reports on an analysis Annotation: Analysis of the effect of a survey among Wikipedia contrib- of social awards on productivity in utors for investigation of their motiva- Amazon Mechanical Turk, the pro- tion as contributors. gramming language Scratch and in Wikipedia. [271] Bernhard Hoisl, Wolfgang Aigner, and Silvia Miksch. Social rewarding in wiki systems — mo- [277] Richard Fisher. Just can’t get e-nough. New Sci- tivating the community. In Online Communi- entist, 192(2583/2584):34–37, 2006. ties and Social Computing, volume 4564 of Lec- ture Notes in Computer Science, pages 362–371, Annotation: Description of various Berlin/Heidelberg, August 2007. Springer. instances of Internet addictions.

[278] Mike Cardosa, Katie Panciera, and Anna Rouben. Comparative study of Wikipedia users (2002 vs. 2005), 2007? Apparently only published on Rouben’s website. Year of publication is unknown.
Annotation: A quantitative longitudinal study of Wikipedia users.

[279] Jared Diamond, editor. The world until yesterday: What can we learn from traditional societies? Penguin Books, London, England, 2013.

[280] Wikimedia Foundation staff and volunteers. Former contributors survey results. Wikimedia Strategic Planning, February 2010.
Annotation: Results of a survey of Wikipedia contributors who have stopped editing.

[281] Bongwon Suh, Gregorio Convertino, Ed H. Chi, and Peter Pirolli. The singularity is not near: slowing growth of Wikipedia. In Proceedings of the 5th International Symposium on Wikis and Open Collaboration. Association for Computing Machinery, 2009.
Annotation: Data analysis of the evolution of Wikipedia.

[282] Camille Roth. Viable wikis: struggle for life in the wikisphere. In Proceedings of the 2007 International Symposium on Wikis, pages 119–124. Association for Computing Machinery, 2007.

[283] Dennis M. Wilkinson and Bernardo A. Huberman. Cooperation and quality in Wikipedia. In Proceedings of the 2007 International Symposium on Wikis, pages 157–164, New York, NY, USA, 2007. ACM.
Annotation: An investigation of what signals quality articles on the English Wikipedia. A growth model is suggested for the number of edits on an article, and this lognormal model shows good agreement with the observed number of edits. It is also shown that older articles typically have more edits than newer articles and that quality articles have comparably more edits, many distinct editors, many edits on their talk pages, a quick ‘turnaround’ and a high number of edits per editor when correcting for article age and Google PageRank. They write that featured articles with high Google PageRank have a lower number of talk page edits than articles with low PageRank. It is not clear if that effect is due to archiving.

[284] Dennis M. Wilkinson and Bernardo A. Huberman. Assessing the value of cooperation in Wikipedia. First Monday, 12(4), April 2007.

[285] Rodrigo Almeida, Barzan Mozafari, and Junghoo Cho. On the evolution of Wikipedia. In International Conference on Weblogs and Social Media, 2007.

[286] Sascha Segan. R.i.p usenet: 1980-2008. PCMag.com, July 2008.
Annotation: Discusses the decline of USENET.

[287] Megan Garber. The contribution conundrum: Why did Wikipedia succeed while other encyclopedias failed? Nieman Journalism Lab, 2011.
Annotation: A description of research on why Wikipedia was successful and other crowdsourced online encyclopedias were not. The encyclopedia as a familiar product, focus on substantive content rather than technology as well as low transaction cost in participation were reported.

[288] Philippe Aigrain. The individual and the collective in open information communities. Extended abstract to 16th BLED Electronic Commerce Conference, June 2003.
Annotation: Discussion of reasons for success in Wikipedia, Slashdot and SPIP focussing on low transaction costs, clear vision and mechanisms and software tools to counter hostile contribution.

[289] Hsin-Yi Wu and Kuang-Hua Chen. A quantitative comparison on online encyclopedias — a case study of Wikipedia and Knol. In Proceedings of 2010 International Symposium on the Transformation and Innovation of Library and Information Science, November 2010.
Annotation: Related to the paper ‘A quantitative study for online encyclopedias: comparison of Wikipedia and Knol’. The present paper has information, e.g., about which articles the researchers studied.

[290] Kuang-Hua Chen and Hsin-Yi Wu. A quantitative study for online encyclopedias: comparison of Wikipedia and Knol. Journal of Library and Information Science Research, 7(1):32–38, December 2012.
Annotation: Comparison of 20 articles from Wikipedia and Google Knol with respect to page views, number of words, readability, number of citations, number of references and types of references.

[291] Jason Kincaid. Google announces plans to shutter Knol, Friend Connect, Wave, and more. TechCrunch, November 2011.

Annotation: News articles about Google announcing the closing of Knol.

[292] David M. Pennock, Steve Lawrence, C. Lee Giles, and Finn Årup Nielsen. The real power of artificial markets. Science, 291(5506):987–988, February 2001.
Annotation: A brief note reporting that artificial markets, such as Hollywood Stock Exchange and Foresight Exchange, are good predictors of future outcomes.

[293] David M. Pennock, Steve Lawrence, Finn Årup Nielsen, and C. Lee Giles. Extracting collective probabilistic forecasts from web games. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 174–183, New York, NY, USA, August 2001. SIGKDD, Association for Computing Machinery, ACM.

[294] Shane Greenstein. The range of Linus’ Law. IEEE Micro, 32(1):72–73, January/February 2012.
Annotation: Discusses Linus’ Law with respect to Wikipedia.

[295] Alexandre Bouchard, Percy Liang, Thomas Griffiths, and Dan Klein. A probabilistic approach to diachronic phonology. In Proceedings of EMNLP-CoNLL, pages 887–896, 2007.

[296] Paula Chesley, Bruce Vincent, Li Xu, and Rohini Srihari. Using verbs and adjectives to automatically classify blog sentiment. In Nicolas Nicolov, Franco Salvetti, Mark Liberman, and James H. Martin, editors, Proceedings of AAAI-CAAW-06, the Spring Symposia on Computational Approaches to Analyzing Weblogs, AAAI’s Spring Symposium. The AAAI Press, 2006.
Annotation: Text sentiment analysis of blogs by using Wiktionary, verb classes and a support vector machine classifier.

[297] T. Zesch, C. Mueller, and I. Gurevych. Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In Proceedings of the Conference on Language Resources and Evaluation (LREC), 2008.

[298] T. Zesch, C. Mueller, and I. Gurevych. Using Wiktionary for computing semantic relatedness. In Proceedings of AAAI. Association for the Advancement of Artificial Intelligence, 2008.

[299] Akiyo Nadamoto, Eiji Aramaki, Takeshi Abekawa, and Yohei Murakami. Content hole search in community-type content using Wikipedia. In Proceedings of the 11th International Conference on Information Integration and Web-based Applications & Services, pages 25–32, New York, NY, USA, 2009. Association for Computing Machinery.
Annotation: Conference article of ‘Extracting content holes by comparing community-type content with Wikipedia’.

[300] Akiyo Nadamoto, Eiji Aramaki, Takeshi Abekawa, and Yohei Murakami. Extracting content holes by comparing community-type content with Wikipedia. International Journal of Web Information Systems, 6(3):248–260, August 2010.
Annotation: Describes a system for discovery of missing subtopics in a text by comparison against text in Wikipedia.

[301] Akiyo Nadamoto, Eiji Aramaki, Takeshi Abekawa, and Yohei Murakami. Content hole search in community-type content. In Proceedings of the 18th international conference on World wide web, pages 1223–1224, New York, NY, USA, 2009. Association for Computing Machinery.
Annotation: Short paper on the same topic as ‘Extracting content holes by comparing community-type content with Wikipedia’.

[302] Marijn Koolen, Gabriella Kazai, and Nick Craswell. Wikipedia pages as entry points for book search. In WSDM ’09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 44–53, New York, NY, USA, 2009. ACM.

[303] David N. Milne, Ian H. Witten, and David M. Nichols. A knowledge-based search engine powered by Wikipedia. In CIKM ’07 Proceedings of the sixteenth ACM conference on Conference on Information and Knowledge Management, pages 445–454. Association for Computing Machinery, November 2007.
Annotation: Describes the Koru document retrieval system, which is based on a thesaurus constructed from Wikipedia data and features automatic query expansion. A user study with a comparison system shows that Koru performs better and is preferred.

[304] Olga Vechtomova. Facet-based opinion retrieval from blogs. Information Processing and Management, 46(1):71–88, 2010.
Annotation: Describes an information retrieval system with sentiment analysis and query expansion using Wikipedia.

[305] Julien Ah-Pine, Marco Bressan, Stephane Clinchant, Gabriela Csurka, Yves Hoppenot, and Jean-Michel Renders. Crossing textual and visual content in different application scenarios. Multimedia Tools and Applications, 42(1):31–56, 2009.

Annotation: Description of a system for combined text and image similarity computation evaluated on the ImageCLEFphoto 2008 data set and demonstrated on Wikipedia data.

[306] Daniel Kinzler. WikiWord: Multilingual image search and more. In Wikimania, 2009.
Annotation: Describes the multilingual image retrieval system WikiWord based on data from Wikipedia.

[307] Robert L. Chapman, editor. Roget’s international thesaurus. HarperCollins Publishers, New York, NY, fifth edition, 1992.

[308] Andrew Krizhanovsky. Synonym search in Wikipedia: Synarcher. In 11-th International Conference “Speech and Computer” SPECOM’2006. Russia, St. Petersburg, June 25–29, pages 474–477, 2006.
Annotation: Description of the Synarcher program that analyzes Wikipedia with the Kleinberg HITS algorithm and presents terms related to a query term in a graph visualization.

[309] Michael Strube and Simone Paolo Ponzetto. WikiRelate! computing semantic relatedness using Wikipedia. In Proceedings of the Twenty-First AAAI Conference on Artificial Intelligence, pages 1419–1424, Menlo Park, California, 2006. AAAI Press.

[310] Decong Li, Sujian Li, Wenjie Li, Congyun Gu, and Yun Li. Keyphrase extraction based on topic relevance and term association. Journal of Information and Computational Science, 7(1):293–299, 2010.

[311] Razvan Bunescu and Marius Pasca. Using encyclopedic knowledge for named entity disambiguation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), pages 9–16. Association for Computational Linguistics, 2006.

[312] Silviu Cucerzan. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 708–716. Association for Computational Linguistics, 2007.

[313] Jun’ichi Kazama and Kentaro Torisawa. Exploiting Wikipedia as external knowledge for named entity recognition. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 698–707, Stroudsburg, PA, USA, June 2007. Association for Computational Linguistics.

[314] Dominic Balasuriya, Nicky Ringland, Joel Nothman, Tara Murphy, and James R. Curran. Named entity recognition in Wikipedia. In Proceedings of the 2009 Workshop on the People’s Web Meets NLP, pages 10–18. Association for Computational Linguistics, 2009.

[315] Mohamed Amir Yosef, Johannes Hoffart, Ilaria Bordino, Marc Spaniol, and Gerhard Weikum. AIDA: an online tool for accurate disambiguation of named entities in text and tables. In Proceedings of the VLDB Endowment, volume 4, pages 1450–1453, 2011.
Annotation: Describes an online named entity extraction system with disambiguation using YAGO2, Stanford NER Tagger and PostgreSQL.

[316] Michael Cook, Simon Colton, and Alison Pease. Aesthetic considerations for automated platformer design. Association for the Advancement of Artificial Intelligence, 2012.
Annotation: Describes ANGELINA3, which does notable person detection by looking up a person on Wikipedia. The sentiment about the person is gauged by looking the person up on Twitter and using the AFINN word list for text sentiment analysis.

[317] Daniel Kinzler. Automatischer aufbau eines multilingualen thesaurus durch extraktion semantischer und lexikalischer relationen aus der Wikipedia. Diplomarbeit an der abteilung für automatische sprachverarbeitung, Institut für Informatik, Universität Leipzig, 2008.

[318] Eckhard Bick. WikiTrans: the English Wikipedia in Esperanto. In Constraint Applications, Workshop Proceedings at Nodalida 2011, volume 14, pages 8–16, 2011.

[319] Eckhard Bick. Translating the Swedish Wikipedia into Danish. In Technology Conference 2014, 2014.
Annotation: Short description of a machine translation system for translating Swedish Wikipedia to Danish.

[320] Daniel H. Pink. Folksonomy. The New York Times, December 2005.
Annotation: News article explaining the term ‘folksonomy’, which had recently been introduced at that time.

[321] Jakob Voß. Collaborative thesaurus tagging the Wikipedia way. arXiv, April 2006.
Annotation: Analysis of the Wikipedia category system.

[322] Tao Guo, David G. Schwartz, Frada Burstein, and Henry Linger. Codifying collaborative knowledge: using Wikipedia as a basis for automated ontology learning. Knowledge Management Research & Practice, 7(3):206–217, September 2009.
Annotation: Describes a semi-automated ontology learning system based on Wikipedia data.

[323] Magnús Sigurdsson and Søren Christian Halling. Zeeker: A topic-based search engine. Master’s thesis, Informatics and Mathematical Modelling, Technical University of Denmark, Kongens Lyngby, Denmark, 2007.

[324] Fabian Suchanek, Gjergji Kasneci, and Gerhard Weikum. YAGO: A core of semantic knowledge unifying WordNet and Wikipedia. In Carey L. Williamson, Mary Ellen Zurko, Peter F. Patel-Schneider, and Prashant J. Shenoy, editors, Proceedings of the Sixteenth International World Wide Web Conference (WWW2007), May 8-12, 2007, Banff, Alberta, CANADA, pages 697–706. ACM, 2007.

[325] Georgi Kobilarov, Tom Scott, Yves Raimond, Silver Oliver, Chris Sizemore, Michael Smethurst, Christian Bizer, and Robert Lee. Media meets semantic web — how the BBC uses DBpedia and Linked Data to make connections. In Lora Aroyo, Eyal Oren, Paolo Traverso, Fabio Ciravegna, Philipp Cimiano, Tom Heath, Eero Hyvönen, Riichiro Mizoguchi, Marta Sabou, and Elena Simperl, editors, The Semantic Web: Research and Applications, volume 5554 of Lecture Notes in Computer Science, pages 723–737, Berlin/Heidelberg, 2009. Springer.

[326] Brian Krebs. Wikipedia edits forecast vice presidential picks. washingtonpost.com, August 2008.
Annotation: News articles noting large activity in US vice-presidential candidate Wikipedia article just before her candidacy became public knowledge.

[327] Cord Jefferson. Did an anonymous Wikipedia editor try to out the Petraeus affair back in January? Gawker, November 2012.
Annotation: News story about an edit on Wikipedia that seemingly revealed information before it became public knowledge.

[328] Thomas Steiner, Seth van Hooland, and Ed Summers. MJ no more: using concurrent Wikipedia edit spikes with social network plausibility checks for breaking news detection. In Proceedings of the 22nd international conference on World Wide Web companion, pages 791–794, 2013.
Annotation: Describes a realtime breaking news monitoring service using Wikipedia edits.

[329] Thomas Steiner. Comprehensive Wikipedia monitoring for global and realtime natural disaster detection. In Ruben Verborgh and Erik Mannens, editors, Proceedings of the ISWC Developers Workshop, volume 1268 of CEUR Workshop Proceedings, pages 86–95, 2014.

[330] Murray Aitken, Thomas Altmann, and Daniel Rosen. Engaging patients through social media: Is healthcare ready for empowered and digitally demanding patients? IMS Institute for Healthcare Informatics, January 2014.
Annotation: Analysis of social media (YouTube, Facebook, Twitter and Wikipedia) in relation to social media engagement in health care.

[331] Márton Mestyán, Taha Yasseri, and János Kertész. Early prediction of movie box office success based on Wikipedia activity big data. ArXiv 1211.0970, November 2012.

[332] Brian Keegan. Does Wikipedia editing activity forecast Oscar wins? www.brianckeegan.com, March 2014.
Annotation: Blog post with prediction of 2014 winners for 5 Academy Awards categories.

[333] Taha Yasseri and Jonathan Bright. Can electoral popularity be predicted using socially generated big data. it - , 56(5):246–253, September 2014.
Annotation: Analysis of how good Google search volume and Wikipedia page views are for the prediction of elections in Iran, Germany and the United Kingdom.

[334] Kevin Makice. Twitter API: Up and running. O’Reilly, Sebastopol, California, 2009.

[335] Stefanie Haustein, Isabella Peters, Judit Bar-Ilan, Jason Priem, Hadas Shema, and Jens Terliesner. Coverage and adoption of altmetrics sources in the bibliometric community. ArXiv.

[336] Perry Evans and Michael Krauthammer. Exploring the use of social media to measure journal article impact. In AMIA Annual Symposium Proceedings, pages 374–381, January 2011.
Annotation: Altmetrics study of Wikipedia extracting scientific journal article citations based on PMID and DOI with comparison to Faculty of 1000 data. With longitudinal analysis they found articles to be cited increasingly faster.

[337] Jörg Waitelonis, Nadine Ludwig, Magnus Knuth, and Harald Sack. WhoKnows? evaluating linked data heuristics with a quiz that cleans up DBpedia. Interactive Technology and Smart Education, 8(4):236–248, 2011.

[338] Mihai Letia, Nuno Preguiça, and Marc Shapiro. Consistency without concurrency control in large, dynamic systems. ACM SIGOPS Operating Systems Review, 44:29–34, 2010.
Annotation: Describes a system for sharing mutable data.

[339] Richard Bentley, Thilo Horstmann, Klaas Sikkel, and Jonathan Trevor. Supporting collaborative information sharing with the world wide web: The BSCW shared workspace system. In Fourth International World Wide Web Conference. World Wide Web Consortium, December 1995.

[340] Y. Goland, E. Whitehead, A. Faizi, and D. Jensen. HTTP extensions for distributed authoring – WEBDAV, February 1999.

[341] Honglei Zeng, Maher Alhossaini, Li Ding, Richard Fikes, and Deborah L. McGuinness. Computing trust from revision history. In Proceedings of the 2006 International Conference on Privacy, Security and Trust, volume 380 of ACM International Conference Proceeding Series, New York, NY, USA, October 2006. Association for Computing Machinery.
Annotation: Sets up a statistical model (a dynamic Bayesian network) for modeling of Wikipedia article trust by looking at the revision history. They set Bayesian priors based on whether the user is administrator, registered user, anonymous user or blocked user. Further, the trust is based on the amount of insertions and deletions.

[342] Joshua E. Blumenstock. Size matters: Word count as a measure of quality on Wikipedia. In Proceedings of the 17th International World Wide Web Conference (WWW2008). April 21-25, 2008. , China, April 2008.
Annotation: Suggests word count as a simple measure for quality and shows that it performs well when classifying featured and random articles, seemingly better than the methods of Zeng and Stvilia.

[343] Gerald C. Kane. A multimethod study of information quality in wiki collaboration. ACM Transactions on Management Information Systems, 2(1):4, March 2011.
Annotation: A logistic regression analysis of which features were important for quality articles on Wikipedia.

[344] Daniel Nasaw. Meet the ’bots’ that edit Wikipedia. BBC News Magazine, July 2012.
Annotation: News articles about bots operating on Wikipedia.

[345] Aaron Halfaker, R. Stuart Geiger, Jonathan T. Morgan, and John Riedl. The rise and decline of an open collaboration system: how Wikipedia’s reaction to popularity is causing its decline. American Behavioral Scientist, 57(5):664–688, December 2013.

[346] Pierpaolo Dondio, Stephen Barrett, Stefan Weber, and Jean Marc Seigneur. Extracting trust from domain analysis: A case study on the Wikipedia project. In Autonomic and Trusted Computing, volume 4158 of Lecture Notes in Computer Science, pages 362–373, Berlin/Heidelberg, October 2006. Springer.

[347] B. Thomas Adler and Luca de Alfaro. A content-driven reputation system for the Wikipedia. In Proceedings of the 16th international conference on World Wide Web, pages 261–270, New York, NY, USA, 2007. Association for Computing Machinery.

[348] B. Thomas Adler, J. Benterou, K. Chatterjee, Luca de Alfaro, I. Pye, and V. Raman. Assigning trust to Wikipedia content. Technical Report UCSC-CRL-07-09, School of Engineering, University of California, Santa Cruz, CA, USA, November 2007.

[349] B. Thomas Adler, J. Benterou, K. Chatterjee, Luca de Alfaro, I. Pye, and V. Raman. Assigning trust to Wikipedia content. In Proceedings of the 4th International Symposium on Wikis, page 26, New York, NY, USA, 2008. Association for Computing Machinery.
Annotation: Describes a reputation system for Wikipedia which enables individual words to be colored according to computed ‘trust’.

[350] B. Thomas Adler, Luca de Alfaro, and Ian Pye. Detecting Wikipedia vandalism using wikitrust. In PAN 2010 proceedings, 2010.

[351] Santiago M. Mola Velasco. Wikipedia vandalism detection through machine learning: Feature review and new proposals. In Lab Report for PAN at CLEF 2010, 2010.

[352] Meiqun Hu, Ee-Peng Lim, Aixin Sun, Hady W. Lauw, and Ba-Quy Vuong. Measuring article quality in Wikipedia: models and evaluation. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pages 243–252, New York, NY, USA, 2007. Association for Computing Machinery.

[353] Christopher Thomas and Amit P. Sheth. Semantic convergence of Wikipedia articles. In IEEE/WIC/ACM International Conference on Web Intelligence, pages 600–606. IEEE, 2007.

[354] Martin Potthast, Benno Stein, and Teresa Holfeld. Overview of the 1st international competition on Wikipedia vandalism detection. In PAN 2010, 2010.

[355] Martin Potthast. a Wikipedia vandalism corpus. In Hsin-Hsi Chen, Efthimis N. Efthimiadis, Jacques Savoy, Fabio Crestani, and Stéphane Marchand-Maillet, editors, 33rd International ACM Conference on Research and Development in Information Retrieval (SIGIR 10), pages 789–790, New York, NY, USA, July 2010. ACM.

[356] Martin Potthast and Teresa Holfeld. Overview of the 2nd international competition on Wikipedia vandalism detection. In Vivien Petras and Paul Clough, editors, Notebook Papers of CLEF 2011 Labs and Workshops, 2011.
Annotation: Report from a prediction competition on Wikipedia vandalism detection. The corpus was based on the English, German and Spanish Wikipedias. Three systems participated.

[357] Sara Javanmardi, David W. McDonald, and Cristina V. Lopes. Vandalism detection in Wikipedia: A high-performing, feature–rich model and its reduction through Lasso. In WikiSym’11, 2011.

[358] Kermit K. Murray. Mass spectrometry on Wikipedia: Open source and peer review. Presented at the 55th ASMS Conference on Mass Spectrometry, June 3–7, 2007, , Indiana, June 2007.

[359] Jon W. Huss, III, Camilo Orozco, James Goodale, Chunlei, Serge Batalov, Tim J. Vickers, Faramarz Valafar, and Andrew I. Su. The Gene Wiki for community annotation of gene function. PLoS Biology, 6(7):e175, July 2008.
Annotation: Description of the creation and addition of over 8000 gene articles in Wikipedia with an automated bot. Information is aggregated from Entrez Gene and a gene atlas for the mouse and human protein-encoding transcriptomes.

[360] Andrew I. Su, Tim Wiltshire, Serge Batalov, Hilmar Lapp, Keith A. Ching, David Block, Jie Zhang, Richard Soden, Mimi Hayakawa, Gabriel Kreiman, Michael P. Cooke, John R. Walker, and John B. Hogenesch. A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences of the United States of America, 101(16):6062–6067, April 2004.

[361] Jennifer Daub, Paul P. Gardner, John Tate, Daniel Ramsköld, Magnus Manske, William G. Scott, Zasha Weinberg, Sam Griffiths-Jones, and . The RNA WikiProject: Community annotation of RNA families. RNA, October 2008.
Annotation: Describes the WikiProject RNA where data and annotation about RNA are shared between Wikipedia and the Rfam database.

[362] Sylvain Brohée, Roland Barriot, and Yves Moreau. Biological knowledge bases using wikis: combining the flexibility of wikis with the structure of databases. Bioinformatics, 2010.
Annotation: Briefly describes the WikiOpener MediaWiki extension, which can be used to include data from web-based databases in a wiki. They state that the extension is used in two biological databases (wikis): CHDwiki and YTPdb.

[363] Magnus Manske. The game is on. The Whelming, May 2014.
Annotation: Blog post presenting a gamification input for Wikidata: Wikidata — The Game.

[364] Thomas Pellissier Tanon, Denny Vrandecic, Sebastian Schaffert, Thomas Steiner, and Lydia Pintscher. From Freebase to Wikidata: the great migration. 2016.
Annotation: Description of content migration from Google Freebase to Wikidata with the Primary Sources Tool.

[365] Sisay Fissaha Adafre and Maarten de Rijke. Discovering missing links in Wikipedia. In Jafar Adibi, Marko Grobelnik, Dunja Mladenic, and Patrick Pantel, editors, Proceedings of the 3rd international workshop on Link discovery, pages 90–97, New York, NY, USA, 2005. ACM.

[366] Wei Che Huang, Andrew Trotman, and Shlomo Geva. Collaborative knowledge management: evaluation of automated link discovery in the Wikipedia. In Andrew Trotman, Shlomo Geva, and Jaap Kamps, editors, Proceedings of the SIGIR 2007 Workshop on Focused Retrieval. Department of Computer Science, University of Otago, 2007.

[367] Michael Granitzer, Mario Zechner, Christin Seifert, Josef Kolbitsch, Peter Kemper, and Ronald In’t Velt. Evaluation of automatic linking strategies for Wikipedia pages. In Proceedings of the IADIS WWWInternet Conference 2008. IADIS, 2008.

[368] Rianne Kaptein, Pavel Sedyukov, and Jaap Kamps. Linking Wikipedia to the Web. In Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval, New York, NY, USA, 2010. ACM.
Annotation: Builds and evaluates a system for prediction of external links within a Wikipedia article.

[369] Bahar Sateli, Marie-Jean Meurs, Greg Butler, Justin Powlowski, Adrian Tsang, and René Witte. IntelliGenWiki: an intelligent semantic wiki for life sciences. EMBnet.journal, 18 supplement B:50–52, 2012.
Annotation: Describes IntelliGenWiki, a MediaWiki-based wiki with natural language processing (NLP). The data is represented via a Semantic MediaWiki extension. The NLP is handled with a General Architecture for Text Engineering (GATE) pipeline and may, e.g., identify organisms and enzymes.

[370] Bahar Sateli and René Witte. Natural language processing for MediaWiki: the semantic assistants approach. In Proceedings of the Eighth Annual International Symposium on Wikis and Open Collaboration, New York, NY, USA, 2012. Association for Computing Machinery.
Annotation: Describes a system that integrates natural language processing with a MediaWiki-based wiki. The NLP service is provided by W3C standard web services running GATE NLP software. The system reads data from a wiki and embeds the result in the wiki. The system is demonstrated on several wikis: DurmWiki with cultural heritage data where the system extracts terms for an index, ReqWiki for quality assurance in software requirements specifications on a wiki and GeneWiki with extraction of biological terms.

[371] Tim Berners-Lee, , and Ora Lassila. The Semantic Web. Scientific American, 284(5):28–37, May 2001.

[372] Roberto Tazzoli, Paolo Castagna, and Stefano Emilio Campanini. Towards a semantic wiki wiki web. Poster at the International Semantic Web Conference, November 2004.
Annotation: Describes the semantic wiki called PlatypusWiki that is implemented in Java.

[373] Eyal Oren. Semperwiki: a semantic . In Stefan Decker, Jack Park, Dennis Quan, and Leo Sauermann, editors, Proceedings of the ISWC 2005 Workshop on The Semantic Desktop - Next Generation Information Management & Collaboration Infrastructure. Galway, Ireland, November 6, 2005, volume 175 of CEUR Workshop Proceedings, November 2005.
Annotation: Describes the SemperWiki semantic personal wiki for the GNOME desktop environment.

[374] David Aumüller. Semantic authoring and retrieval within a wiki. In 2nd European Semantic Web Conference (ESWC 2005), 2005. Demos and Posters.
Annotation: Describes the SHAWN Wiki, a semantic wiki implemented in Perl with semantic input in the wiki-markup and a query feature.

[375] Sebastian Schaffert. IkeWiki: a semantic wiki for collaborative knowledge management. In 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, 2006. WETICE ’06, pages 388–396. IEEE, 2006.
Annotation: Describes the IkeWiki semantic wiki engine. It is implemented as a Java web application and has support for OWL-RDFS reasoning.

[376] Michel Buffa and Fabien Gandon. Sweetwiki: semantic web enabled technologies in wiki. In Proceedings of the 2006 International Symposium on Wikis, pages 69–78. Association for Computing Machinery, 2006.
Annotation: Description of the SweetWiki semantic wiki engine. It does not use a wiki markup language but rather XHTML and a WYSIWYG interface based on the Kupu editor. It relies on the CORESE semantic search engine.

[377] Hendry Muljadi, Hideaki Takeda, Jiro Araki, Shoko Kawamoto, Satoshi Kobayashi, Yoko Mizuta, Sven Minory Demiya, Satoshi Suzuki, Asanoby Kitamoto, Yasuyuki Shirai, Nobuyuki Ichiyoshi, Takehiko Ito, Takashi Abe, Takashi Gojobori, Hideaki Sugawara, Satoru Miyazaki, and Asao Fujiyama. Semantic mediawiki: a user-oriented system for integrated content and metadata management system. In IADIS International Conference on WWW/Internet, pages 261–264, 2005.
Annotation: Suggests a semantic wiki based on the MediaWiki software.

[378] Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller, and Rudi Studer. Semantic wikipedia. In Proceedings of the 15th international conference on World Wide Web, pages 585–594, New York, NY, USA, 2006. ACM.

[379] Markus Krötzsch, Denny Vrandečić, and Max Völkel. Semantic MediaWiki. In Isabel Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel Schwabe, Peter Mika, Mike Uschold, and Lora Aroyo, editors, The Semantic Web - ISWC 2006, volume 4273 of Lecture Notes in Computer Science, pages 935–942, Berlin/Heidelberg, 2006. Springer.

[380] David E. Millard, Christopher P. Bailey, Philip Boulain, Swapna Chennupati, Hugh C. Davis, Yvonne Howard, and Gary Wills. Semantics on demand: Can a semantic wiki replace a knowledge base? New Review of Hypermedia and Multimedia, 14(1):95–120, January 2008.

[381] Jie Bao, Li Ding, and Deborah L. McGuinness. Semantic history: towards modeling and publishing changes of online semantic data. In John Breslin, Uldis Bojars, Alexandre Passant, and Sergio Fernández, editors, Proceedings of the 2nd Workshop on Social Data on the Web (SDoW2009), volume 520 of CEUR Workshop Proceedings, 2009.
Annotation: Describes an extension to Semantic MediaWiki so the revision history of a MediaWiki is available to the semantic query system.

[382] Natasha Noy, Alan Rector, Pat Hayes, and Chris Welty. Defining n-ary relations on the semantic web. Technical report, W3C, April 2006.

[383] Michael Backhaus and Janet Kelso. BOWiki — a collaborative annotation and ontology curation framework. In WWW2007, 2007.

[384] Denny Vrandečić. Wikidata: a new platform for collaborative data collection. In Proceedings of the 21st international conference companion on World Wide Web, pages 1063–1064, New York, NY, USA, 2012. Association for Computing Machinery.

[385] Denny Vrandečić and Markus Krötzsch. Wikidata: a free collaborative knowledge base. Communications of the ACM, 2014.
Annotation: A general introduction to Wikidata.

[386] Kingsley Idehen. Preliminary SPARQL endpoint for Wikidata. Wikidata-l mailing list, March 2015.

[387] Daniel Hernández, Aidan Hogan, and Markus Krötzsch. Reifying RDF: what works well with wikidata? In Thorsten Liebig and Achille Fokoue, editors, Proceedings of the 11th International Workshop on Scalable Semantic Web Knowledge Base Systems, volume 1457 of CEUR Workshop Proceedings, pages 32–47. CEUR-WS.org, 2015.
Annotation: Describes four different methods to convert Wikidata data to a Semantic Web triple data structure and benchmarks them with 5 different SPARQL engines.

[388] Marc Chevalier, Raphaël Charrondière, Quentin Cormier, Yassine Hamoudi, Valentin Lorentz, and Thomas Pellissier Tanon. Project pensées profondes. Master’s thesis, ENS de Lyon, December 2014.
Annotation: Describes a student project that built a question answering system, http://askplatyp.us, with the use of natural language processing and Wikidata data.

[389] Carrie Arnold, Todd Fleming, David Largent, and Chris Lüer. DynaTable: a wiki extension for structured data. In Proceedings of the 5th International Symposium on Wikis and Open Collaboration, New York, NY, USA, 2009. ACM.
Annotation: Short description of the DynaTable MediaWiki extension for structured data in tables.

[390] Finn Årup Nielsen. Brede Wiki: Neuroscience data structured in a wiki. In Christoph Lange, Sebastian Schaffert, Hala Skaf-Molli, and Max Völkel, editors, Proceedings of the Fourth Workshop on Semantic Wikis — The Semantic Wiki Web, volume 464 of CEUR Workshop Proceedings, pages 129–133, Aachen, Germany, June 2009. RWTH Aachen University.

[391] Finn Årup Nielsen, Matthew J. Kempton, and Steven C. R. Williams. Online open neuroimaging mass meta-analysis. In Alexander García Castro, Christoph Lange, Frank van Harmelen, and Benjamin Good, editors, Proceedings of the 2nd Workshop on , volume 903 of CEUR Workshop Proceedings, pages 35–39, Aachen, Germany, 2012.

[392] Sandro Pinna, Simone Mauri, Paolo Lorrai, Michele Marchesi, and Nicola Serra. XPSwiki: An agile tool supporting the planning game. In Extreme Programming and Agile Processes in Software Engineering, volume 2675 of Lecture Notes in Computer Science, page 1014, Berlin, 2003. Springer.
Annotation: Describes a wiki for supporting extreme programming. The wiki can associate forms with wiki pages.

[393] Anja Haake, Stephan Lukosch, and Till Schümmer. Wiki-templates: Adding structure support to wikis on demand. In Proceedings of the 2005 international symposium on Wikis, pages 41–51, New York, NY, USA, 2005. ACM.
Annotation: Describes a wiki with form-based input applied for annotation of scientific papers. The user may modify how the form is presented for display and for editing. XML and wiki markup combine in a language to define the form.

[394] Christoph Sauer, Chuck Smith, and Tomas Benz. Wikicreole: a common wiki markup. In Proceedings of the 2007 international symposium on Wikis, pages 131–142, New York, NY, USA, 2007. ACM.

    Annotation: Describes the WikiCreole language, which is a standardized wiki markup language.

[395] Martin Junghans, Dirk Riehle, Rama Gurram, Matthias Kaiser, Mário Lopes, and Umit Yalcinalp. An EBNF grammar for Wiki Creole 1.0. ACM SIGWEB Newsletter, Winter 2007.

[396] Martin Junghans, Dirk Riehle, Rama Gurram, Matthias Kaiser, Mário Lopes, and Umit Yalcinalp. A grammar for standardized wiki markup. In Ademar Aguiar and Mark Bernstein, editors, Proceedings of the 2008 International Symposium on Wikis, 2008, Porto, Portugal, September 8–10, 2008. ACM, 2008.

[397] Gabriel Wicke and Subramanya Sastry. Parsoid: How Wikipedia catches up with the web. Wikimedia blog, March 2013.
    Annotation: A blog post on the Wikimedia Foundation Parsoid project that creates an HTML5 representation of MediaWiki wikitext.

[398] . Revolting peasants force Wikipedia to cut’n’paste Visual Editor into the bin. , September 2013.
    Annotation: News article on the controversy surrounding the VisualEditor on Wikipedia.

[399] Michael Hart. The history and philosophy of Project Gutenberg, 1992.

[400] Lars Aronsson. Wikisource. Runeberg mailing list, January 2008.
    Annotation: Swedish introduction to Wikisource and its inspiration from Runeberg.

[401] Imene Bensalem, Salim Chikhi, and Paolo Rosso. Building Arabic corpora from Wikisource. In 2013 ACS International Conference on Computer Systems and Applications (AICCSA). IEEE, 2013.
    Annotation: Describes a system for collecting texts from the Arabic Wikisource to form a corpus that can be used in a application for plagiarism detection.

[402] Michael F. Goodchild. Citizens as sensors: Web 2.0 and the volunteering of geographic information. GeoFocus, (7):8–10, July 2007.
    Annotation: Editorial on volunteered geographic information systems.

[403] Mordechai Haklay and Patrick Weber. OpenStreetMap: User-generated street maps. Pervasive computing, pages 12–18, October-December 2008.
    Annotation: Introduces OpenStreetMap with discussion of, among other issues, the historic development, input and output methods, technical infrastructure, distributed map-tile rendering, social collaboration, mapping parties, motivations, data accuracy and participation inequality.

[404] Ruben Sipos, Abhijit Bhole, Blaz Fortuna, Marko Grobelnik, and Dunja Mladenic. Demo: HistoryViz — visualizing events and relations extracted from Wikipedia. In Lora Aroyo, Eyal Oren, Paolo Traverso, Fabio Ciravegna, Philipp Cimiano, Tom Heath, Eero Hyvönen, Riichiro Mizoguchi, Marta Sabou, and Elena Simperl, editors, The Semantic Web: Research and Applications: 6th European Semantic Web Conference, ESWC 2009 Heraklion, Crete, Greece, May 31–June 4, 2009 Proceedings, volume 5554 of Lecture Notes in Computer Science, pages 903–907, Berlin/Heidelberg, May 2009. Springer.

[405] Abhijit Bhole, Blaz Fortuna, Marko Grobelnik, and Dunja Mladenic. Extracting named entities and relating them over time based on Wikipedia. Informatica (Slovenia), 31(4):463–468, 2007.

[406] Navino Evans. Database system and method. United States Patent Application, February 2014. US20140046948 A1.
    Annotation: Patent on some aspects of Histropedia.

[407] Juhani Eronen and Juha Röning. Graphingwiki - a semantic wiki extension for visualising and inferring protocol dependency. In Max Völkel and Sebastian Schaffert, editors, Proceedings of the First Workshop on Semantic Wikis - From Wiki to Semantics co-located with the ESWC2006, Workshop on Semantic Wikis. ESWC2006, June 2006.

[408] Emden R. Gansner and Stephen C. North. An open graph visualization system and its applications to software engineering. Software — Practice and Experience, 30(11):1203–1234, 2000.

[409] Frank Dengler, Steffen Lamparter, Mark Hefke, and Andreas Abecker. Collaborative process development using Semantic MediaWiki. In Knut Hinkelmann and Holger Wache, editors, WM2009: 5th Conference of Professional Knowledge Management, volume P-145 of Lecture Notes in Informatics, pages 97–107, Solothurn, Switzerland, March 2009. Bonner Köllen Verlag.

[410] Jacek Jankowski and Sebastian Ryszard Kruk. 2LIP: The step towards the Web3D. In Proceedings of the 17th International World Wide Web Conference (WWW2008). April 21-25, 2008. Beijing, China, pages 1137–1138, April 2008.

[411] Jacek Jankowski. Copernicus: 3D Wikipedia. In International Conference on Computer Graphics and Interactive Techniques. ACM, 2008. Session: Design. Article No. 30.
    Annotation: Description of ‘Copernicus’: An encyclopedia/browser that combines Wikipedia article reading with a 3-dimensional viewer in a two-layer interface paradigm. The article is presented as a transparent foreground, the 3D model in the background.

[412] Jacek Jankowski. Copernicus adding the third dimension to Wikipedia. In Wikimania. Wikimedia, July 2008.

[413] Joseph Corneli. GravPad. In Proceedings of the 6th International Symposium on Wikis and Open Collaboration, pages 30–31. ACM, 2010.
    Annotation: Fairly briefly describes EtherPad with wiki and other extensions.

[414] Priya Ganapati. $20 Wikipedia reader uses 8-bit computing power. Wired Gadget Lab, July 2010.

[415] Jon Brodkin. All of Wikipedia can be installed to your desktop in just 30 hours. , November 2013.
    Annotation: News article on the XOWA application for automating downloading and offline displaying of Wikipedias and its sister projects.

[416] Nina Hansen, Namkje Koudenburg, Rena Hiersemann, Peter J. Tellegen, Márton Kocsev, and Tom Postmes. Laptop usage affects abstract reasoning of children in the developing world. Computers & Education, 59(3):989–1000, November 2012.
    Annotation: Field experiment on the effect of One Laptop Per Child deployment on children’s school engagement and performance. Note that the word ‘grade’ is quite confusingly used in two senses.

[417] Klint Finley. Wiki inventor sticks a fork in his baby. Wired, April 2012.
    Annotation: News article about Ward Cunningham’s Smallest Federated Wiki.

[418] Ward Cunningham. Folk memory: A minimalist architecture for adaptive federation of object servers, June 1997.

[419] Gérôme Canals, Pascal Molli, Julien Maire, Stéphane Laurière, Esther Pacitti, and Mounir Tlili. XWiki Concerto: A P2P wiki system supporting disconnected work. In Yuhua Luo, editor, Cooperative Design, Visualization, and Engineering, Lecture Notes in Computer Science, pages 98–106. Springer, 2008.

[420] Clemens H. Cap. Towards content neutrality in wiki systems. Future Internet, 4(4):1086–1104, 2012.

[421] Victor Grishchenko. Deep hypertext with embedded revision control implemented in regular expressions. In Proceedings of the 6th International Symposium on Wikis and Open Collaboration, page 3, New York, NY, USA, 2010. ACM.

[422] Craig Anslow and Dirk Riehle. Lightweight end-user programming with wikis. In Wikis for Software Engineering Workshop, 2007.

[423] Alexandros Kanterakis. PyPedia: A Python development environment on a wiki. In EuroPython, 2012.

[424] Darren W. Logan, Massimo Sandal, Paul P. Gardner, Magnus Manske, and Alex Bateman. Ten simple rules for editing Wikipedia. PLoS Computational Biology, 6(9):e1000941, 2010.
    Annotation: Guidelines for scientists on how to write on Wikipedia.

[425] Erik Moeller. Wikipedia scholarly survey results. Technical report, Wikimedia Foundation, March 2009.
    Annotation: Report on a survey on science and Wikipedia among 1743 self-selected respondents.

[426] Patricia L. Dooley. Wikipedia and the two-faced professoriate. In Proceedings of the 6th International Symposium on Wikis and Open Collaboration, New York, NY, USA, 2010. ACM.
    Annotation: Survey among university faculty on use of Wikipedia and attitude towards Wikipedia’s credibility, as well as a content analysis of the use of Wikipedia in research papers.

[427] Alex Bateman and Darren W. Logan. Time to underpin Wikipedia wisdom. Nature, 468:765, December 2010.
    Annotation: Short comment that encourages scientists to contribute to Wikipedia.

[428] Alexander L. Bond. Why ornithologists should embrace and contribute to Wikipedia. IBIS, 2011.
    Annotation: Short article encouraging ornithologists to contribute to Wikipedia. Professors could also ask students to create or improve Wikipedia taxonomic articles.

[429] James Heilman. Why we should all edit Wikipedia. University of British Columbia Medical Journal, 3(1):32–33, September 2011.
    Annotation: Short article encouraging healthcare professionals to get involved in Wikipedia contribution.

[430] Manu Mathew, Anna Joseph, James Heilman, and Prathap Tharyan. Cochrane and Wikipedia: the collaborative potential for a quantum leap in the dissemination and uptake of trusted evidence. The Cochrane database of systematic reviews, 10:ED000069, 2013.
    Annotation: Short article on the collaboration between Wikipedia editors and the Cochrane Collaboration.

[431] Sook Lim. How and why do college students use Wikipedia. Journal of the American Society for Information Science and Technology, 50(11):2189–2202, November 2009.
    Annotation: Examines how college students use (i.e., read) Wikipedia.

[432] Kristina Fiore. APA: Med students cram for exams with Wikipedia. MedPage Today, May 2011.
    Annotation: News article reporting on a study by M. Namdari et al., ‘Is Wikipedia taking over textbooks in medical student education?’ that investigated which information sources medical students use when taking a psychiatry exam. Question books, Up-to-Date and Wikipedia were used the most.

[433] Jeff Maehre. What it means to ban Wikipedia: an exploration of the pedagogical principles at stake. College Teaching, 57(4):229–236, 2009.
    Annotation: Discusses how students could use Wikipedia.

[434] Andrea Forte and Amy Bruckman. Writing, citing, and participatory media: wikis as learning environments in the high school classroom. International Journal of Learning and Media, 1:23–44, Fall 2009.
    Annotation: Details a project where high school students wrote science wiki articles.

[435] Neil L. Waters. Why you can’t cite Wikipedia in my class. Communications of the ACM, 50(9):15–17, September 2007.
    Annotation: A historian of Japanese history from Middlebury College describes his students’ use of erroneous information on Wikipedia, the Wikipedia policy subsequently adopted by his department and the media coverage. He also makes general remarks on authority, accountability and validity.

[436] Noam Cohen. A history department bans citing Wikipedia as a research source. The New York Times, February 2007.
    Annotation: Middlebury College history department has banned students from using Wikipedia as a source.

[437] Cullen J. Chandler and Alison S. Gregory. Sleeping with the enemy: Wikipedia in the college classroom. The History Teacher, 43(2):247–257, February 2010.

[438] Alistair Coleman. Students ‘should use Wikipedia’. BBC NEWS, December 2007.

[439] Lisa Spiro. Is Wikipedia becoming a respectable academic source? Digital Scholarship in the Humanities, September 2008.
    Annotation: Blog article reporting an analysis of citations from academic journals in the humanities domain area to Wikipedia.

[440] Alireza Noruzi. Wikipedia popularity from a citation analysis point of view. Webology, 6(2), June 2009.
    Annotation: Analysis of citations from scientific journals to Wikipedia using ISI Web of Science.

[441] Bradley Brazzeal. Citations to Wikipedia in chemistry journals: A preliminary study. Issues in Science and Technology Librarianship, Fall 2011.
    Annotation: Analysis of citations in chemistry journals that go to Wikipedia. 370 articles were found and the citations were grouped into categories. 33 of the citations cite numerical values of physical and chemical properties, corresponding to 9 percent.

[442] Declan Butler. Publish in Wikipedia or perish. Nature News, December 2008.
    Annotation: Article describing a new approach taken by the scientific journal RNA Biology, where authors are required to also submit a Wikipedia article.

[443] Shoshana J. Wodak, Daniel Mietchen, Andrew M. Collings, Robert B. Russell, and Philip E. Bourne. Topic pages: PLoS Computational Biology meets Wikipedia. PLoS Computational Biology, 8(3):e1002446, 2012.
    Annotation: Presents the concept of “Topic Pages” in the scientific journal PLoS Computational Biology. These are educational academic articles published in the journal that are meant to also be published on Wikipedia.

[444] Spencer Bliven and Andreas Prlić. Circular permutation in proteins. PLoS Computational Biology, 8(3):e1002445, March 2012.

    Annotation: The first so-called topic page of PLoS Computational Biology.

[445] K. Wang. Gene-function wiki would let biologists pool worldwide resources. Nature, 439(7076):534, February 2006.

[446] Jim Giles. Key biology databases go wiki. Nature, 445(7129):691, February 2007.
    Annotation: News report on the “Wiki for Professionals”, a biology wiki.

[447] S. L. Salzberg. Genome re-annotation: a wiki solution? Genome Biology, 8(1):102, February 2007.

[448] James C. Hu, Rodolfo Aramayo, Dan Bolser, Tyrrell Conway, Christine G. Elsik, Michael Gribskov, Thomas Kelder, Daisuke Kihara, Thomas F. Knight Jr., Alexander R. Pico, Deborah A. Siegele, Barry L. Wanner, and Roy D. Welch. The emerging world of wikis. Science, 320(5881):1289–1290, June 2008.
    Annotation: A short letter noting the existence of biological wikis, such as EcoliWiki, GONUTS, PDBwiki and WikiPathways.

[449] Finn Årup Nielsen. Lost in localization: A solution with neuroinformatics 2.0. NeuroImage, 48(1):11–13, October 2009.
    Annotation: Comment on a paper by Derrfuss and Mar suggesting the use of Web 2.0 techniques for neuroinformatics databasing and presentation of the Brede Wiki.

[450] Robert Hoehndorf, Kay Prüfer, Michael Backhaus, Heinrich Herre, Janet Kelso, Frank Loebe, and Johann Visagie. A proposal for a gene functions wiki. In On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops, volume 4277 of Lecture Notes in Computer Science, pages 669–678, Berlin/Heidelberg, November 2006. Springer.

[451] Helen Pearson. Online methods share insider tricks. Science, 441:678, June 2006.

[452] Brandon Keim. WikiMedia. Nature Medicine, 13(3):231–233, February 2007.
    Annotation: An editorial describing wikis used in medicine and biology. Wikipedia, Digital Universe, Citizendium, Ganfyd, AskDrWiki and OpenWetWare are described.

[453] Michael Cariaso and Greg Lennon. SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic acids research, 40(Database issue):D1308–D1312, January 2012.
    Annotation: Description of SNPedia, which is a wiki based on Semantic MediaWiki for recording of data related to single nucleotide polymorphisms (SNPs). In August 2011 the wiki had information about approximately 25,000 SNPs.

[454] Alexander R. Pico, Thomas Kelder, Martijn P. van Iersel, Kristina Hanspers, Bruce R. Conklin, and Chris Evelo. WikiPathways: Pathway editing for the people. PLoS Biology, 6(7):e184, July 2008.
    Annotation: Description of the WikiPathways database, http://www.wikipathways.org/, which is related to the GenMAPP pathways database and the GenMAPP Pathway Markup Language (GPML) XML format. The wiki uses extended MediaWiki software.

[455] Yoshinobu Igarashi, Alexey Eroshkin, Svetlana Gramatikova, Kosi Gramatikoff, Ying Zhang, Jeffrey W. Smith, Andrei L. Osterman, and Adam Godzik. CutDB: a proteolytic event database. Nucleic Acids Research, 35:D546–D549, January 2007.

[456] Todd H. Stokes, J. T. Torrance, Henry Li, and May D. Wang. ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses. BMC Bioinformatics, 9, supplement 6:S18, 2008.

[457] Henning Stehr, Jose M. Duarte, Michael Lappe, Jong Bhak, and Dan M. Bolser. PDBWiki: added value through community annotation of the Protein Data Bank. Database, 2010:baq009, 2010.

[458] Eran Hodis, Jaime Prilusky, Eric Martz, Israel Silman, John Moult, and Joel L Sussman. Proteopedia - a scientific ’wiki’ bridging the rift between three-dimensional structure and function of biomacromolecules. Genome Biology, 9:R121, 2008.

[459] Robert Hoffman. A wiki for the life sciences where authorship matters. Nature Genetics, 40:1047–1051, 2008.

[460] Barend Mons, , Christine Chichester, Erik van Mulligan, Marc Weeber, Johan den Dunnen, Gert-Jan van Ommen, Mark Musen, Matthew Cockerill, Henning Hermjakob, Albert Mons, Able Packer, Roberto Pacheco, Suzanna Lewis, Alfred Berkeley, William Melton, Micholas Barris, Jimmy Wales, Gerard Meijssen, Erik Moeller, Peter Jan Roes, Katy Borner, and . Calling on a million minds for community annotation in WikiProteins. Genome Biology, 9:R89, May 2008.
    Annotation: Describes an online system that integrates a number of bioinformatics databases. The system is based on the OmegaWiki MediaWiki extension.

[461] Euan Adie. WikiProteins — a more critical look. Nascent, June 2008.
    Annotation: A blog post with a critical examination of the then just launched WikiProteins biology wiki.

[462] Emilio J. Rodríguez Posada. Wikipapers, una recopilación colaborativa de literatura sobre wikis usando MediaWiki y su extensión semántica. In IV Jornadas Predoctorales de la ESI, Spain, December 2012. Escuela Superior de Ingeniería de Cádiz.
    Annotation: Spanish description of the WikiPapers wiki for structured annotation of academic papers about wiki research.

[463] Charles Wilson. Up and down states. Scholarpedia J., 3(6):1410, January 2008.

[464] Melissa L. Rethlefsen. Medpedia. J. Med. Libr. Assoc., 97(4):325–326, October 2009.

[465] Noam Cohen. A Rorschach cheat sheet on Wikipedia? The New York Times, June 2009.

[466] Michelle Paulson. Two German courts rule in favor of free knowledge movement. Wikimedia blog, December 2012.
    Annotation: Blog post on German court cases involving Wikipedia information and right of personality.

[467] Jenny Kleeman. Wikipedia ban for disruptive professor. The Observer, December 2007.

[468] Eugenie Samuel Reich. Online reputations: best face forward. Nature, 473:138–139, May 2011.
    Annotation: Describes reputation management among scientists.

[469] Nature. Survey results: Best face forward. Nature Web site, May 2011.
    Annotation: Statistics from poll on online by researchers.

[470] Kelly Chandler-Olcott. A tale of two tasks: editing in the era of digital literacies. Journal of Adolescent & Adult Literacy, 53(1):71–74, September 2009.
    Annotation: Short article arguing for Wikipedia as an opportunity for teaching students about collaborative writing in digital environments.

[471] Piotr Konieczny. Wikis and Wikipedia as a teaching tool. International Journal of Instructional Technology & Distance Learning, 4(1):15–34, January 2007.

[472] Donald MacLeod. Students marked on writing in Wikipedia. Guardian Unlimited, March 2007.

[473] Elizabeth M. Nix. Wikipedia: how it works and how it can work for you. The History Teacher, 43(2):259–264, February 2010.
    Annotation: Reports on the experience of giving a Wikipedia writing assignment in a history course.

[474] Associated Press. University class has students updating Wikipedia. FOXNews.com, November 2007.

[475] Elizabeth Ann Pollard. Raising the stakes: writing about witchcraft on Wikipedia. The History Teacher, 42(1):9–24, November 2008.

[476] Normann Witzleb. Engaging with the world: students of comparative law write for Wikipedia. Legal Education Review, 19(1 & 2):83–97, 2009.
    Annotation: Reports on a Comparative Law course where students were to write and review Wikipedia articles.

[477] Kristine L. Callis, Lindsey R. Christ, Julian Resasco, David W. Armitage, Jeremy D. Ash, Timothy T. Caughlin, Sharon F. Clemmensen, Stella M. Copeland, Timothy J. Fullman, Ryan L. Lynch, Charley Olson, Raya A. Pruner, Ernane H. M. Vieira-Neto, Raneve West-Singh, and Emilio M. Bruna. Improving Wikipedia: educational opportunity and professional responsibility. Trends in Ecology & Evolution, 24(4):177–179, April 2009.
    Annotation: Short report on how participants in a graduate seminar wrote Wikipedia articles.

[478] Klaus Wannemacher. Experiences and perspectives of Wikipedia use in higher education. International Journal of Management in Education, 5(1):79–92, 2011.
    Annotation: A review of student assignments in Wikipedia contribution as they were listed on Wikipedia.

[479] Joseph Bergin. Teaching on the wiki web. In Proceedings of the 7th annual conference on Innovation and technology in computer science education, page 195. ACM, 2002.
    Annotation: A short description of the use of a wiki for course communication.

[480] Norm Friesen and Janet Hopkins. Wikiversity; or education meets the free culture movement: An ethnographic investigation. First Monday, 13(10), October 2008.
    Annotation: Describes the experiences from giving a course through Wikiversity.

[481] John L. Hilton III and David A. . A sustainable future for open textbooks? the flat world knowledge story. First Monday, 15(8), August 2010.

[482] Suthiporn Sajjapanroj, Curtis J. Bonk, Mimi Miyoung Lee, and Meng-Fen Grace Lin. A window to Wikibookians: Surveying their statuses, successes, satisfactions, and sociocultural experiences. Journal of Interactive Online Learning, 7(1):36–58, Spring 2008.
    Annotation: Survey among 80 contributors to the Wikibooks Wikimedia project as well as email interview with 15 individuals selected among the 80. Among the several results was that contributors/respondents tended to be young males.

[483] Gilad Ravid, Yoram M. Kalman, and Sheizaf Rafaeli. Wikibooks in higher education: Empowerment through online distributed collaboration. Computers in Human Behavior, 24(5):1913–1928, 2008.
    Annotation: Reports on the extension and update of a textbook with wiki technology where students participate. The researchers found that for some classes the student participation was associated with higher grades on average.

[484] William Shockley. On the statistics of individual variations of productivity in research laboratories. Proceedings of the IRE, 45(3):279–289, March 1957.
    Annotation: Study on the productivity of scientists by counting number of publications, finding a heavy-tailed log-normal distribution. Explanations for the phenomenon are put forward.

[485] H. Sackman, W. J. Erikson, and E. E. Grant. Exploratory experimental studies comparing online and offline programming performance. Communications of the ACM, 11(1):3–11, January 1968.

[486] Bill Curtis. Substantiating programmer variability. Proceedings of the IEEE, 69(7):846, July 1981.
    Annotation: Small study on programmer productivity.

[487] Alissa Skelton. Wikipedia volunteer editor reaches 1 million edits. , April 2012.
    Annotation: Report on Justin Knapp, who was the first to reach one million edits on Wikipedia.

[488] Zhang. Wikipedia edit number prediction based on temporal dynamics only. ArXiv, October 2011.
    Annotation: Describes a machine learning-based system for predicting the future number of edits of Wikipedia contributors.

[489] Yutaku Yoshida and Hayato Ohwada. Wikipedia edit number prediction from the past edit record based on auto-supervised learning. In 2012 International Conference on Systems and Informatics (ICSAI), pages 2415–2419. IEEE, May 2012.
    Annotation: Describes a machine learning-based system for predicting the number of edits a contributor makes on Wikipedia.

[490] Piotr Konieczny. We are drowning in promotional artspam. The Signpost, April 2015.
    Annotation: Op-ed on the problem of promotional articles on Wikipedia.

[491] Gamaliel. Sony emails reveal corporate practices and undisclosed advocacy editing. The Signpost, April 2015.

[492] Ted Greenwald. How Jimmy Wales’ Wikipedia harnessed the web as a force for good. Wired, March 2013.
    Annotation: Interview with Jimmy Wales about Wikipedia.

[493] Mikael Häggström. Average weight of a conventional teaspoon made of metal. Wikiversity, November 2012.
    Annotation: Example of a peer-reviewed work published on a Wikimedia Foundation wiki. The small study weighed 19 different teaspoons and found an average weight of 25 grams.

[494] Fiona Fidler and Ascelin Gordon. Science is in a reproducibility crisis: How do we resolve it? Phys.org, September 2013.

[495] John P. A. Ioannidis. Why most published research findings are false. PLoS Medicine, 2(8):e124, August 2005.
    Annotation: Theoretical considerations and simulations.

[496] Steven Goodman and Sander Greenland. Why most published research findings are false: Problems in the analysis. PLoS Medicine, 4(4):e168, April 2007.
    Annotation: Critique of Ioannidis’ paper, e.g., the critique notes that Ioannidis is wrong when he only uses 0.05 for his simulation for his claim and not the actual p-value observed in the studies.

[497] Steven Goodman and Sander Greenland. Assessing the unreliability of the medical literature: A response to “why most published research findings are false”. Working Paper 135, Johns Hopkins University, Dept. of Biostatistics, 2007.
    Annotation: This is the in-depth critique of Ioannidis’ paper that is summarized in an article in PLoS Medicine.

[498] John P. A. Ioannidis. Why most published research findings are false: Author’s reply to Goodman and Greenland. PLoS Medicine, 4(6):e215, June 2007.

[499] John P. A. Ioannidis. Contradicted and initially stronger effects in highly cited clinical research. JAMA, 294(2):218–228, July 2005.
    Annotation: Study on whether previous highly-cited clinical research papers are replicated.

[500] Xiaoquan Michael Zhang and Feng Zhu. Group size and incentives to contribute: a natural experiment at Chinese Wikipedia. American Economic Review, 101(4):1601–1615, June 2011.
    Annotation: A study of what effect the censoring of Wikipedia in China had on the development of the Chinese Wikipedia.

[501] Nima Nazeri and Collin Anderson. Citation filtered: Iran’s censorship of Wikipedia. Annenberg School for Communication at University of Pennsylvania, 2013.
    Annotation: Report of an analysis of the Persian Wikipedia articles that Iranian authorities block.

[502] Noam Cohen. Microsoft Encarta dies after long battle with Wikipedia. New York Times, March 2009.

[503] Shane Greenstein. The reference wars: Encyclopædia Britannica’s decline and Encarta’s emergence. Strategic Management Journal, 2016.

[504] Mark Landler. Slow-to-adapt Encyclopaedia Britannica is for sale. New York Times, May 1995.
    Annotation: News article on the troubling economy of Encyclopaedia Britannica.

[505] Dale Sheppard. Reading with — the difference makes a difference. Education Today, pages 12–15, Term 3 2011.
    Annotation: Study on reading comprehension with 43 Year 6 pupils where half were reading texts on iPads.

[506] Maria Becher Trier. Pisa: Adgang til tablets gør ikke nødvendigvis elever dårligere. Folkeskolen.dk, December 2012.
    Annotation: Danish news article about the result of the 2012 Programme for International Student Assessment (PISA) in Denmark and the score of pupils in regard to whether they have access to tablet computers or not.

[507] Mark Graham. The problem with Wikidata. The Atlantic, April 2012.
    Annotation: Article discussing Wikidata and its possible influence on the diversity of knowledge represented in Wikipedia.

[508] Bernard Jacquemin, Aurelien Lauf, Céline Poudat, Martine Hurault-Plantet, and Nicolas Auray. Managing conflicts between users in Wikipedia. In Dominik Flejter, Slawomir Grzonkowski, Tomasz Kaczmarek, Marek Kowalkiewicz, Tadhg Nagle, and Jonny Parkes, editors, BIS 2008 Workshop Proceedings, Innsbruck, Austria, 6-7 May 2008, pages 81–93. Department of Information Systems, Poznań University of Economics, 2008.

[509] Andreas Brändle. Too many cooks don’t spoil the broth. In Proceedings of Wikimania 2005, 2005.

[510] Sander Spek, Eric Postma, and H. Jaap van den Herik. Wikipedia: organisation from a bottom-up approach. From workshop Research in Wikipedia, on the Wikisym 2006, December 2006.

[511] Andreas Neus. Managing information quality in virtual communities of practice. In E. Pierce and R. Katz-Haas, editors, Proceedings of the 6th International Conference on Information Quality at MIT, Boston, MA, 2001. Sloan School of Management.

[512] Jenneke Fokker, Johan Pouwelse, and Wray Buntine. Tag-based navigation for peer-to-peer Wikipedia. Collaborative Web Tagging Workshop, WWW2006, 2006.

[513] Nikolaos Th. Korfiatis, Marios Poulos, and George Bokos. Evaluating authoritative sources using social networks: an insight from Wikipedia. Online Information Review, 30(3):252–262, 2006.

[514] Ulrike Pfeil, Panayiotis Zaphiris, and Chee Siang Ang. Cultural differences in collaborative authoring of Wikipedia. Journal of Computer-Mediated Communication, 12(1):88–113, October 2006.

[515] Klaus Stein and Claudia Hess. Does it matter who contributes: a study on featured articles in the German Wikipedia. In Proceedings of the 18th Conference on Hypertext and Hypermedia, pages 171–174, New York, NY, USA, 2007. ACM.

[516] Aniket Kittur, Ed Chi, Bryan A. Pendleton, Bongwon Suh, and Todd Mytkowicz. Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie. In Computer/Human Interaction 2007. ACM SIGCHI, 2007. This is not found in the CHI 2007 proceedings.

[517] Fernanda B. Viégas, Martin Wattenberg, and Matthew M. McKeon. The hidden order of Wikipedia. In Lecture Notes in Computer Science, volume 4564, pages 445–454, Berlin/Heidelberg, 2007. Springer.

[518] Niels Mølbjerg Lund Pedersen and Anders Due. Wikipedia — viden som social handlen. Master’s thesis, Department of Media, Cognition and Communication, University of , Copenhagen, Denmark, August 2006.

[519] Anders J. Jørgensen. Den epistemiske hensigtsmæssighed af åbne kontra lukkede videnskabende samarbejdsformer — med speciel hensynstagen til peer-reviews og internetfænomenet Wikipedia, January 2007.

[520] Maria Ruiz-Casado, Enrique Alfonseca, and Pablo Castells. Automatic extraction of semantic relationships for WordNet by means of pattern learning from Wikipedia. In Natural Language Processing and Information Systems, volume 3513 of Lecture Notes in Computer Science, pages 67–79, Berlin/Heidelberg, May 2005. Springer.

[521] Maria Ruiz-Casado, Enrique Alfonseca, and Pablo Castells. Automatic assignment of Wikipedia encyclopedic entries to WordNet synsets. In Advances in Web Intelligence, volume 3528 of Lecture Notes in Computer Science, pages 380–386, Berlin/Heidelberg, 2005. Springer.

[522] Maria Ruiz-Casado, Enrique Alfonseca, and Pablo Castells. Automatising the learning of lexical patterns: An application to the enrichment of WordNet by extracting semantic relationships from Wikipedia. Data & Knowledge Engineering, 61(3):484–499, June 2007.

[523] Evgeniy Gabrilovich and Shaul Markovitch. Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 1301–1306, Menlo Park, California, 2006. AAAI Press.

[524] Evgeniy Gabrilovich and Shaul Markovitch. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of The Twentieth International Joint Conference for Artificial Intelligence, pages 1606–1611, 2007.

[525] Denise Anthony, Sean W. Smith, and Tim Williamson. The quality of open source production: Zealots and good samaritans in the case of Wikipedia. Technical Report TR2007-606, Department of Computer Science, Dartmouth College, Hanover, New Hampshire, April 2007.
    Annotation: An investigation of 7058 contributors on the French and Dutch Wikipedias, to answer whether contributions from registered users have higher quality than those from anonymous users. Each user is measured with respect to retention of their added text, while not considering, e.g., edit wars. They find that the retention is higher for anonymous users.

[526] Deborah L. McGuinness, Honglei Zeng, Paulo Pinheiro da Silva, Li Ding, Dhyanesh Narayanan, and Mayukh Bhaowal. Investigations into trust for collaborative information repositories: A Wikipedia case study. In Proceedings of the Workshop on Models of Trust for the Web, Edinburgh, United Kingdom, May 2006.

[527] Brian Bergstein. New tool measures Wikipedia entries. USATODAY.com, September 2007.

[528] Thomas Rune Korsgaard. Improving trust in the Wikipedia. Master’s thesis, Technical University of Denmark, Kongens Lyngby, Denmark, 2007.

[529] Tor Nørretranders. Det generøse menneske: En naturhistorie om at umage giver mage. People’sPress, 2006.
    Annotation: A book dubbing ‘Homo generosus’ and discussing issues such as altruism, gift economy, generosity, reciprocity, cooperation, moral, costly signals, the handicap principle, sexual selection and economic games.

[530] Josef Urban, Jesse Alama, Piotr Rudnicki, and Herman Geuvers. A wiki for Mizar: Motivation, considerations, and initial prototype. In Serge Autexier, Jacques Calmet, David Delahaye, Patrick Ion, Laurence Rideau, Renaud Rioboo, and Alan Sexton, editors, Intelligent Computer Mathematics, volume 6167 of Lecture Notes in Computer Science, pages 455–469, Berlin/Heidelberg, 2010. Springer.

[531] Philip Ball. The more, the wikier. news@nature.com, February 2007.

[532] Jo . Verden i følge Wikipedia forandrer sig fra dag til dag. Bibliotekspressen, 3:10–14, 2005.

[533] Gregory Druck, Gerome Miklau, and Andrew McCallum. Learning to predict the quality of contributions to wikipedia. In Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI 08), pages 7–12. Association for the Advancement of Artificial Intelligence, 2008.

[534] T. Hassine. The dynamics of NPOV disputes. In Proceedings of Wikimania 2005—The First International Wikimedia Conference, 2005.

[535] Dennis Hoppe. Automatische erkennung von bearbeitungskonflikten in Wikipedia. Abschlussarbeit, Bauhaus-Universität Weimar, Weimar, February 2008.

[536] Jonathan Isbell and Mark H. Butler. Extracting and re-using structured data from wikis. Technical Report HPL-2007-182, Digital Media Systems Laboratory, Bristol, November 2007.

[537] Marek Meyer, Christoph Rensing, and Ralf Steinmetz. Categorizing learning objects based on Wikipedia as substitute corpus. In First International Workshop on Learning Object Discovery & Exchange (LODE’07). FIRE/LRE, 2007.
[538] A. J. Reinoso, Felipe Ortega, J. M. Gonzalez-Barahona, and G. Robles. A quantitative approach to the use of the Wikipedia. In IEEE Symposium on Computers and Communications, 2009. ISCC 2009, pages 56–61, 2009.
[539] Mattias Beerman. Collaborative web content management — wiki and beyond. Master’s thesis, Department of Computer and Information Science, Linköpings University, December 2005.
[540] Ian J. Deary. Intelligence: a very short introduction. Oxford University Press, Oxford, UK, 2001.

A Notes

There are numerous other studies on wikis, and related areas not summarized here [508–530]. The following are some notes and discovered links not yet incorporated into the body text:

A.1 Publications

1. Sisay Adafre et al. (2007). Fact Discovery in Wikipedia.
2. Maik Anderka, Benno Stein (2012). A Breakdown of Quality Flaws in Wikipedia, WebQuality.
3. Denise L. Anthony et al. “Reputation and reliability in collective goods: the case of the online encyclopedia Wikipedia.”
4. David Aumueller (2005). Semantic authoring and retrieval within a Wiki
5. P. Ayers, “Researching Wikipedia - current approaches and new directions,” Proceedings of the American Society for Information Science and Technology, vol. 43, 2006, pp. 1–14.
6. Ball (2007). The more, the wikier [531].
7. Hoda Baytiyeh, Jay Pfaffman (2009). Why be a Wikipedian, Proceedings of the 9th international conference on Computer supported collaborative learning, p.434-443, June 08-13, 2009, Rhodes, Greece
8. Robert P. Biuk-Aghai et al. (2009). Wikis as Digital Ecosystems: An Analysis Based on Authorship
9. Robert P. Biuk-Aghai, Cheong-Iao Pang and Yain-Whar Si (2014). Visualizing large-scale human collaboration in Wikipedia, Future Generation Computer Systems, Volume 31, pp. 120-133
10. Alexandre Bouchard, Percy Liang, Thomas Griffiths, and Dan Klein. 2007. A probabilistic approach to diachronic phonology. In Proceedings of EMNLP-CoNLL, pages 887–896.
11. Jo Brand, Verden i følge Wikipedia forandrer sig fra dag til dag [532]. Drawing the attention of librarians
12. Brändle, Andreas (2005): Zu wenige Köche verderben den Brei.
13. Michael Buffa et al. (2008). SweetWiki: A semantic wiki. Web Semantics: Science, Services and Agents on the World Wide Web Volume 6, Issue 1, February 2008, Pages 84-97. Semantic Web and Web 2.0. This is related to a previous conference contribution [376].
14. Moira Burke, Robert Kraut (2008). Mopping up: modeling wikipedia promotion decisions
15. Jon Chamberlain, Udo Kruschwitz and Massimo Poesio. Constructing an Anaphorically Annotated Corpus with Non-Experts: Assessing the Quality of Collaborative Annotations
16. Zhining Chen et al. (2010). Web Video Categorization based on Wikipedia Categories and Content-Duplicated Open Resources.
17. Paula Chesley, Bruce Vincent, Li Xu, and Rohini Srihari. 2006. Using verbs and adjectives to automatically classify blog sentiment. In Proceedings of AAAI-CAAW-06, the Spring Symposia on Computational Approaches to Analyzing Weblogs.
18. Empirical Analysis of User Participation in Online Communities: the Case of Wikipedia. Giovanni Luca Ciampaglia, Alberto Vancheri
19. (About gender on Wikipedia). Collier et al.
20. Kino Coursey, Rada Mihalcea (2009). Topic identification using Wikipedia graph centrality. NAACL-Short ’09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
21. Cragun, R. (2007). The future of textbooks? Electronic Journal of .
22. Catalina Danis, David Singer (2008). A wiki instance in the enterprise: opportunities, concerns and reality, Proceedings of the ACM 2008 conference on Computer supported cooperative work, November 08-12, 2008, , CA, USA.
23. DeHoust, C., Mangalath, P., Mingus, B. (2008). Improving search in Wikipedia through quality and concept discovery. Technical Report.
24. Sylvain Dejean, Nicolas Jullien (2014). Big From the Beginning. Assessing Online Contributors’ Behavior by Their First Contribution. Analysis of newcomers.
25. Gregory Druck and Gerome Miklau and Andrew McCallum (2008). Learning to Predict the Quality of Contributions to Wikipedia [533]
26. Donghui Feng, Sveva Besana and Remi Zajac. Acquiring High Quality Non-Expert Knowledge from On-Demand Workforce
27. Bowei Du, Eric A. Brewer (2008). DTWiki: A Disconnection and Intermittency Tolerant Wiki
28. Förstner et al. Collaborative platforms for streamlining workflows in Open Science.
29. Andrea Forte, Amy Bruckman (2007). Constructing text: Wiki as a toolkit for (collaborative?) learning, Proceedings of the 2007 international symposium on Wikis, p.31-42, October 21-25, 2007, Montreal, , Canada
30. Andrea Forte, Cliff Lampe (2013), Defining, Understanding and Supporting Open Collaboration: Lessons from the Literature
31. E. Gabrilovich and S. Markovich (Israel Institute of Technology): “Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis”, 2007

32. Zeno Gantner and Lars Schmidt-Thieme. Automatic Content-Based Categorization of Wikipedia Articles
33. David Garcia, Pavlin Mavrodiev, Frank Schweitzer (2013). Social Resilience in Online Communities: The Autopsy of Friendster. http://arxiv.org/abs/1302.6109
34. E Graells-Garrido, M Lalmas, F Menczer (2015). First Women, Second Sex: Gender Bias in Wikipedia
35. Andrew Gregorowicz and Mark A. Kramer (2006). Mining a Large-Scale Term-Concept Network from Wikipedia
36. Guth, S. (2007). Wikis in education: Is public better? Proceedings of the 2007 international symposium on Wikis, 61–68.
37. Ben Hachey et al. (2013). Evaluating Entity Linking with Wikipedia.
38. Aaron Halfaker, Aniket Kittur, John Riedl (2011). Don’t bite the newbies: how reverts affect the quantity and quality of Wikipedia work.
39. Haigh. Wikipedia as an evidence source for nursing and healthcare students
40. Laura Hale (2013). User:LauraHale/Blocks on English Wikinews
41. Rainer Hammwöhner (Uni Regensburg): “Qualitätsaspekte der Wikipedia”, 2007
42. S. Hansen, N. Berente, and K. Lyytinen (2007). Wikipedia as rational discourse: An illustration of the emancipatory potential of information systems. 40th Annual Hawaii International Conference on System Sciences, 2007. HICSS 2007.
43. Rudolf den Hartogh (2015). The future of the past.
44. T. Hassine (2005). The Dynamics of NPOV disputes [534]
45. A. J. Head and M. B. Eisenberg, 2010, How today’s college students use Wikipedia for course-related research
46. Heilmann et al.: Wikipedia: a key tool for global public health promotion
47. James M. Heilmann, Andrew G. West (2015). Wikipedia and Medicine: Quantifying Readership, Editors, and the Significance of Natural Language.
48. Benjamin Mako Hill. Cooperation in parallel: a tool for supporting collaborative writing in diverged documents.
49. Matthew Honnibal, Joel Nothman and James R. Curran. Evaluating a Statistical CCG Parser on Wikipedia
50. Dennis Hoppe (2008). Automatische Erkennung von Bearbeitungskonflikten in Wikipedia [535]
51. Gerardo Iñiguez, Taha Yasseri, Kimmo Kaski, János Kertész (2014). Modeling Social Dynamics in a Collaborative Environment.
52. Ironholds (2013). Why are users blocked on Wikipedia? http://blog.ironholds.org/?p=31
53. Jonathan Isbell, Mark H. Butler (2007). Extracting and Re-using Structured Data from Wikis [536]
54. Jordan, Watters (2009). Retrieval of Single Wikipedia Articles While Reading Abstracts.
55. Richard Jensen, “Military history on the electronic frontier: Wikipedia fights the war of 1812”.
56. Elisabeth Joyce et al. (2011). Handling flammable materials: Wikipedia biographies of living persons as contentious objects.
57. Elisabeth Joyce et al. (2013). Rules and Roles vs. Consensus: Self-governed Mass Collaborative Deliberative Bureaucracies
58. Fariba Karimi, Ludvig Bohlin, Ann Samoilenko, Martin Rosvall, Andrea Lancichinetti (2015). Quantifying national information interests using the activity of Wikipedia editors.
59. Brian Keegan et al. (2013). Hot off the Wiki: Structures and Dynamics of Wikipedia’s Coverage of Breaking News Events
60. Maximilian Klein, Piotr Konieczny. Gender gap through time and space: a journey through Wikipedia biographies and the “WIGI” index.
61. Joachim Kimmerle, Johannes Moskaliuk, Ulrike Cress, Understanding learning: the Wiki way, Proceedings of the 5th International Symposium on Wikis and Open Collaboration, October 25-27, 2009, Orlando,
62. Frank Kleiner, Andreas Abecker, Sven F. Brinkmann. WiSyMon: managing systems monitoring information in semantic Wikis, Proceedings of the 5th International Symposium on Wikis and Open Collaboration, October 25-27, 2009, Orlando, Florida
63. René König, “Wikipedia. Between lay participation and elite knowledge representation”.
64. Krahn et al. (2009). Lively Wiki. A development environment for creating and sharing active web content.
65. Markus Krötzsch, Denny Vrandecic, Max Völkel (2005). Wikipedia and the Semantic Web-The Missing Links. Semantic wiki.
66. S. Kuznetsov (2006). Motivations of contributors to Wikipedia.
67. Pierre-Carl Langlais (2013). Is Wikipedia a relevant model for e-learning?
68. Kangpyo Lee et al. (2008). FolksoViz: A Subsumption-based Folksonomy Visualization Using Wikipedia Texts.
69. Pentti Launonen, K. C. Kern, Sanna Tiilikainen (2015). Measuring Creativity of Wikipedia Editors.
70. Yinghao Li et al. (2007). Improving weak ad-hoc queries using Wikipedia as external corpus

71. David Lindsey. Evaluating quality control of Wikipedia’s feature articles.
72. Meyer et al. (2007). Categorizing Learning Objects Based On Wikipedia as Substitute Corpus [537]. To help categorize unlabeled learning objects
73. Mietchen et al. 2011, Wikis in scholarly publishing
74. Rada Mihalcea (University of North Texas): “Using Wikipedia for Automatic Word Sense Disambiguation”, 2007
75. Rada Mihalcea and Andras Csomai. Wikify!: Linking Documents to Encyclopedic Knowledge. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007.
76. N. Miller, “Wikipedia revisited,” ETC: A Review of General Semantics, vol. 64, Apr. 2007, pp. 147–150.
77. Milne, Witten. Learning to link with Wikipedia
78. Minol K, Spelsberg G, Schulte E, Morris N. Portals, blogs and co.: the role of the Internet as a medium of science communication.
79. Joseph Morris, Chris Lüer, DistriWiki: A Distributed Peer-to-Peer Wiki, Proceedings of the 2007 International Symposium on Wikis (WikiSym), Montréal, 2007, pages 69-74.
80. Muchnik L, Itzhack R, Solomon S, Louzoun Y. Self-emergence of knowledge trees: extraction of the Wikipedia hierarchies. Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Jul;76(1 Pt 2):016106. Epub 2007 Jul 13.
81. Wyl McCully, Cliff Lampe, Chandan Sarkar, Alcides Velasquez, and Akshaya Sreevinasan (2011). Online and offline interactions in online communities. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym ’11). ACM, New York, NY, USA, 39-48.
82. Timme Bisgaard Munk (2009), Why Wikipedia: self-efficacy and self-esteem in a knowledge-political battle for an egalitarian epistemology
83. Kataro Nakayama, Takahiro Hara, and Shojiro Nishio (University of Osaka): “Wikipedia Mining for an Association Web Thesaurus Construction”, 2007
84. Emmanuel Navarro, Franck Sajous, Bruno Gaume, Laurent Prévot, ShuKai Hsieh, Ivy Kuo, Pierre Magistry and Chu-Ren Huang. Wiktionary for Natural Language Processing: Methodology and Limitations
85. Oded Nov (2007). What motivates Wikipedians? [270]
86. Mathieu O’Neil, Cyberchiefs: Autonomy and Authority in Online Tribes.
87. Enrique Orduña-Malea, José-Antonio Ontalba-Ruipérez (2013). Selective linking from social platforms to university websites: a case study of the Spanish academic system. Webometrics including Wikipedia
88. Felipe Ortega et al. (2008). On the Inequality of Contributions to Wikipedia.
89. Ayelet Oz, Legitimacy and efficacy: The blackout of Wikipedia.
90. Manuel Palomo-Duarte, Juan Manuel Dodero, Inmaculada Medina-Bulo, Emilio J. Rodríguez-Posada, Iván Ruiz-Rube (2012). Assessment of collaborative learning experiences by graphical analysis of wiki contributions
91. Rafaeli and Ariel (2008), Online Motivational Factors: Incentives for Participation and Contribution in Wikipedia
92. Rassbach, L., Mingus, B., Blackford, T. (2007). Exploring the feasibility of automatically rating online article quality. Technical Report. http://grey.colorado.edu/mingus
93. R. D. M. Page, 2010, Wikipedia as an encyclopedia of life.
94. K.R. Parker and J.T. Chao, Wiki as a teaching tool, Interdisciplinary Journal of Knowledge and Learning Objects 3 (2007), pp. 57–72.
95. Ulrike Pfeil et al. (2006). Cultural Differences in Collaborative Authoring of Wikipedia [514]
96. Giacomo Poderi (2009). Comparing featured article groups and revision patterns correlations in Wikipedia
97. Rafaeli, S., Hayat, T., and Ariel, Y. (2005). Wikipedia community: Users’ motivations and knowledge building. Paper presented at the cyberculture 3rd global conference, Prague, Czech Republic, August.
98. Jessica Ramírez, Masayuki Asahara, Yuji Matsumoto (2013). Japanese-Spanish Thesaurus Construction Using English as a Pivot
99. Gilad Ravid, Yoram M. Kalman, Sheizaf Rafaeli (2008). Wikibooks in higher education: Empowerment through online distributed collaboration. Computers in Human Behavior, Volume 24, Issue 5, September 2008, Pages 1913-1928
100. Colleen A. Reilly (2011), Teaching Wikipedia as a mirrored technology
101. Reinoso et al., A quantitative approach to the use of the Wikipedia [538].
102. Antonio J. Reinoso, Juan Ortega-Valiente, Rocío Muñoz-Mansilla, Gabriel Pastor (2013). Visitors and contributors in Wikipedia
103. Christina Sauper and Regina Barzilay (2009). Automatically Generating Wikipedia Articles: A Structure-Aware Approach. ACL 2009.
104. Scheider et al (2010). A qualitative and quantitative analysis of how Wikipedia talk pages are used (same as “A content analysis: how Wikipedia talk pages are used”)

105. Ralph Schoeder, Linnet Taylor (2015). Big data and Wikipedia research: social science knowledge across disciplinary divides.
106. Philipp Singer, Thomas Niebler, Markus Strohmaier, Andreas Hotho (2014). Computing Semantic Relatedness from Human Navigational Paths: A Case Study on Wikipedia.
107. Schroer, J.; Hertel, G. (2009). Voluntary engagement in an open web-based encyclopedia: Wikipedians and why they do it. Media Psychol. 12: 96–120.
108. Thamar Solorio, Ragib Hasan, Mainul Mizan (2013) Sockpuppet Detection in Wikipedia: A Corpus of Real-World Deceptive Writing for Linking Identities
109. Aline Soules (2015). Faculty Perception of Wikipedia in the California State University System.
110. A. Souzis (2005). Building a semantic wiki. IEEE Intelligent Systems, pp. 87–91, Sep. 2005.
111. Valentin I. Spitkovsky, Angel X. Chang (2012). A Cross-Lingual Dictionary for English Wikipedia Concepts. Eighth International Conference on Language Resources and Evaluation (LREC 2012). http://research.google.com/pubs/archive/38098.pdf
112. Michael Strube and Simone Ponzetto (EML Research): “Creating a Knowledge Base from a Collaboratively Generated Encyclopedia”, 2007
113. Sangweon Suh, Harry Halpin, Ewan Klein (2006). Extracting common sense knowledge from Wikipedia
114. Klaus Stein, Claudia Hess (2007). Does It Matter Who Contributes? – A Study on Featured Articles in the German Wikipedia. HT-07.
115. Klaus Stein, Claudia Hess (2008). Viele Autoren, gute Autoren? Eine Untersuchung ausgezeichneter Artikel in der deutschen Wikipedia. Web 2.0 — Eine empirische Bestandsaufnahme 2008, pp 107-129
116. Olof Sundin. Debating Information Control in Web 2.0: The Case of Wikipedia vs. Citizendium [297].
117. Tal-Elhasid, E., and Meishar-Tal, H. (2007). Models for activities, collaboration and assessment in wiki in academic courses. Paper presented at the EDEN 2007 annual conference, , .
118. Robert Tolksdorf and Elena Simperl (2006). Towards Wikis as semantic hypermedia. Semantic wiki.
119. Noriko Tomuro and Andriy Shepitsen. Construction of Disambiguated Folksonomy Ontologies Using Wikipedia
120. B. Ulrik, J. Jürgen (2007). Revision and co-revision in Wikipedia. Pro Learn International Workshop Bridging the Gap Between Semantic Web and Web 2.0, 4th European Semantic Web Conference.
121. Benjamin Mako Hill (2013). Essays on volunteer mobilization in peer production
122. Timothy Weale, Chris Brew and Eric Fosler-Lussier. Using the Wiktionary Graph Structure for Synonym Detection
123. C. Wartena, R. Brussee (2008). Topic Detection by Clustering Keywords. Database and Expert Systems Application, 2008. DEXA ’08. 19th International Workshop on
124. Daniel S. Weld et al. Intelligence in Wikipedia. Twenty-Third Conference on Artificial Intelligence (AAAI), 2008.
125. A. G. West, A. J. Aviv, J. Chang, I. Lee, Mitigating spam using spatiotemporal reputation, Technical report UPENN 2010.
126. West and Williamson (2009). Wikipedia friend or foe.
127. J. Willinsky, 2007. “What open access research can do for Wikipedia,” First Monday, volume 12, number 3.
128. Thomas Wozniak (2012). Zehn Jahre Berührungsängste: Die Geschichtswissenschaften und die Wikipedia. Eine Bestandsaufnahme. Zeitschrift für Geschichtswissenschaften 60/3, S. 247-264.
129. Li Xu (2007) Project the wiki way: using wiki for computer science course , Journal of Computing Sciences in Colleges, v.22 n.6, p.109-116, June 2007
130. Mark Yatskar. For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia.
131. Ziqi Zhang and Jose Iria. A Novel Approach to Automatic Gazetteer Generation using Wikipedia
132. Torsten Zesch, Christof Mueller and Iryna Gurevych. Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary. Proceedings of the Conference on Language Resources and Evaluation (LREC), 2008. http://www.ukp.tu-darmstadt.de/software/jwpl/ http://www.ukp.tu-darmstadt.de/software/jwktl/
133. Zesch, T.; Mueller, C. and Gurevych, I. Using Wiktionary for Computing Semantic Relatedness. In Proceedings of AAAI, 2008
134. Zhang and Zhu, Intrinsic Motivation of Open Content Contributors: the Case of Wikipedia
135. Haifeng Zhao, William Kallander, Tometi Gbedema, Henric Johnson, Felix Wu. Read what you trust: An open wiki model enhanced by social context.

A.2 Further notes

• Aaron Halfaker (2013). Research:Wikipedia article creation

• Freebase, http://www.freebase.com/ http://www.wikicfp.com/ http://dbpedia.org/sparql
• Ontology construction SUMO, OpenCyc, Yago, UMBEL
• Google Squared and Google Fusion Tables.
• International Music Score Library Project
• http://debategraph.org/
• Searching Wikipedia WikiWax “Semantic Wiki Search”.
• Wikipedia as a teaching tool workshop.odp
• WikiXMLDB http://wikixmldb.dyndns.org is an XML database using Sedna
• Visualization researchers have used the many kinds of statistics extracted from Wikipedia in large-scale renderings of network indegrees. http://scimaps.org/maps/wikipedia/20080103/ “An Emergent Mosaic of Wikipedian Activity” http://scimaps.org/maps/wikipedia/ New Scientist, May 19, 2007.
• The collaborative debugging tool http://pastebin.com/ with syntax highlighting.
• http://www.mkbergman.com/?p=417
• Wikimedia Foundation has maintained a computer system with fast response time. BSCW was slow
• User authentication “real name” legislation in South Korea.
• Research to the people
• Web service for generation of SIOC files for MediaWiki page
• http://fmtyewtk.blogspot.com/2009/10/mediawiki-git-word-level-blaming-one.html
• Survey of tools for collaborative knowledge construction and sharing.
• ... and wikis and Wikipedia have been the subject of many theses [539].
• : contributing to the common good increases the value more than the contribution. Other apparently irrational behavior: voting. Performance on a work sample test is among the best predictors of job performance [540]. “people like to see their words in print” [196]
• Xanadu
• Poor Man’s Checkuser
• Visualizations: Link
• ZooKeys, Species ID wiki
• Encyclopedia of Earth, Environmental Health Perspectives
• Useful Chemistry
• Redaktionen editorial team

B Further links

http://figshare.com/articles/Wikipedia_Scholarly_Article_Citations/1299540
https://blog.wikimedia.org/2015/02/03/who-links-to-wikipedia/
http://www.floatingsheep.org/2009/11/mapping-wikipedia.html
http://www.floatingsheep.org/2010/11/geographies-of-wikipedia-in-uk.html
http://en.wikipedia.org/wiki/Wikipedia:EDITORS#Demographics
http://libresoft.es/Members/jfelipe/thesis-wkp-quantanalysis
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Vandalism_studies

NOAM COHEN, Don’t Like Palin’s Wikipedia Story? Change It, http://www.nytimes.com/2008/09/01/technology/01link.html

Wikilink manipulation. http://www.bluehatseo.com/how-to-overthrow-a-wikipedia-result/
http://grey.colorado.edu/mingus/index.php/Wikipedia_Quality_Pageviews_Correlation_Coefficient
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2229842
http://www.sitepoint.com/blogs/2008/09/04/just-how-powerful-is-wikipedia/
http://www.economist.com/search/PrinterFriendly.cfm?story_id=11484062
http://www.longwarjournal.org/threat-matrix/2009/09/nato_commandos_free_nyt_report.php
http://wikimania2010.wikimedia.org/wiki/Submissions/Personality_traits_of_Polish_Wikipedia_members Amichai-Hamburger, Lamdan, Madiel and Hayat in 2008
http://www.searchcrystal.com/versions/wikipedia/home.html

‘In case anyone is interested, we tried to statistically map freebase types to dbpedia types, the key ideas are described here: http://domino.watson.ibm.com/library/cyberdig.nsf/papers/4D84639C32795569852574FD005EA539/$File/rc24684. A variant of this paper was just accepted to the International Semantic Web Conference (ISWC).’
http://meta.wikimedia.org/wiki/Wikimedia_Foundation_Research_Goals
http://sunsite.rediris.es/mirror/WKP_research/DE_logging_200906..gz

Kevin Clauson receiving hate mail. http://en.wikipedia.org/w/index.php?title=Wikipedia_talk:WikiProject_Medicine&curid=3209315&=255360019&oldid=255355224 http://thecaucus.blogs.nytimes.com/2008/07/19/the-sam-adams-project/

Vandalism? http://en.wikipedia.org/w/index.php?title=Gauss%E2%80%93Newton_algorithm&diff=prev&oldid=317017845
http://www.wikisym.org/ws2009/tiki-index.php?page=Corporate+Wiki+Metrics
http://wikimania2007.wikimedia.org/wiki/Proceedings:AD1
http://meta.wikimedia.org/wiki/Edits_by_project_and_country_of_origin
http://www.brooklynrail.org/2008/06/express/nobodys-safe-in-cyber-space
http://smartech.gatech.edu/bitstream/1853/29767/1/forte_andrea_200908_phd.pdf
http://strategy.wikimedia.org/wiki/Former_Contributors_Survey_Results
http://wikimania2010.wikimedia.org/wiki/Submissions/The_State_of_Wikimedia_Scholarship:_2009-2010/Suggestions
http://mako.cc/copyrighteous/20100627-00
http://meta.wikimedia.org/wiki/Contribution_Taxonomy_Project
http://www.cs.washington.edu/homes/travis/reflect/
http://www.zotero.org/groups/wikipedia_research/items
http://knol.google.com/k/roderic-d-m-page/linking-ncbi-to-wikipedia-a-wiki-based/16h5bb3g3ntlu/2?collectionId=28qm4w0q65e4w.46#
http://pewinternet.org/Reports/2009/8-The-Social-Life-of-Health-Information/01-Summary-of-Findings.aspx?r=1
http://meta.wikimedia.org/wiki/Mind_the_Gap

Suggestbot User:SuggestBot SuggestBot: using intelligent task routing to help people find work in wikipedia wikiFeed http://wikistudy.mathcs.carleton.edu https://meta.wikimedia.org/wiki/Research:CSCW_2015

