Cultural Diversity of Quality of Information on Wikipedias
Total Page:16
File Type:pdf, Size:1020Kb
Cultural Diversity of Quality of Information on Wikipedias Dariusz Jemielniak New Research on Digital Societies (NeRDS) group, Kozminski University, Jagiellonska, 59, 03-301, Warszawa, Poland. E-mail: [email protected] Maciej Wilamowski Faculty of Economic Sciences, University of Warsaw, Warwaw, Poland. E-mail: [email protected] This article explores the relationship between linguistic of information, are more references always good? How many culture and the preferred standards of presenting infor- photos should illustrate a given topic? What is the optimal mation based on article representation in major Wikipe- length of an article? Should external links be used? These dias. Using primary research analysis of the number of images, references, internal links, external links, words, and other questions are highly relevant to a larger discussion and characters, as well as their proportions in Good and on the quality of knowledge and information. It is also not Featured articles on the eight largest Wikipedias, we clear as to what extent this presented information is culturally discover a high diversity of approaches and format pref- influenced. Is it possible, as we naturally tend to assume, that erences, correlating with culture. We demonstrate that universal knowledge has some global standard of quality or high-quality standards in information presentation are not globally shared and that in many aspects, the lan- that perhaps some languages and related cultures have a pref- guage culture’s influence determines what is perceived erence for visual representation or differing referencing stand- to be proper, desirable, and exemplary for encyclopedic ards? We attempt to uncover this information in this article. entries. As a result, we demonstrate that standards for encyclopedic knowledge are not globally agreed-upon and “objective” but local and very subjective. Problem Statement Based on data from major Wikipedias from different lin- guistic cultures, we want to discover whether the quality of Introduction information (that is, structured, contextualized, categorized As of 2016, there were nearly 2.2 million Wikipedians (as data) has a universally standardized character. Addressing contributors to Wikipedia are often called) worldwide who this problem is important, as that quality of information is had edited at least 10 times1 (the number of recently created often assumed to be neutral to culture and globally shared. accounts is significantly higher, as the English Wikipedia However, we suspect that different language cultures may alone has nearly 29 million registered users).2 This fact makes have different understandings of what constitutes the best Wikipedia, or rather Wikipedias, easily the largest social quality of encyclopedic information. In particular, we specu- movement of humankind—never before have so many people late that the commonly expected reliance on external and collaborated to create a common outcome (Jemielniak, 2014). internal links, the use of images, the proportion of words These Wikipedians want to create any librarian’s dream: and characters to references, and other variables may differ the systematized, free, and high-quality sum of all human between different language cultures, independent of the knowledge. However, it is not entirely clear what is meant by maturity and size of the project. If this is so, different lan- high quality in an encyclopedia. Apart from simple accuracy guage cultures have different standards in terms of what constitutes a good, informative encyclopedic entry. Conse- quences of this phenomenon would be substantial: it would 1See: https://stats.wikimedia.org/EN/TablesWikipediaZZ.htm 2See: https://en.wikipedia.org/wiki/Special:Statistics mean that, for example, simply translating encyclopedias or textbooks perceived as exemplary (and of the highest qual- ity) may not result in what is considered the highest quality Received July 29, 2016; revised January 19, 2017; accepted April 19, in the language and culture into which the given text has 2017 been translated. VC 2017 ASIS&T Published online 7 August 2017 in Wiley Online It is important to observe that, since we study different Library (wileyonlinelibrary.com). DOI: 10.1002/asi.23901 language versions of Wikipedias without a possibility of JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 68(10):2460–2470, 2017 recognizing the actual country of origin of contributors, our procedures for decision-making and decisions about article conclusions are limited to what is commonly shared by spe- standards, notability, etc. As a result, many observations cific language cultures, and not national cultures (in the made only on English Wikipedia are not as global as they cases when more than one nation speaks a given language). are portrayed. It is quite clear, though, that there are signifi- cant cultural differences between different Wikipedias Literature Review (Pfeil, Zaphiris, & Ang, 2006) and that they are related not only to language issues, for example, the ways in which edi- The Wikipedia movement is largely self-organized and tors interact with each other (Hara et al., 2010), but also to relies on an ad-hoc “division of labor, because people gravi- more general issues, for example, accepted information tate to work they enjoy, but little hierarchy” (Ayers, Mat- quality, collaborative innovation networks, etc. (Nemoto & thews, & Yates, 2008, p. 217) and is generally described as Gloor, 2011; Stvilia, Al-Faraj, & Yi, 2009). nonhierarchical (Bruns, 2008). Unlike many other F/L/OSS While there are many studies focusing on different types (Free Libre and Open Source Software) projects (Lakhani & of topics related to the Wikipedia community and technol- Wolf, 2003; Von Hippel & Von Krogh, 2003), participation ogy (Jemielniak, 2016; Muller-Birn,€ Dobusch, & Herbsleb, in Wikipedia does not help anyone improve their professional 2013; Osman, 2013; Yuan, Cosley, Welser, Xia, & Gay, status or career; writing an encyclopedia is not an experience 2009), including its social construction, demographics, and highly valued for any particular profession, and contributions governance (Collier & Bear, 2012; Halfaker, Geiger, Mor- to Wikipedia are also often too compartmentalized to allow gan, & Riedl, 2013; Hill & Shaw, 2013; Jemielniak, 2015), for an easy evaluation of one’s contributions as a whole. as well as the content itself (Halavais & Lackaff, 2008; Rad However, the results of this largely uncoordinated and & Barbosa, 2012; Rad, Makazhanov, Rafiei, & Barbosa, nonprofessional cooperation are amazing not only in terms 2012; Reavley et al., 2012) and its internal rules and con- of quantity but also quality. In a famous 2005 study pub- flicts (Chen, 2010; de Laat, 2012; Geiger, Halfaker, Pin- lished by Nature, Wikipedia was already going head-to-head chuk, & Walling, 2012; Konieczny, 2009), most of them do with the Britannica encyclopedias and lexicons (Giles, not rely on a cross-project analysis of different Wikipedias. 2005) in terms of the number of errors. Although at that Only recently have some authors turned to multicultural time the results were not only surprising but also controver- and multilingual analyses of the Wikipedia phenomena sial, currently the situation has become much more clear. (Jemielniak, 2013; Yasseri, Spoerri, Graham, & Kertesz, Wikipedia has grown to be 50 times larger than Britannica 2014). Yet there is still a significant gap in social research on 3 in terms of word count, and Britannica went out of print in Wikipedia communities that needs to be addressed, as study- 2012. English Wikipedia’s overall linguistic readability ing Wikipedia projects as separate entities, while informative, receives mixed evaluations (Lucassen, Dijkstra, & Schraa- does not allow the drawing of more general conclusions on gen, 2012; Yasseri & Kertesz, 2012), but the quality of national or linguistic preferences and styles of editors and information, even in areas as sophisticated as, for example, readers. This is important because Wikipedia is arguably the mental health, is consistently higher than other professional best embodiment of what people of different language cul- encyclopedic sources (Reavley et al., 2012). It is also much tures perceive as good, well-presented information. By study- better referenced (Rivington, 2007), even if its perceived ing what different Wikipedian communities consider to be credibility is still problematic (Flanagin & Metzger, 2011). the paragon of encyclopedic entries, we delve into the cul- Considering the previously mentioned unique character tural differences in understanding information quality in gen- of Wikipedias, as well as its tremendous success, it is not eral. We already know that there are cultural differences in surprising that Wikipedia has been a topic of many academic the styles of contributions on Wikipedia, correlating with studies (Jemielniak & Aibar, 2016; Konieczny, 2016; Lih, Hofstede’s dimensions (Pfeil et al., 2006). It is also clear that 2009; Reagle, 2010). However, most of these studies thus the standards of quality are likely enacted through collabora- far have focused on the English Wikipedia (Hara, Shachaf, tive communities of practice and newcomer enculturation (Østerlund, Mugar, Jackson, & Crowston, 2016). However, & Hew, 2010), with some notable exceptions (Sundin, the cultural differences in what is considered a well- 2011). Most of the studies addressing the diversity of proj- articulated knowledge presentation are still undiscovered.