The Struggle of Small and Non-Western Wikipedia Editions

Heiko Wiggers Wake Forest University

Abstract The online encyclopedia Wikipedia has become one of the most influential platforms on the World Wide Web and is currently the sixth-most visited website overall. For smaller languages, creating their own Wikipedia editions can constitute a tremendous boost to their general online presence. This paper investigates whether Wikipedia’s internal structure and culture are really inclusive in their treatment and representation of minority, endangered, regional, and non-Western languages. The paper argues that Wikipedia and, indeed, the Internet itself favor Western, mainstream languages and content and thus make it almost impossible for smaller languages to achieve a meaningful online presence.

1 Introduction - Digital Divide

The term "digital divide" dates back to the early days of the Internet in the 1990s and describes the unequal access of different sections of the population to new information and communication technologies (ICT) in international, national, and regional comparisons. The term refers not only to the acquisition and ownership of new technological devices (e.g. personal computers, laptops, smartphones), but also to two further facts: more than half of all people in the world have no access to the Internet, and navigating the Internet poses a significant problem for many people who do have access. From a sociological point of view, researchers (Dudenhöffer & Meyen 2012) worry that information technologies will create a new two-tier society, divided between those who can afford ICT equipment and know how to operate it and those who lack the income to acquire such devices or have difficulty handling them. Furthermore, it is feared that existing inequalities, especially in terms of education, income, and social skills, are being recreated or will even intensify in the new online world. Considering the rapid rise of the Internet as the largest communication system in human history, it becomes clear that people who cannot participate in this phenomenon are not only marginalized, but also have significantly fewer opportunities than the so-called habitual users of these technologies. Critics point out that the digital divide cannot be substantiated empirically. In particular, they stress that

Wiggers, H. 2018. The Struggle of Small and Non-Western Wikipedia Editions. Proceedings of the 4th Annual Linguistics Conference at UGA, The Linguistics Society at UGA: Athens, GA. 66–86.

problems of use are relatively easy to remedy, and that it is up to the users themselves to gain the necessary knowledge to navigate the virtual world. In international comparison, the digital divide is also considered to be predominantly a sociological and demographic problem, with some blatant inequalities between developed and developing countries. For example, data from 2017 show that only about 10% of the 1.25 billion people living in Africa are Internet users and/or have access to the Internet. In Madagascar, for instance, just 1.3 million out of an estimated 25.6 million people have access to the Internet, while Ethiopia has approximately 16 million Internet users out of an estimated 104.5 million inhabitants. The situation is similar in many countries of Southeast Asia.
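The country figures cited above can be turned into penetration rates with a short calculation. The following is a minimal sketch that uses only the numbers given in the text; the function name `penetration_pct` is a label chosen here for illustration and does not come from any cited source.

```python
# Figures as cited in the text for 2017, in millions: (Internet users, population).
cited = {
    "Madagascar": (1.3, 25.6),
    "Ethiopia": (16.0, 104.5),
}

def penetration_pct(users, population):
    """Share of the population with Internet access, in percent."""
    return 100.0 * users / population

# Round to one decimal place for readability.
rates = {country: round(penetration_pct(u, p), 1) for country, (u, p) in cited.items()}

for country, rate in rates.items():
    print(f"{country}: {rate}% of inhabitants online")
```

Under these figures, Madagascar comes out at roughly 5% and Ethiopia at roughly 15%, which places the two countries on either side of the approximate 10% average cited for Africa as a whole.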

2 Smaller Languages and the World Wide Web

In addition to sociological problems, there is much debate about whether and to what extent access to and use of the Internet poses a threat to minority, endangered, regional, and non-Western languages (henceforth referred to as MERnW-languages). Linguists (such as Crystal 2002) point to a massive extinction of languages and estimate that approximately half of the estimated 6000-7000 languages currently spoken in the world will be extinct by the end of the 21st century. This process already existed before the spread of the Internet, but it has noticeably accelerated since the turn of the millennium. In linguistic research there are relatively large differences of opinion as to how the digitization of large parts of humanity contributes to the extinction of languages. Many linguists and language activists see the Internet as a chance to revive MERnW-languages or make them more accessible to a wider audience. Many others, however, fear that the increasing interconnectedness of the world only benefits the major dominant languages, such as English, Spanish, German, or French, and that smaller languages inevitably will fall by the wayside. This is particularly true for MERnW-languages whose speakers often have problems with accessing or using the Internet. The figures below show that the digital revolution of recent decades is by no means a reflection of global linguistic diversity:

i. Of the estimated 6000-7000 spoken languages in the world, fewer than 500 had a digital existence in 2017 (i.e. websites in their languages).

ii. Of the approximately 3.9 billion Internet users worldwide in 2017, around 3 billion were speakers of the so-called "top ten languages online". These are: English, Chinese, Spanish, Arabic, Portuguese, Indonesian / Malay, Japanese, Russian, French and German.

iii. This means that the remaining 900 million Internet users in 2017 were distributed among the approximately 470-480 remaining languages that are represented online.
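The arithmetic behind points ii and iii can be checked directly. The sketch below uses only the figures listed above; the per-language averages it prints are a purely illustrative derivation added here, not numbers reported in the sources, and they assume (unrealistically) an even spread of users across the remaining languages.

```python
# Figures from points i-iii above (2017).
total_users = 3_900_000_000        # Internet users worldwide
top_ten_users = 3_000_000_000      # speakers of the "top ten languages online"
remaining_languages = (470, 480)   # approximate range of other languages online

# Users left over for all languages outside the top ten.
remaining_users = total_users - top_ten_users
print(f"Remaining users: {remaining_users:,}")

# Illustrative only: average users per remaining language if evenly spread.
averages = {n: round(remaining_users / n) for n in remaining_languages}
for n, avg in averages.items():
    print(f"{n} languages -> about {avg:,} users per language on average")
```

The remainder is 900 million users. Even under the evenly spread assumption, each of the remaining languages would average fewer than two million users, and the real distribution is likely far more skewed.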

In 2007, Cunliffe launched an extensive study to investigate the online presence of smaller languages and came to the following conclusion:

“The linguistic diversity of the world is poorly reflected on the Internet. [...] 90% of the world’s languages are simply not represented.” (2007: 139)

The American media, however, seem to view the Internet as a rather positive medium for smaller languages. In recent years a slew of American media reports has appeared whose headlines alone seem to suggest that Internet technology and/or globalization are a cure-all for MERnW-languages, such as: “Globalization helps prevent endangered languages” (Yale Global News, December 2013); “For rare languages, social media provide new hope” (NPR, July 2014); and “Technology to the endangered language rescue!” (Huffington Post, January 2015). This kind of trust in the Internet as a regenerative medium relies on a considerable body of research that views the World Wide Web and especially social media as a significant opportunity for MERnW-languages. These studies are not limited to African languages or threatened languages in South America’s Amazon region, but also extend to Europe’s endangered languages. Dolowy-Rybinska (2013), for example, investigated the use of Kashubian in social media and came to the conclusion that this language, whose use was prohibited under Poland’s communist rule, benefits enormously from the Internet:

“Speaking most broadly, the rise of the Internet has been very advantageous for the Kashubian-speaking community, especially for the young. [...] It [using Kashubian online] has led to an increase in the prestige of the language: if Kashubian can be used online, it cannot be so inferior and unsuitable after all. [...] Young people communicate, exchange remarks [about] Kashubian culture and its function in the modern world, and find other people to whom Kashubian language and culture are also important.” (2013: 127-128)

Susan Wright (2006) examined the use of five other smaller, regional European languages online (Occitan, Piedmontese, Ladin, Sardinian, and Frisian) and likewise concluded that the Internet was generally a positive development for these languages:


“[...] all five languages in the survey are present on the Internet. Without providing actual figures, which are [...] likely to be misleading and immediately out of date, we can nonetheless report that the Occitan researchers found over a thousand sites, the Sardinian and Frisian researchers hundreds, and the Piedmontese and Ladin researchers dozens. The numbers of websites in which the five languages are used is, therefore, not negligible, and their presence in this medium indisputable.” (2006: 192-193).

Despite these generally auspicious results, both authors point out that their research was ultimately inconclusive, as it is impossible to predict whether the Internet will really improve the situation of these languages in the long term. In addition, both researchers emphasize that the digital presence of a MERnW-language is by no means equivalent to a language revival:

“The fact remains that using certain pages in the minority language is unlikely to produce a major linguistic shift among young people; their main language is likely to remain the national language or – in international contacts – English.” (2013: 127).

Outside the U.S., the influence of the Internet on MERnW-languages is generally viewed much more cautiously. The most widely respected and recognized study on this subject in the non-English-speaking world comes from the Hungarian linguist András Kornai and his team from the Budapest Institute of Technology. With meticulous research and the application of mathematical formulas and algorithms, Kornai’s team not only explored the current digital state of MERnW-languages, but also made predictions about their digital future, which are quite sobering. Based on his team’s calculations, Kornai predicts that fewer than three hundred languages will have an online presence by the end of the 21st century:

“With only 250 digital survivors, all others must inevitably drift towards digital heritage status or digital extinction. [...] There could be another 20 spoken languages [...] that may make it, but every one of these will be an uphill battle. For 95% of the world’s languages there is very little hope of crossing the digital divide.” (2013: 10)

Furthermore, Kornai points out that it is very difficult for MERnW-languages to secure a so-called digital ascent, i.e. achieving and maintaining a meaningful online presence, while digital descent, i.e. either never achieving an online presence in the first place or slipping into digital obscurity, is quite common for these languages. According to Kornai, digital descent frequently ends in digital death, which is largely irreversible:

“We emphasize that this massive die-off is not some future event that could, by some clever politics, be avoided or significantly mitigated – the deed is already done.” (2013: 7)

In addition, Kornai states that the confidence of language activists in so-called "feel-good projects", e.g. blogs in Basque or apps in the Cherokee language, however welcome they are, will not create a meaningful and long-term digital existence for these languages:

“Heritage projects are haphazard [and] resources are squandered on feel-good revitalization efforts that make no sense in the light of the preexisting functional loss and economic incentives that work against language diversity [...]. For the overwhelming majority of languages, the glorious digital tomorrow will never arrive.” (2013: 10)

3 Wikipedia

There are probably few websites that are as representative of the information age as the collaborative online encyclopedia Wikipedia. The free online encyclopedia is by now the sixth most-visited website in the world. To be included on Wikipedia is a significant achievement for a MERnW-language and can hardly be overestimated, as Kornai points out:

“The need for creating a Wikipedia is quite keenly felt in all digitally ascending languages. [...] Experience shows that Wikipedia is always among the very first active digital language communities, and can be safely used as an early indicator of some language crossing the digital divide. To summarize: No Wikipedia, no ascent.” [Italics by Kornai] (2013: 3)

The “Wikipedia-Status” of a language is determined by several factors, such as number of articles, administrators, edits per article, and active users (i.e. Wikipedia users who have made at least one edit in the last thirty days). Perhaps not surprisingly, Wikipedia itself provides a detailed list of a language’s status that is updated daily.1

1 Available at: https://meta.wikimedia.org/wiki/List_of_Wikipedias. All Internet references were accessed in January 2018.


There are presently Wikipedia editions in 296 languages. As expected, English is ranked at number one with 5,548,114 entries (=articles), 1,240 administrators, and 121,876 active users. German is ranked at number four with 2,141,270 entries, 195 administrators, and 18,641 active users. Altogether, there are 13 languages with more than one million entries, most of them world-languages such as English, Spanish, Russian, Japanese, French, Italian, and German. At the other end of the list there are 111 languages with fewer than 10,000 entries, and 46 languages with fewer than 1,000 entries, many of them Native American languages, such as Inupiaq (260 entries), Cree (125 entries), Choctaw (6 entries), or Muscogee (1 entry).

This study investigates the Wikipedia presence of five MERnW-languages from Europe and Africa. From Europe: Irish (endangered, indigenous, and national language of the Republic of Ireland), Piedmontese (a regional and minority language spoken in northern Italy), and Low German (a regional, indigenous, and endangered language spoken in northern Germany). From Africa: Yoruba (a Western African language, primarily spoken in Nigeria and Benin), and Zulu (a Bantu language, primarily spoken in South Africa). These five MERnW-languages have all achieved “Wikipedia-Status”, i.e. they each have their own Wikipedia edition. This in itself may be seen as quite a feat considering that approximately 5,500 to 5,700 of the world’s languages are absent from Wikipedia. Yet, all of them occupy lower ranks on Wikipedia in terms of number of articles and administrators (Number 73 and lower). Table one provides a list of rankings on Wikipedia of the five MERnW-languages with English and German as reference languages.

Figure 1 Wikipedia Rankings

Although the numbers of articles for the MERnW-languages in Table one look for the most part solid and perhaps even impressive, they are by no means a strong indicator of whether these languages can really be regarded as digitally ascending, as Kornai maintains. To date, there are no studies that investigate the actual status of smaller languages on Wikipedia, and whether a Wikipedia presence can really be considered a significant support for smaller languages’ digital ascent. The numbers of administrators and active users seem to indicate that a small, dedicated group of speakers/users (fewer than 100 for each language) is mostly responsible for maintaining the five MERnW-languages’ Wikipedia profile. Compared to English and German, these numbers seem rather negligible. Kornai assumes that a language’s mere presence on Wikipedia is sufficient to gain and secure digital ascent. This paper, however, argues that a presence on Wikipedia, even with a sizeable number of articles, is no guarantee of digital ascent.

In order to investigate this, I launched a Wikipedia search for common items in each MERnW-language and conducted a word count for each entry (if applicable). The study took place between November 2016 and September 2017. English and German served as reference languages.2 The total number of items researched and counted for the study would go beyond the scope of this paper. Therefore, I am presenting here the results for three categories: 1) Ten Common Items, 2) Five World Heritage Sites, and 3) Five Common Geological and Astronomical Features. Tables two and three show the results for category one, Ten Common Items.3
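The paper does not describe the tooling used for the word counts, so the following is only a minimal sketch of how such a count could be reproduced on plain article text. It assumes that the reference section starts at a line reading "References" or "Notes" and that footnote markers look like [1]; the function name `count_article_words` and both assumptions are illustrative choices, not the author's documented procedure.

```python
import re

def count_article_words(text, reference_headings=("References", "Notes")):
    """Count words in plain article text, excluding the reference section.

    Assumption: everything from the first line that exactly matches one of
    `reference_headings` onward belongs to the references and is ignored.
    """
    body_lines = []
    for line in text.splitlines():
        if line.strip() in reference_headings:
            break  # stop at the start of the reference section
        body_lines.append(line)
    body = " ".join(body_lines)
    # Drop inline footnote markers such as [1] or [23] so they are not counted.
    body = re.sub(r"\[\d+\]", " ", body)
    # Words are taken to be whitespace-separated tokens.
    return len(body.split())

sample = "Mars is the fourth planet.[1]\nIt is red.\nReferences\nSmith 2010."
print(count_article_words(sample))
```

Splitting on whitespace is a deliberately simple definition of a word. It works tolerably for the languages compared here, but a different tokenization would shift the absolute counts slightly without changing the order-of-magnitude differences between editions.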

Figure 2 Common Terms I. Article Length by Number of Words

2 Wikipedia references were not included in the total word count.

3 All numbers in the tables provided were recounted in December 2017-January 2018.


Figure 3 Common Terms II. Article Length by Number of Words

Tables two and three illustrate the strong dominance of the English and German editions compared to smaller languages. The two non-Western Wikipedia editions of Zulu and Yoruba in particular had very few entries for these ten items (none in the case of Zulu), while the two regional European languages Piedmontese and Low German also lag considerably behind English and German, with no entry showing more than 450 words. Only the Irish edition shows entries for every item, albeit with several being almost minuscule, e.g. five words for “Submarine” and eleven words for “Pyramid”. Tables four and five show the results for categories two and three: Five World Heritage Sites, and Five Common Geological and Astronomical Features respectively.

Figure 4 World Heritage Sites. Article Length by Number of Words


Figure 5 Common Geological and Astronomical Features. Article Length by Number of Words

The results for category two (Table four) almost seem to mirror those for category one, with the difference that the Low German and Piedmontese editions show even fewer entries (only one entry each). The results for category three (Table five), however, are remarkable since they show that not all Wikipedia entries of MERnW-languages need to be subpar. For example, the word count and information for the Piedmontese entry “Stone Age” surpasses that of the German entry. Likewise, the Irish entry for “Mars” is rather impressive. The reasons for these apparent variations are not quite clear. It is possible that these entries are the results of true collaborations between active users of these languages rather than a contribution by a single author. Although the imbalance between the English/German Wikipedia editions and those of the researched MERnW-languages seems to be more than just striking, several Wikipedia scholars (Dalby 2009; Otterbacher 2014) point out that this does not constitute a problem and that, in fact, there is no need for smaller languages to compete with the dominant Wikipedia editions in terms of number of entries. Rather, these scholars argue that the role of smaller Wikipedias is to provide information for region-specific and local items/events that cannot be found in, say, the English, German, or Russian editions:

“Smaller Wikipedias [...] have an important role to play. Events that are close to home are highlighted, and local sources of information are cited. [...] Smaller Wikipedias play a role by providing the global community with local sources, which might give more detailed information and/or a different perspective.” (Otterbacher 2014: 51)

In theory, this would mean that an important or unique regional event, tradition, food, landscape, or building in northern Germany or Ireland would result in an in-depth Wikipedia entry in the Low German and Irish editions with information that is either absent from the German and English editions, or with information that would complement the German and English Wikipedias. In order to test this claim, I conducted a search for region-specific terms for the five MERnW-languages. I chose three items for each language with English as reference language. The results are illustrated in Tables six to ten.

Figure 6 MERnW-Languages and Region-specific Terms Piedmontese, ca. 700,000 native or “daily speakers” Article Length by Number of Words

Figure 7 MERnW-Languages and Region-specific Terms Irish (Gaeilge), ca. 40,000 - 80,000 native or “daily speakers” Article Length by Number of Words


Figure 8 MERnW-Languages and Region-specific Terms Yoruba (Ede Yoruba), ca. 31 million L1-speakers Article Length by Number of Words

Figure 9 MERnW-Languages and Region-specific Terms Low German (Plattdüütsch), no L1-speakers left, ca. 1.7 million L2-speakers or “daily speakers”. Article Length by Number of Words


Figure 10 MERnW-Languages and Region-specific Terms Zulu, ca. 12 million L1-speakers. Article Length by Number of Words

The results of Tables six to ten demonstrate that the claim that smaller Wikipedias provide important local information only seems to (partially) apply to the Piedmontese Wikipedia, and even here the English edition has more information in every instance. Particularly striking is Low German’s non-entry for “Wattenmeer”, the name for northern Germany’s unique and much-cherished North Sea coast. Equally troubling is the fact that the entry for “Swaziland”, arguably one of the most important and unique features related to the Zulu language, amounts to a mere twenty-four words in the Zulu edition, while the English Wikipedia shows close to 7,000 words for the same entry. In summary, the results of almost all tables presented above are worrisome. The brevity of many of the articles and the paucity of information may cause more problems than benefits, such as perceptions that the examined MERnW-languages are unrefined, unsophisticated, and second-rate (which, of course, they are not). Section four discusses possible reasons for the extreme discrepancies between established Wikipedias and the MERnW-editions.

4 Interpreting the Results

Any attempt to interpret the results of this study must start with the admission that the voluntary work of the small Wikipedia crews of the five respective MERnW-languages researched in this paper cannot be faulted. Instead, it might be helpful to take a closer look at Wikipedia’s internal structure (i.e. starting a Wikipedia edition in a new language), its westernized, male-dominated culture, and particularly its worldwide editor participation rate.


4.1 Starting a New Wiki

What does it take to start a Wikipedia in a new language on the “Web’s most open and inclusive platform”? Let us suppose a group of speakers of a language not yet represented on Wikipedia would like to start a new edition in their own language. The “instruction manual”, so to speak, can be found online and is provided by Wikipedia. The first step would be to visit the webpage “Help: How to start a new Wikipedia”.4 This webpage seems to be available only in English and Korean and greets readers with the not very encouraging words5: “Do you want to start a Wikipedia in a new language? Please realize that it could need a lot of work. It will take some time to become ready to ‘go live’.” Further down on this webpage newcomers are instructed to click on the link “How to edit a page” and to translate the instructions on that website into their own language. These instructions consist of very lengthy, highly technical and complex information, such as:

Inline citations are most commonly placed by inserting a reference between <ref> ... </ref> tags, directly in the text of an article. When one publishes changes, that will display in the text as a footnote (e.g. [1][2]), and the source you keyed in will appear on the page in a collated, numbered list corresponding to the footnote numbers in the text, wherever a {{Reflist}} template or <references /> tag is present, usually in a section titled "References" or "Notes."6

Technically speaking, the newcomers do not need to be fluent in English in order to translate this information, since the other 295 Wikipedia editions all had to translate it into their own languages as well. The newcomers could therefore choose to have the instructions displayed in a different language if they are not comfortable with the English edition. A random sampling to check whether this is indeed the case, however, shows that many editions have translated these instructions in a considerably shortened, condensed, or incomplete manner. The sample found that complete translations are for the most part only available in the established Wikipedia languages, such as Italian, Swedish, German, etc.7 In other words, in

4 Available at: https://meta.wikimedia.org/wiki/Help:How_to_start_a_new_Wikipedia.

5 Attempts to find these instructions in other languages (German, Dutch, French, Spanish) yielded no results.

6 Available at: https://en.wikipedia.org/wiki/Help:Editing

7 The Low German translation begins with the remark: “Keen mehr weten will to all de Funkschonen vun de Software schall beter in dat hoochdüütsche un dat engelsche Handbook kieken.” (= “If you want to know more about the functions of this software it would be better to look up the High German or English Handbook.”)


order to learn how to execute correct edits the newcomers will have to be familiar with either English or one of the other dominant languages. With regard to the five MERnW-languages researched in this paper, this means that the crews of the European Wikipedia editions had the translations from the Standard German, Standard Italian, and Standard English editions at their disposal for help. The crews of the African Wikipedia editions, however, most likely had no such help. Yet even with linguistic aid available, translating these instructions into languages that for the most part do not have the necessary technical terms in their lexicon can be a daunting or even impossible task. This raises the question of which languages Wikipedia actually had in mind when it created this webpage, which is essential for starting a new Wikipedia edition. In fact, to assume that terms such as parenthetical editing, toolbar, and section-edit links are easily translatable into minority, endangered, regional, and non-Western languages shows either ignorance of the situation of these languages or a disregard for them.

4.2 The West is the Best

Let us now suppose that our newcomers have successfully cleared all translation hurdles and are ready to “go live”. Which articles will they include in their new edition? Wikimedia, the parent organization of Wikipedia headquartered in San Francisco, California, provides a list of items for new Wikipedia editions titled “List of articles every Wikipedia should have”.8 This list consists of 1,000 terms and includes multiple categories, such as Arts and Recreation, Physics, Technology, Electronics, Foodstuffs, Organisms, etc. At first glance, the list leaves the (Western) reader with the impression that these are all items which a decent encyclopedia should have. A closer look, however, reveals that this list is almost entirely based on a Western education, Western values, Western entertainment, and a Western point of view in general. Some examples: the category “Actors, Dancers, and Models” lists four personalities: Sarah Bernhardt, Charlie Chaplin, Marlene Dietrich, and Marilyn Monroe. The list includes a category of twenty-one “Composers and Musicians” that a new Wikipedia edition should mention, all of them from Europe or North America. Of the thirty-one “Inventors, Scientists, and Mathematicians”, one is from India (Brahmagupta) and two are from Persia (Avicenna and ibn Musa al-Khwarizmi). The remaining twenty-eight are either from Europe or North America. In addition, the combined composer/musician and inventor/scientist/mathematician categories have a total of one female (Marie Curie). Among the ten “Specific Music Genres” only two (Samba and Reggae) are not specifically related to North America

8 Available at: https://meta.wikimedia.org/wiki/List_of_articles_every_Wikipedia_should_have

or Europe (the other eight are: Blues, Classical Music, Opera, Electronic Music, Flamenco, Hip Hop, Jazz, and Rock Music). In fact, the list is not only Western and male-oriented but heavily favors the Global North over the Global South. It should be mentioned here that Wikimedia regards this list as a suggestion and does not enforce its implementation. For example, of the five MERnW-languages researched in this paper only Irish and Piedmontese have short entries for “Sarah Bernhardt”. Nonetheless, these suggestions demonstrate the Western bias of Wikipedia and its parent organization. A concrete and much publicized example of this bias is the entry “Acupuncture” from the English Wikipedia edition. The article begins with:

“[Acupuncture] is a key component of Traditional Chinese Medicine (TCM). TCM theory and practice are not based on scientific knowledge, and acupuncture is a pseudoscience.”9

In 2017, the Director of the British Medical Acupuncture Society, Dr. Cummings, accused Wikipedia in a blog post of being biased against holistic healthcare and asked the editors to include accredited sources that prove the scientific viability of acupuncture.10 They refused and allegedly banned users from Wikipedia who tried to change the page. This paper investigated how other dominant, Western Wikipedia editions define acupuncture and found that the French and Spanish editions also call this ancient Chinese practice “pseudoscience”. The Dutch edition speaks more cautiously of “placebo effects”, and only the German edition accepts acupuncture as an alternative medicinal practice and acknowledges that it can work.11

Wikipedia’s partiality toward Western perspectives and the Global North does not seem to be a coincidence, though, nor is it a particularly new phenomenon. In 2011, Wikimedia conducted an internal “Editor Survey” and, rather courageously, published the results online.12 According to their own findings, the typical Wikipedia editor, or “Wikipedian”, is “male, has a college degree, is 30 years old, is computer savvy, [...] and lives in the U.S. or Europe.” The percentage of female editors was 8.5% worldwide at the time of the survey. Wikipedia’s readership, however, is almost evenly split in terms of gender. Interestingly, the authors, who published the

9 Available at: https://en.wikipedia.org/wiki/Acupuncture

10 Available at: http://blogs.bmj.com/aim/2016/12/30/is-acupuncture-pseudoscience/

11 Several Chinese sources report that the English edition entry for acupuncture caused “furor and outrage in China”. For more information see The Beijinger at: https://www.thebeijinger.com/blog/2017/02/06/furor-in-china-after-wikipedia-calls-acupuncture-pseudoscience

12 Available at: https://meta.wikimedia.org/wiki/Editor_Survey_2011/Executive_Summary


results, vehemently denied that Wikipedia’s internal structure drives away female editors:

“Contrary to the perception of some, our data shows that very few women editors feel like they have been harassed, and very few feel that Wikipedia is a sexualized environment.”

To prove whether this is the case or not would go beyond the purpose of this paper. Suffice it to say, though, that one can find multiple reports and first-person accounts on the Internet of present and former female editors that speak to the opposite.13

4.3 Geographic Patterns of Editor Participation

It was not really clear what the geographical and demographical imbalance of editors means for the representation of MERnW-languages on Wikipedia until 2015, when Graham, Straumann, and Hogan from the University of Oxford published the results of a three-year study that looked at geographic patterns of participation on Wikipedia. Lead author Graham and his colleagues examined four million Wikipedia articles in forty-four languages and focused particularly on those entries that mention places. By geocoding these articles, they were able to look at the location of a particular editor and the place that he (or sometimes she) wrote about.14 The authors point out that Wikipedia’s “immense popularity and influence prompt[ed] us to ask who it is that has a voice and is able to exert power on the platform.” (2015: 1160)

To begin with, Graham et al. found that the number of Wikipedia editors living in a handful of first-world, Western countries had even increased since Wikipedia’s own survey from 2011. Their data revealed that nearly half of all edits were made by contributors living in just five countries: the U.S., the UK, Germany, France, and Italy. Other striking results were that the Netherlands, a country of 17.2 million people, has more editors than all of Africa combined, and that North America has 100 times the editing power of Sub-Saharan Africa.

The geocoding technique allowed the authors to “examine the ratio of autochthonous (locally sourced) and allochthonous (non-locally sourced) contributions to articles within every country of the globe.” (2015: 1161) Their results show the uneven levels of voice and participation on Wikipedia:

13For example, the report “Wikipedia’s Hostility to Women”, which appeared in The Atlantic in October 2015. Available at: https://www.theatlantic.com/technology/archive/2015/10/how-wikipedia-is-hostile-to-women/411619/

14Such information can be obtained from Wikimedia. For more information, see Graham et al. (2015: 1161-1162).


“First, we find that many articles about places are edited by nonlocals, thus challenging the idea that Wikipedia offers a platform for a local voice. Second, much of the small amount of participation that we see originating in low-income countries is actually focused on writing about global cores.” (2015: 1161)

Graham and his colleagues call this phenomenon “informational magnetism”, i.e. Wikipedia editors from the world’s economic peripheries are drawn to its economic cores. To give an example from the context of this study: a U.S. Wikipedia editor is much more likely to write about a place in Nigeria than a local Nigerian editor is; the Nigerian editor is more likely to write about a place in North America or Europe in his/her local Yoruba edition. The authors acknowledge that this phenomenon seems perplexing and counter-productive:

“One would probably expect that people most likely start editing articles that are close to them in both topic and geographic space. [...] We concede, however, that various confounding variables could also be at play: Maybe people who have registered with Wikipedia belong to a specific group that is more international or travels more (than the average editor) and thus feel confident to write about places in other world regions.” (2015: 1172)

The Oxford researchers, however, also point out that broadband access is still a core factor in terms of participation, although “not a linear one” (2015: 1175), i.e. countries with high levels of connectivity have “disproportionally” higher rates of participation “compared to countries with worse access.” (2015: 1175) Whatever the reasons are, the authors’ conclusions are quite worrying:

“Countries that are home to large blocks of editors have the ability to dominate the production of knowledge about smaller countries. [...] Not only are most of the world’s economic peripheries under participating, but they are likely to be defined by others. The sheer volume of edits coming from Europe and North America sees to this.” (2015: 1175)

From a sociological point of view, it is rather alarming that “eminent hubs of knowledge production [...] that underpin our everyday lives” (2015: 1160) are not exactly shining examples of the global democratization that is touted by the Internet’s biggest platforms (and Internet providers). What does all this mean in practice for the MERnW-languages represented on Wikipedia?


5 Conclusions

Informational magnetism may account for the very low or non-existent participation for region-specific terms in this study from the two African Wikipedia editions (Yoruba and Zulu). The results for region-specific terms from the three European editions (Irish, Piedmontese, and Low German) suggest that informational magnetism is not limited to the world’s economic peripheries but seems to extend to the economic cores as well. In other words, Wikipedia editors belonging to and writing in minority, regional, and endangered languages from within Europe tend to write about non-local places instead of local ones. Informational magnetism, however, does not account for the generally low results of the five MERnW-languages in the three main categories of this study (Common Items, World Heritage Sites, Common Geological and Astrological Features). Instead, one will have to consider another internal feature of Wikipedia. In the early days of the platform, Wikipedia was often plagued by criticisms of being inaccurate and unreliable. In order to improve its credibility and reliability, Wikimedia introduced a set of “acceptable sources”. In particular, the parent organization insists, quite reasonably, that sources and citations in Wikipedia articles must have been published and must be available.15 The problem is that a lot of published and established information is composed in one of the world’s dominant languages. A study by Nielsen (2007) showed that academic papers, journals, and books in English are cited especially often as sources in Wikipedia. For instance, Nielsen found that American and British professional journals, such as Nature, Science, New England Journal of Medicine, The Lancet, and British Medical Journal, are generally among the most-cited sources for scientific entries in Wikipedia editions.
In fact, among the established editions it is not unheard of that senior editors/proofreaders might require sources written in English or other dominant languages to be included or cited in a new Wikipedia entry, regardless of the entry’s language. Although this practice does not affect smaller editions nearly to the same degree, it shows Wikipedia’s privileging of established mainstream sources written in mainstream languages. In sum, although Wikipedia emphasizes the “local voices” of smaller editions, this kind of preference for and reliance on mainstream sources effectively leaves many editions of MERnW-languages outside the mainstream. Finally, the social and economic status of typical editors (or “Power-Wikipedians”) might also play a not inconsiderable role. Being an editor requires not only a relatively high level of education and above-average knowledge of information technology but also, as we have seen, familiarity with at least one of the world’s dominant languages. Since editors are usually unpaid, it also presupposes a certain amount of disposable income and leisure time, and generally of “being in the know and up to date”. These are all attributes that one cannot take for granted with regard to editors of smaller languages, particularly those of non-Western languages on the world’s economic peripheries.

15Available at: https://en.wikipedia.org/wiki/Wikipedia:Acceptable_sources

6 Final Thoughts

The main focus of this paper was to establish whether a representation on Wikipedia can help MERnW-languages, or at least the five MERnW-languages examined in this study, to establish a meaningful online presence and to secure digital ascent, as Kornai asserts. The latter point cannot be answered here as only time will tell. Another study involving the five MERnW-languages will have to be conducted in, say, five or ten years from now to find out. As to the former point, achieving “Wikipedia Status” is a significant accomplishment for any smaller language. However, the internal and external obstacles that MERnW-languages are dealing with can result in online portrayals that misconstrue their true linguistic reality and cultural identity. There is not really one single factor that could be considered the main obstacle to achieving a meaningful presence on Wikipedia; rather, it is the sum of all factors discussed in this paper that makes it so very difficult.

This paper does not suggest that Wikipedia intentionally prevents smaller languages from starting their own editions. In fact, given Wikipedia’s philosophy, the opposite might rather be the case. This paper does suggest, however, that Wikipedia could and should do more to make its internal structure and overall culture fairer and more inclusive toward smaller languages, something that Wikipedia founder and other top Wikimedia officials have vowed to do time and time again. In fact, Wikipedia and other popular platforms are not the cause of the absence and underrepresentation of MERnW-languages on the Internet but only part of a bigger problem, which is the fabric of the Internet itself. It has not become the great equalizer that activists and enthusiasts promised in the 1990s. The Internet’s tremendous potential for democratization has been realized to a certain degree since it was and is instrumental in achieving more equality for many subcultures and for previously marginalized groups.
Moreover, the Internet, and particularly social media, has become a powerful tool for the politically and religiously oppressed, and for those who otherwise have no voice. In this respect, current buzzwords such as “connectivity” and “empowerment” really are more than just phrases. The problem is that these terms do not extend to most MERnW-languages. Internet access and affordability remain real issues but can usually be fixed if governments and tech companies are willing to invest. However, with a few exceptions, the Internet has given no voice to the many speakers of MERnW-languages around the world. In its current state, the Internet favors ca. 75-90 languages spoken by approximately one half of the world’s population, while it subdues and mutes the voices of the roughly 6,000 remaining languages spoken by the other half.


References

Crystal, David. 2002. Language Death. Cambridge: Cambridge University Press.
Cunliffe, Daniel. 2007. Minority languages and the internet: new threats, new opportunities. In Mike Cormack & Niamh Hourigan (eds.), Minority language media: Concepts, critiques, and case studies, 133–150. Bristol: Multilingual Matters.
Dalby, Andrew. 2009. The World and Wikipedia: How we are editing reality. Siduri Books.
Dolowy-Rybinska, Nicole. 2013. Kashubian and Modern Media: The Influence of New Technologies on Endangered Languages. In Elin Haf Gruffydd Jones & Enrique Uribe-Jongbloed (eds.), Social media and minority languages: Convergence and the creative industries, 119–130. Bristol: Multilingual Matters.
Dudenhöffer, Kathrin & Michael Meyen. 2012. Digitale Spaltung im Zeitalter der Sättigung: Eine Sekundäranalyse der ACTA 2008 zum Zusammenhang von Internetnutzung und sozialer Ungleichheit.
Graham, Mark, Ralph K. Straumann & Bernie Hogan. 2015. Digital Divisions of Labor and Informational Magnetism: Mapping Participation in Wikipedia. Annals of the Association of American Geographers 105(6). 1158–1178.
Kornai, András. 2013. Digital Language Death. PLOS ONE 8(10). 1–11.
Nielsen, Finn Årup. 2007. Scientific Citations in Wikipedia. First Monday 12(8). http://firstmonday.org/ojs/index.php/fm/article/view/1997/1872.
Otterbacher, Jahna. 2014. Our News, Their Events? A Comparison of Archived Current Events on English and Greek Wikipedias. In Pnina Fichman & Noriko Hara (eds.), Global Wikipedia: International and cross-cultural issues in online collaboration, 49–69. Rowman and Littlefield.
Wright, Sue. 2006. Regional or Minority Languages of the WWW. Journal of Language Politics 5(2). 189–216.
