The Struggle of Small and Non-Western Wikipedia Editions

Heiko Wiggers
Wake Forest University

Abstract

The online encyclopedia Wikipedia has become one of the most influential Internet platforms on the World Wide Web and is currently the sixth-most visited website overall. For smaller languages, creating their own Wikipedia editions can constitute a tremendous boost to their general online presence. This paper investigates whether Wikipedia’s internal structure and culture are really inclusive in their treatment and representation of minority, endangered, regional, and non-Western languages. The paper argues that Wikipedia and, indeed, the Internet itself favor Western, mainstream languages and content and thus make it almost impossible for smaller languages to achieve a meaningful online presence.

1 Introduction - Digital Divide

The term "digital divide" dates back to the early days of the Internet in the 1990s and describes the unequal access of different sections of the population to new information and communication technologies (ICT) in international, national, and regional comparisons. The term refers not only to the acquisition and ownership of new technological devices (e.g. personal computers, laptops, smartphones), but also to two further facts: more than half of all people in the world have no access to the Internet, and navigating the Internet poses a significant problem for many people who do have access. From a sociological point of view, researchers (Dudenhöffer & Meyen 2012) worry that information technologies will create a new two-tier society, divided between those who can afford ICT equipment and have the knowledge to operate it and those who lack the income to acquire such devices or have difficulty handling such technologies.
Furthermore, it is feared that existing inequalities, especially in terms of education, income, and social skills, are being recreated or will even intensify in the new online world. Considering the rapid rise of the Internet as the largest communication system in human history, it becomes clear that people who cannot participate in this phenomenon are not only marginalized, but also have significantly fewer opportunities than the so-called habitual users of these technologies.

Critics point out that the digital divide cannot be substantiated empirically. In particular, they stress that problems of use are relatively easy to remedy, and that it is up to the users themselves to gain the necessary knowledge to navigate the virtual world. By international standards, the digital divide is also considered to be predominantly a sociological and demographic problem, with some blatant inequalities between developed and developing countries. For example, data from 2017 show that only about 10% of the 1.25 billion people living in Africa are Internet users and/or have access to the Internet. In Madagascar, for instance, just 1.3 million out of an estimated 25.6 million people have access to the Internet, while Ethiopia has approximately 16 million Internet users among an estimated 104.5 million inhabitants. The situation is similar in many countries of Southeast Asia.

Wiggers, H. 2018. The Struggle of Small and Non-Western Wikipedia Editions. Proceedings of the 4th Annual Linguistics Conference at UGA, The Linguistics Society at UGA: Athens, GA. 66–86.

2 Smaller Languages and the World Wide Web

In addition to sociological problems, there is much debate about whether and to what extent access to and use of the Internet pose a threat to minority, endangered, regional, and non-Western languages (henceforth referred to as MERnW-languages).
Linguists (such as Crystal 2002) point to a massive extinction of languages and estimate that approximately half of the 6000-7000 languages currently spoken in the world will be extinct by the end of the 21st century. This process already existed before the spread of the Internet, but it has noticeably accelerated since the turn of the millennium. In linguistic research there are relatively large differences of opinion as to how the digitization of large parts of humanity contributes to the extinction of languages. Many linguists and language activists see the Internet as a chance to revive MERnW-languages or make them more accessible to a wider audience. Many others, however, fear that the increasing interconnectedness of the world only benefits the major dominant languages, such as English, Spanish, German, or French, and that smaller languages will inevitably fall by the wayside. This is particularly true for MERnW-languages whose speakers often have problems accessing or using the Internet. The figures below show that the digital revolution of recent decades is by no means a reflection of global linguistic diversity:

i. Of the estimated 6000-7000 spoken languages in the world, fewer than 500 had a digital existence in 2017 (i.e. websites in their languages).
ii. Of the approximately 3.9 billion Internet users worldwide in 2017, around 3 billion were speakers of the so-called "top ten languages online". These are: English, Chinese, Spanish, Arabic, Portuguese, Indonesian / Malay, Japanese, Russian, French and German.
iii. This means that the remaining 900 million Internet users in 2017 were distributed among the approximately 470-480 remaining languages that are represented online.
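The arithmetic behind the three figures above can be retraced in a few lines. The following is only a rough sketch: the user counts are the 2017 estimates quoted in the list, while the value of 475 remaining languages is an assumed midpoint of the stated 470-480 range, and the per-language average is purely illustrative, since the real distribution of users across languages is certainly very uneven.

```python
# 2017 estimates quoted in the list above.
total_users = 3_900_000_000      # Internet users worldwide
top_ten_users = 3_000_000_000    # speakers of the "top ten languages online"

# Assumption for illustration: midpoint of the "470-480 remaining languages".
remaining_languages = 475

# Users who are not speakers of one of the top ten online languages.
remaining_users = total_users - top_ten_users

# Illustrative average only; it says nothing about any individual language.
avg_users_per_language = remaining_users / remaining_languages

print(f"Remaining users: {remaining_users:,}")
print(f"Average per remaining language: {avg_users_per_language:,.0f}")
```

Under these assumptions the remainder is 900 million users, or on average fewer than two million users for each of the remaining languages with an online presence.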
In 2007, Cunliffe launched an extensive study to investigate the online presence of smaller languages and came to the following conclusion:

“The linguistic diversity of the world is poorly reflected on the Internet. [...] 90% of the world’s languages are simply not represented.” (2007: 139)

The American media, however, seem to view the Internet as a rather positive medium for smaller languages. In recent years, a slew of American media reports have appeared whose headlines alone suggest that Internet technology and/or globalization is a cure-all for MERnW-languages, such as: “Globalization helps prevent endangered languages” (Yale Global News, December 2013); “For rare languages, social media provide new hope” (NPR, July 2014); and “Technology to the endangered language rescue!” (Huffington Post, January 2015). This kind of trust in the Internet as a regenerative medium relies on a considerable body of research that views the World Wide Web, and especially social media, as a significant opportunity for MERnW-languages. In addition, this research is not limited to African languages or threatened languages in South America’s Amazon region, but also extends to Europe’s endangered languages. Dolowy-Rybinska (2013), for example, investigated the use of Kashubian in social media and came to the conclusion that this language, whose use was prohibited under Poland’s communist rule, benefits enormously from the Internet:

“Speaking most broadly, the rise of the Internet has been very advantageous for the Kashubian-speaking community, especially for the young. [...] It [using Kashubian online] has led to an increase in the prestige of the language: if Kashubian can be used online, it cannot be so inferior and unsuitable after all. [...]
Young people communicate, exchange remarks [on] Kashubian culture and its function in the modern world, and find other people to whom Kashubian language and culture are also important.” (2013: 127-128)

Susan Wright (2006) examined the use of five other smaller, regional European languages online (Occitan, Piedmontese, Ladin, Sardinian, and Frisian) and likewise concluded that the Internet was generally a positive development for these languages:

“[...] all five languages in the survey are present on the Internet. Without providing actual figures, which are [...] likely to be misleading and immediately out of date, we can nonetheless report that the Occitan researchers found over a thousand sites, the Sardinian and Frisian researchers hundreds, and the Piedmontese and Ladin researchers dozens. The numbers of websites in which the five languages are used is, therefore, not negligible, and their presence in this medium indisputable.” (2006: 192-193)

Despite these generally auspicious results, both authors point out that their research was ultimately inconclusive, as it is impossible to predict whether the Internet will really improve the situation of these languages in the long term. In addition, both researchers emphasize that the digital presence of a MERnW-language is by no means equivalent to a language revival:

“The fact remains that using certain pages in the minority language is unlikely to produce a major linguistic shift among young people; their main language is likely to remain the national language or – in international contacts – English.” (2013: 127)

In general, the influence of the Internet on MERnW-languages is viewed much more cautiously outside the U.S. The most widely respected and most recognized study on this subject in the non-English-speaking world comes from the Hungarian linguist András Kornai and his team from the Budapest Institute of Technology.
With meticulous research and the application of mathematical formulas and algorithms, Kornai’s team not only explored the current digital state of MERnW-languages, but also made predictions about their digital future, which are quite sobering. Based on his team’s calculations, Kornai predicts that fewer than three hundred languages will have an online presence in the 21st century:

“With only 250 digital survivors, all others must inevitably drift towards digital heritage status or digital extinction. [...] There could be another 20 spoken languages [...] that may make it, but every one of these will be an uphill battle. For 95% of the world’s languages there is very little hope of crossing the digital divide.” (2013: 10)

Furthermore, Kornai points out that it is very difficult for MERnW-languages to secure a so-called digital ascent, i.e.