Language Exclusion in Computational Linguistics and Natural Language Processing

Total Page:16

File Type:pdf, Size:1020Kb

Language Exclusion in Computational Linguistics and Natural Language Processing Hard Numbers: Language Exclusion in Computational Linguistics and Natural Language Processing Martin Benjamin Kamusi Project International Place de la Gare 12 C 1020 Renens, Switzerland [email protected] Abstract The intersection between computer science and human language occurs largely for English and a few dozen other languages with strong economic or political support. The supermajority of the world’s languages have extremely little digital presence, and little activity that can be forecast to change that status. However, such an assertion has remained impressionistic in the absence of data comparing the attention lavished on elite languages with that given to the rest of the world. This study seeks to give some numbers to the extent to which non-lucrative languages sit at the margins of language technology and computational research. Three datasets are explored that reveal current hiring and research activity at universities and corporations concerned with computational linguistics and natural language processing. The data supports the conclusion that most research activity and career opportunities focus on a few languages, while most languages have little or no current research and little possibility for the professional pursuit of their development. Keywords: under-resourced languages, computational linguistics, NLP, language technology 1.0. Introduction reveal almost nil coverage for the supermajority of languages. One could hunt for resources per This paper looks at technology for “under- language, and find a corpus here, a spell-checker resourced” languages by examining the amount of there, and a bunch of Wikipedia stub pages about career opportunities and research projects in the asteroids somewhere else1. In the end, though, a field. Two data sets were evaluated to provide hard spreadsheet with 7000 languages and all known numbers regarding the proportion of high level technologies would show a smattering of ticks for a research in areas related to computational long tail of languages, for example where a linguistics. The hypothesis for the research was that passionate developer created an Android app2 or high level pursuit of technological development for where a field linguist shared a dictionary on most of the world’s languages is not a widely Webonary,3 and a huge clustering of resources for a available career option. This hypothesis was fully small assortment of languages that could be guessed supported by the data. Without a significant number without looking at the data. Kornai’s impressive of people working in the field, resources for under- effort (2013) to quantify existing resources for resourced languages cannot be developed. The languages of the world found that 6,541 had no numbers in this study, which indicate where the field detectable live online presence. Following Scannell will be casting its gaze for years to come, give no (2013) and Gibson (2014), it is possible to find cause for optimism that the situation will improve. instances of usage of nearly 2000 languages within communication technologies such as Twitter, but People who work on excluded languages know from these are examples of technology as a vessel rather experience that most of the world’s languages than an avenue for development. Because data about remain outside of the technological sphere, but do the topic of research activity is not obviously not have numerical ways to demonstrate the extent available, we have been left to make impressionistic of the marginalization. In principle, one could look assertions about the paucity of work in the field. to the amount of software that is localized in each language, but that would involve getting access to This study examined three sources of data that thousands of programs, installing them, and provide numerical indications of the extent to which enumerating the available user interface languages – under-resourced languages are active within the an impossible task that one already knows would overall profession of language technology. The first 1 The Yoruba Wikipedia, https://yo.wikipedia.org, has Clicking the link labelled “Ojúewé àrìnàkò” from any more than 31,000 articles listed. However, most of those Yoruba Wikipedia page gives a random page from the contain bogus content, including thousands of pages like project, with a high probability of landing on an asteroid. https://yo.wikipedia.org/wiki/23006_Pazden that are bot- 2 https://mothertongues.org/ generated stubs containing the names of asteroids. 3 https://www.webonary.org/ dataset consists of all the jobs posted on Linguist Category B, and nearly 7000 units for Category C. List (LL) in 2017 that specify applied, Figure 2 was a speculation drawn about two years computational, or text/ corpus linguistics.4 The prior to the present study that posited the ratio of second dataset consists of all the papers and posters research invested in each category as a rough inverse presented at COLING 2016,5 the 26th International of the number of languages affected. This paper Conference on Computational Linguistics, examines the hypothesis implicit in Figure 2. The organized by the Association for Natural Language hard numbers in this study show that the scale shown Processing, in Osaka, Japan. The third consists of all for research activity between categories A and C is the papers and posters presented at ICLDC 2017, 5th about right. The representation underestimates the International Conference on Language level of activity for Category B languages, however Documentation & Conservation, in Honolulu, – mid-resourced languages, where Kilgarriff and Hawaii. None of these are perfect representations of Grefenstette’s 2003 observations about trends the state of the field, for reasons discussed below, toward increasing digital multilingualism hold true, but they give an overall up-to-the-moment should have a bigger box, with a gradient toward indication of the state of attention that under- “under-resourced” that is certainly reached around resourced languages receive among those active in 50. On the other hand, the data bears out that several the profession. languages that were cast as Category B – notably Chinese, Japanese, and Arabic – are benefiting from Spoiler alert: under-resourced languages receive significant professional attention, and could now be almost no attention in work related to computational on the borderline of Category A, which would linguistics or natural language processing (NLP). maintain the ratio between A and B closer to the Type Number/ Examples Characteristics A 4 languages (English, Massive investment, many existing digital resources, large monolingual and French, German, Spanish) aligned corpora, somewhat functional machine translation with other languages in the group, primary focus of language technology research and development.6 B About 25 languages (many Moderate to large investment and research, increasing digital resources, large official languages of the monolingual corpora with bilingual alignment to A languages (especially EU, Chinese, Japanese, English), rough machine translation to A languages (usually English), focus Russian, Arabic) of interest for EU and national funders C All the rest. Almost 7000 Zero to mediocre investment and research. Some languages like Swahili and languages, spoken by the major languages of India, with more than 100 million speakers, have active majority of the world's 7 research communities and rough machine translation, usually to English. A billion people. couple of thousand have some form of print dictionary, ranging from lists of a few hundred words to massive volumes with hundreds of pages. Most are 'embattled' – either close to extinction, or disfavored by policy or practice. Funding is usually sparse. Table 1: Language Categories initial depiction. While a discussion of the state of 2. How much less resourced are “less- Category B languages is outside of the scope of this resourced” languages? paper, as they enjoy a host of advantages that elevate them above any notion of “under-resourced”, it is Table 1 proposes a typology of languages, wherein worth noting that many are making strides that will languages in Category A are the ones that receive redound increasingly to their benefit in the years to high attention in NLP research, Category B come. languages receive moderate attention, and Category C languages receive little or no attention. Figure 1 shows those languages at exact scale, proportional to the total number of languages in each category: a square with 4 units for Category A, 25 units for 4 Records were laboriously reviewed by setting search 5 http://coling2016.anlp.jp/ criteria on the Linguist List jobs page, 6 The list of “Any language” treated by DeepL (English, https://linguistlist.org/jobs/search-job1.cfm. Linguist List German, French, Spanish, Italian, Dutch, and Polish) at keeps records dating back many more years, but http://www.deepl.com/translator could be a better procuring those in a practical format would require estimate, but discussion of levels of inclusiveness among imposing on their staff. Whether a single year’s data is better-resourced languages would be the subject of a completely representative is therefore an open question. different paper. Figure 1: Languages of each type as a proportion of world Figure 2: Languages of each type as a proportion of total (actual numbers) investment and research attention (hypothesized estimate) people. The inclusion of these languages may 2.1 Linguist List indicate a glimmer of recognition that
Recommended publications
  • Digital Dictionaries of South Asia Funded by the U.S Department of Education Under Title VI, Section 605, October 1999 Through September 2002
    Digital Dictionaries of South Asia Funded by the U.S Department of Education under Title VI, Section 605, October 1999 through September 2002 PROGRAM TITLE OF PROJECT International Research and Studies Program, Digital Dictionaries of South Asia International Education and Graduate Programs Service, U.S. Department of APPLICANT Education, CFDA No. 84.017A The University of Chicago 970 East 58th Street FUNDING REQUESTED Chicago, Illinois 60637 $453,071 for three years 773-702-8602 PROJECT DATES PROJECT DIRECTOR Oct. 1, 1999 - Sept. 30, 2002 James H. Nye Director, South Asia Language and Area Center 5848 University Avenue, Kelly 313 Project Director’s Signature Chicago, Illinois 60637 773-702-8430 SUMMARY OF PROPOSED PROJECT For language learning and instruction, few resources are more crucial than dictionaries. This project aims to make high-quality dictionaries in each of the twenty-six modern literary languages of South Asia universally available in digital formats. At least thirty-two dictionaries will be converted from printed books, often multi-volume, to electronic resources. A wide variety of users will benefit from access to electronic dictionaries via global media such as the World Wide Web. Not only the academics whose study of Indic languages has long been supported by the Department of Education, but also American-born learners of South Asian heritage, and individuals around the world will profit. A well-developed plan and the considerable experience of key personnel ensure that the project's objectives will be met. The Project Director and two Co-Directors have been at the forefront of recent initiatives to improve global access to South Asian materials through deployment of current technologies.
    [Show full text]
  • Program - Session Descriptions
    Program - Session Descriptions Monday, October 18, 2010 09:00-12:30 MORNING TUTORIALS Presenter: Track 1: An Introduction to Writing Systems & Unicode Richard Ishida The tutorial will provide you with a good understanding of the many unique Hotel cut-off: Internationalization characteristics of non-Latin writing systems, and illustrate the problems involved in 09/26/2010 Lead, implementing such scripts in products. It does not provide detailed coding advice, but W3C does provide the essential background information you need to understand the Venue: fundamental issues related to Unicode deployment, across a wide range of scripts. It has also proved to be an excellent orientation for newcomers to the conference, providing Hyatt Regency the background needed to assist understanding of the other talks! The tutorial goes Hotel beyond encoding issues to discuss characteristics related to input of ideographs, Santa Clara, CA combining characters, context-dependent shape variation, text direction, vowel signs, USA ligatures, punctuation, wrapping and editing, font issues, sorting and indexing, keyboards, and more. The concepts are introduced through the use of examples from Chinese, Japanese, Korean, Arabic, Hebrew, Thai, Hindi/Tamil, Russian and Greek. While the tutorial is perfectly accessible to beginners, it has also attracted very good reviews from people at an intermediate and advanced level, due to the breadth of scripts discussed. No prior knowledge is needed. Presenter: Track 2: Internationalization: An Introduction, Part I: Characters and Character Encodings Addison Phillips Globalization What is internationalization? What do developers, product managers, or quality engineers Architect need to know about it? How does a software development organization incorporate Lab126 (Amazon) internationalization into the design, implementation, and delivery of an application? This tutorial track provides an introduction to the topics of internationalization, localization and globalization.
    [Show full text]
  • Swahili Vocabulary Dictionary 427
    Contents Foreword v Acknowledgments and Dedication vii 1. About the Swahili Language 1 2. The Alphabet, Pronunciation, and Common Mistakes 7 3. Personal Subject Prefixes, Personal Pronouns and Their Negations 15 4. Swahili Greetings 23 5. Present and Future Tenses and Their Negations 37 6. Simple Past and Past Perfect Tenses and Their Negations 47 7. The Swahili Noun Class System: M-/WA- and M-/MI- 57 8. Swahili Noun Classes: JI-/MA- Class and KI-/VI- Class 69 9. Swahili Noun Classes: N- and U- 81 10. Swahili Noun Classes: PA- and KU- and Noun Class Agreement 91 11. Object Infixes 101 12. Possessives 111 13. Adjectives 125 14. Demonstratives 139 15. Comparatives and Superlatives 151 16. Question Words, Phrases and Statements 161 17. The Verbs ‘To Be,’ ‘To Have’ and ‘To Be in a Place’ 173 18. Numbers 183 19. More About Swahili Numbers 193 20. Telling the Time in Swahili 207 21. Days, Months, and Dates in Swahili 219 Copyright © 2014. UPA. All rights reserved. © 2014. UPA. All rights Copyright 22. Adverbs 229 23. Passive Form of the Verb 241 24. Stative Form of the Verb 255 Almasi, Oswald, et al. <i>Swahili Grammar for Introductory and Intermediate Levels : Sarufi ya Kiswahili cha Ngazi ya Kwanza na Kati</i>, UPA, 2014. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/hselibrary-ebooks/detail.action?docID=1810394. Created from hselibrary-ebooks on 2019-06-09 04:05:13. iv Contents 25. Causative Form of the Verb 267 26. Prepositional Form of the Verb 277 27. Reciprocal Form of the Verb 287 28.
    [Show full text]
  • Fifty Years of Kiswahili in Regional and International Development
    Fifty Years of Kiswahili in Regional and International Development by Susan Chebet-Choge, M.Phil. [email protected] Lecturer, Department of Language and Literature Education Masinde Muliro University of Science & Technology, Kenya Kakamega, Kenya Abstract Kiswahili is undoubtedly one of the most developed and expansively used indigenous African languages nationally and internationally. At the dawn of African states political independence, the founding fathers of the nations led by Kwame Nkrumah considered Kiswahili as an appropriate language for African states unity. Adoption of Kiswahili as the universal language of African continent could have gone a long way in realising the dream of the founding fathers of one people, one nation, one language. However, as history bears witness, their dream remained just a wish. On the contrary, Kiswahili, though not accorded Africa continent political recognition, has continued with its linguistic conquest and expansion further from its indigenous base in the East Africa’s coast to various countries in Africa and beyond. The status and usage of Kiswahili has shifted and grown with the political, social and economic growth of nations which use it for various purposes. Currently, it is a regional language in East African countries where it wears several hats as a vernacular, national & official language, lingua franca and a vehicular in various spheres of life. Internationally, Kiswahili has curved for itself a linguistic sphere in the field of academia and international communication. This paper therefore seeks to document and asses Kiswahili’s participation in the last fifty years in national, regional and international developments. 172 The Journal of Pan African Studies , vol.4, no.10, January 2012 Introduction Kiswahili is a language of the Niger-Congo family which Ethnologue has classified as ISO 639- 3: SWA (Lewis 2009).
    [Show full text]
  • Proceeding of the Ugra Global Expert Meeting on Multilingualism In
    Government of the Khanty-Mansi Autonomous Okrug – Ugra Commission of the Russian Federation for UNESCO Russian Committee of the UNESCO Information for All Programme Interregional Library Cooperation Centre MAAYA World Network for Linguistic Diversity Multilingualism in Cyberspace Proceedings of the Ugra Global Expert Meeting (Khanty-Mansiysk, Russian Federation, 4–9 July, 2015) Moscow 2016 Financial support for this publication is provided by the Government of the Khanty-Mansi Autonomous Okrug – Ugra, the Russian Committee of the UNESCO Information for All Programme and the Interregional Library Cooperation Centre Compilers: Evgeny Kuzmin, Anastasia Parshakova, Daria Ignatova Translators: Tatiana Butkova and Ekaterina Komarova English text edited by Anastasia Parshakova Multilingualism in Cyberspace. Proceedings of the Ugra Global Expert Meeting (Khanty-Mansiysk, Russian Federation, 4–9 July, 2015). – Moscow: Interregional Library Cooperation Centre, 2016. – 304 p. The book includes papers by the participants of the Ugra Global Expert Meeting on Multilingualism in Cyberspace (Khanty-Mansiysk, Russian Federation, 4–9 July, 2015), aimed to discuss policies in the field of language preservation, measures to be taken at both national and international level to develop linguistic and cultural diversity of the world and promote multilingualism in cyberspace. The authors are responsible for the choice and presentation of facts and for the opinions expressed, which are not necessarily those of the compilers. ISBN 978-5-91515-068-9 © Interregional
    [Show full text]
  • CCURL 2014: Collaboration and Computing for Under- Resourced Languages in the Linked Open Data Era
    CCURL 2014: Collaboration and Computing for Under- Resourced Languages in the Linked Open Data Era Workshop Programme 09:15-09:30 – Welcome and Introduction 09:30-10:30 – Invited Talk Steven Moran, Under-resourced languages data: from collection to application 10:30-11:00 – Coffee break 11:00-13:00 – Session 1 Chairperson: Joseph Mariani 11:00-11:30 – Oleg Kapanadze, The Multilingual GRUG Parallel Treebank – Syntactic Annotation for Under-Resourced Languages 11:30-12:00 – Martin Benjamin, Paula Radetzky, Multilingual Lexicography with a Focus on Less- Resourced Languages: Data Mining, Expert Input, Crowdsourcing, and Gamification 12:00-12:30 – Thierry Declerck, Eveline Wandl-Vogt, Karlheinz Mörth, Claudia Resch, Towards a Unified Approach for Publishing Regional and Historical Language Resources on the Linked Data Framework 12:30-13:00 – Delphine Bernhard, Adding Dialectal Lexicalisations to Linked Open Data Resources: the Example of Alsatian 13:00-15:00 – Lunch break 13:00-15:00 – Poster Session Chairpersons: Laurette Pretorius and Claudia Soria Georg Rehm, Hans Uszkoreit, Ido Dagan, Vartkes Goetcherian, Mehmet Ugur Dogan, Coskun Mermer, Tamás Varadi, Sabine Kirchmeier-Andersen, Gerhard Stickel, Meirion Prys Jones, Stefan Oeter, Sigve Gramstad, An Update and Extension of the META-NET Study “Europe’s Languages in the Digital Age” István Endrédy, Hungarian-Somali-English Online Dictionary and Taxonomy Chantal Enguehard, Mathieu Mangeot, Computerization of African Languages-French Dictionaries Uwe Quasthoff, Sonja Bosch, Dirk Goldhahn, Morphological Analysis for Less- Resourced Languages: Maximum Affix Overlap Applied to Zulu Edward O. Ombui, Peter W. Wagacha, Wanjiku Ng’ang’a, InterlinguaPlus Machine Translation Approach for Under-Resourced Languages: Ekegusii & Swahili Ronaldo Martins, UNLarium: a Crowd-Sourcing Environment for Multilingual Resources Anuschka van ´t Hooft, José Luis González Compeán, Collaborative Language Documentation: the Construction of the Huastec Corpus Sjur Moshagen, Jack Rueter, Tommi Pirinen, Trond Trosterud, Francis M.
    [Show full text]
  • Indirect Influence of English on Kiswahili: the Case of Multiword Duplicates Between Kiswahili and English
    Indirect Influence of English on Kiswahili: The Case of Multiword Duplicates between Kiswahili and English Dissertation zur Erlangung des akademischen Grades doctor philosophiae (Dr. phil.) vorgelegt an der Philosophischen Fakultät der Technischen Universität Chemnitz von Herrn Dunlop Ochieng, geboren am 15.11. 1973 in Rorya, Tanzania Chemnitz, den 4 Februar 2015 Ochieng ii Acknowledgements I owe a debt of gratitude to many kind individuals and supportive institutions for their support during the course of my study. First, I sincerely thank the German Academic Exchange Service and Tanzania Ministry of Education and Vocational Training for their joint scholarship that financed my German language course at the Eurasia Language Institute in Berlin and my PhD project at Technischen Universit ät Chemnitz. Secondly, I thank the Open University of Tanzania, my employer, for granting me a study leave to pursue my PhD in Germany, and for partly funding my study. My appreciation equally goes to the University of Helsinki in Finland for granting me a permission to access and use their online Kiswahili corpus, the Helsinki Corpus of Swahili for my research. Ac- cess to the corpus enabled me to compare and contrast the general tendencies and fre- quencies of my research tokens in my recent corpus, the Chemnitz Corpus of Swahili and their old corpus, the Helsinki Corpus of Swahili. I am equally grateful to Prof. Daniel Nkemleke, for his guidance on how to analyze my research data, and how to present them meaningfully and logically. He also helped me in proofreading my final drafts of thesis to ensure that it meets the standard of a PhD thesis in linguistics.
    [Show full text]
  • Kamusi Ya Kiswahili Free Download Pdf
    Kamusi ya kiswahili free download pdf Continue Kamusi Kuu I Kiswahili Android app is a unique digital product Longhorn Publishers Limited in partnership with BAKITA. The app has been integrated with a variety of functionality and interactive content to meet the needs of end users. For example, all words were classified in eight (8) categories of Kiswahily. Groups such as tosy, misemo and nahau have also been segregated and classified separately to maximize the use of the product. Free18.87 MB Continue the Kamusi Kuu app I Kiswahili Android app is a unique digital product Longhorn Publishers Limited in partnership with BAKITA. The app has been integrated with a variety of functionality and interactive content to meet the needs of end users. For example, all words were classified in eight (8) categories of Kiswahily. Groups such as tosy, misemo and nahau have also been segregated and classified separately to maximize the use of the product. Kamusi Kuu ya Kiswahili is a free software application from the Training and Learning Tools subcategory, in one category or another, Education. The app is currently available in English and it was last updated at 2020-08-06. The program can be installed on Android. Kamusi Kuu ya Kiswahili (version 4.55) has a file size of 18.87 MB and is available for download from our website. Just click the green download button above to start. So far, the program has been downloaded 10,165 times. We've already checked that the download link will be secure, but for your own protection, we recommend that you scan the downloaded software with your antivirus.
    [Show full text]
  • Virtual Language Learning: Finding Gems Amongst the Pebbles. INSTITUTION Monash Univ., Clayton, Victoria (Australia)
    DOCUMENT RESUME ED 436 967 FL 026 090 AUTHOR Felix, Uschi TITLE Virtual Language Learning: Finding Gems Amongst the Pebbles. INSTITUTION Monash Univ., Clayton, Victoria (Australia). SPONS AGENCY Language Australia, Melbourne (Victoria). ISBN ISBN-1-875578-88-9 PUB DATE 1998-00-00 NOTE 176p.; A CD-ROM accompanies the book. AVAILABLE FROM Publications and Clearinghouse Manager, Language Australia Ltd., GPO Box 372F, Melbourne, VIC 3001, Australia. PUB TYPE Guides Non-Classroom (055) Reference Materials - Bibliographies (131) EDRS PRICE MF01/PC08 Plus Postage. DESCRIPTORS Adult Education; *Computer Assisted Instruction; *Computer Mediated Communication; Computer Networks; *Computer Uses in Education; Educational Resources; Elementary Secondary Education; Independent Study; Information Dissemination; Information Networks; *Instructional Materials; Internet; Listservs; Second Language Instruction; Second Language Learning; *World Wide Web ABSTRACT This document is intended for teachers and students who want to use the resources offered on the World Wide Web for second language instruction and learning. It assists in a comprehensive exploration of resources with different types of delivery, content, and level of interaction. The book is free of technical jargon, making insight into the complexity of the task all the easier for the user who is not technically-inclined. Resources include the following: comprehensive lists of URLs in a large number of languages, typically including a wide range of links; comprehensive lists of URLs for a single language, also extensively linked; and sites that focus on a single language. Selection of resources has focused on resources that have the potential to be integrated into existing courses; are instantly usable, in some cases without a teacher; are free or available at a reasonable cost; and are substantial and/or provide useful self-contained activities.
    [Show full text]
  • 1 Fee-Alexandra Haase Conceptions of Criticism Cross-Cultural
    Fee-Alexandra Haase Conceptions of Criticism Cross-Cultural, Interdisciplinary, and Historical Studies of Structures of a Concept of Values 1 Index Introduction ..................................................................................................................3 0. „Metacriticism“ - On Definitions of Terms of Criticism ................................3 1. On the Development of Criticism in Europe.................................................14 1.1. „Krisis“ - On Ancient Criticism.....................................................................14 1.2. „Synthesis“ - On Medieval Criticism.............................................................29 1.3. „Critica“- On Criticism in the New Time......................................................41 1.4. "Ratio"- On Criticism in the 18th and 19th Century ... ………….…........….59 1.5. “Modern and Postmodern” - On Criticism in the 20th Century...................89 2. On the Development of Criticism on Continents .........................................103 2.1. „Naqd“ - On Criticism in the Middle East...................................................103 2.2. "Tenqid"- On Criticism in non-Arabic cultures, Middle East and C. Asia127 2.3. “Viveka” - On Indian Criticism...................................................................149 2.4. “Criticism and Orature” - On African Criticism........................................169 2.5. “Pi Ping” - On Asian Criticism....................................................................188 2.6. “Cricismo and Criticism” -On Criticism
    [Show full text]
  • Towards an Alliance for Digital Language Diversity (CCURL 2016
    LREC 2016 Workshop CCURL 2016 Collaboration and Computing for Under-Resourced Languages: Towards an Alliance for Digital Language Diversity 23 May 2016 PROCEEDINGS Editors Claudia Soria, Laurette Pretorius, Thierry Declerck, Joseph Mariani, Kevin Scannell, Eveline Wandl-Vogt Workshop Programme Opening Session 09.15 – 09.30 Introduction 09.30 – 10.30 Jon French, Oxford Global Languages: a Defining Project (Invited Talk) 10.30 – 11.00 Coffee Break Session 1 11.00 – 11.25 Antti Arppe, Jordan Lachler, Trond Trosterud, Lene Antonsen, and Sjur N. Moshagen, Basic Language Resource Kits for Endangered Languages: A Case Study of Plains Cree 11.25 – 11.50 George Dueñas and Diego Gómez, Building Bilingual Dictionaries for Minority and Endangered Languages with Mediawiki 11.50 – 12.15 Dorothee Beermann, Tormod Haugland, Lars Hellan, Uwe Quasthoff, Thomas Eckart, and Christoph Kuras, Quantitative and Qualitative Analysis in the Work with African Languages 12.15 – 12.40 Nikki Adams and Michael Maxwell, Somali Spelling Corrector and Morphological Analyzer 12.40 – 14.00 Lunch Break Session 2 14.00 – 14.25 Delyth Prys, Mared Roberts, and Gruffudd Prys, Reprinting Scholarly Works as e- Books for Under-Resourced Languages 14.25 – 14.50 Cat Kutay, Supporting Language Teaching Online 14.50 – 15.15 Maik Gibson, Assessing Digital Vitality: Analytical and Activist Approaches 15.15 – 15.40 Martin Benjamin, Digital Language Diversity: Seeking the Value Proposition 15.40 – 16.00 Discussion 16.05 – 16.30 Coffee Break 16.30 – 17.30 Poster Session Sebastian Stüker,
    [Show full text]
  • 2012 Resource Directory and Editorial Index 2011
    Language | Technology | Business RESOURCE ANNUAL DIRECTORY EDITORIAL ANNUAL INDEX 2011 MultilLingual reader survey results The importance of content inventories The language services market: A year in review 1 CoverResourceDirectoryRD12Vers2.indd 1 1/12/12 8:52 AM 2-3 Ad-About RD11.indd 2 1/12/12 8:58 AM About the MultiLingual 2012 Resource Directory and Editorial Index 2011 his volume marks our tenth annual Index and Resource Directory. We’ve been Up Front doing this for a decade now — or more, depending on how you’re counting, since the very first iteration of MultiLingual was actually a printed-out list of localiza- tion resources mailed in 1987. But we’ve streamlined things since then, and every T year now, we collect up-to-date data on language companies and services, and compile them into listings by category — from Software Testing to Interpreting to Blogs. The purpose is to provide a resource for the language industry; what amounts to a global phone book for those seeking this kind of business. The Resource Directory has tradition- ally been marked by blue tabs, and this issue is no exception. We include a few short articles with each edition, organized with red tabs, and this year, Kate Edwards writes on the importance of content inventories. Common Sense Advisory provides a breakdown of the worldwide market, and we have a snapshot of our own reader survey, which helped us to get a better picture of how to serve the industry. Next comes the Index, marked with gold tabs. We manually index every issue, going through the articles in their final state and creating a list of authors, titles and topics, which we then sort by a single alphabet at the end of the year.
    [Show full text]