Towards an Alliance for Digital Language Diversity (CCURL 2016
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Indian Origin of Romani People As a Founding Myth in Eastern European Museums Douglas Neander Sambati1
The Indian Origin of Romani people as a Founding Myth 39 Douglas Sambati https://doi.org/10.36572/csm.2019.vol.58.03 The Indian Origin of Romani people as a Founding Myth in Eastern European Museums Douglas Neander Sambati1 Introduction This article analyses three museums – the Muzeum Romské Kultury (MRK) in Brno/Czech Republic, the Muzej Romské Kulture (MRKu) in Belgrade/Serbia, and the Roma Ethnographic Museum (REM) in Tárnow/Poland – and part of its exhibitions which can be considered as elements of the Romani Nationalism. The main objective is to demonstrate how these museum institutions support a broad narrative about a common Indian origin of Gypsy/Romani populations. It will be discussed below how the aforementioned museums – by means of their exhibitions, websites, events or any other kind of official production – support sets of representations which allow a formation of an umbrella rhetoric about the group known, taken and self-ascribed as Gypsies and/or Roma. This discourse, then, is able to shelter all different groups within this population in a holistic manner, based on a narrative formed by essentializations, exoticizations and generalizations. Therefore, the argument will be developed from now onwards by establishing a dialogue of elements which essentialize – in the sense that they imply natural and intrinsic characteristics – some aspects of Roma history and 1 Grupo de Pesquisa Estudos Interdisciplinares de Patrimônio Cultural da UNIVILLE. [email protected] 40 Cadernos de Sociomuseologia, 2019.vol.58.nº14 culture. In plain words, it is possible to find within these museums some specific deliberation which sustains none (or little) doubt that some characteristics are part of a claimed Romani way of life. -
Mushrooms Russia and History
MUSHROOMS RUSSIA AND HISTORY BY VALENTINA PAVLOVNA WASSON AND R.GORDON WASSON VOLUME I PANTHEON BOOKS • NEW YORK COPYRIGHT © 1957 BY R. GORDON WASSON MANUFACTURED IN ITALY FOR THE AUTHORS AND PANTHEON BOOKS INC. 333, SIXTH AVENUE, NEW YORK 14, N. Y. www.NewAlexandria.org/ archive CONTENTS LIST OF PLATES VII LIST OF ILLUSTRATIONS IN THE TEXT XIII PREFACE XVII VOLUME I I. MUSHROOMS AND THE RUSSIANS 3 II. MUSHROOMS AND THE ENGLISH 19 III. MUSHROOMS AND HISTORY 37 IV. MUSHROOMS FOR MURDERERS 47 V. THE RIDDLE OF THE TOAD AND OTHER SECRETS MUSHROOMIC 65 1. The Venomous Toad 66 2. Basques and Slovaks 77 3. The Cripple, the Toad, and the Devil's Bread 80 4. The 'Pogge Cluster 92 5. Puff balls, Filth, and Vermin 97 6. The Sponge Cluster 105 7. Punk, Fire, and Love 112 8. The Gourd Cluster 127 9. From 'Panggo' to 'Pupik' 138 10. Mucus, Mushrooms, and Love 145 11. The Secrets of the Truffle 166 12. 'Gripau' and 'Crib' 185 13. The Flies in the Amanita 190 v CONTENTS VOLUME II V. THE RIDDLE OF THE TOAD AND OTHER SECRETS MUSHROOMIC (CONTINUED) 14. Teo-Nandcatl: the Sacred Mushrooms of the Nahua 215 15. Teo-Nandcatl: the Mushroom Agape 287 16. The Divine Mushroom: Archeological Clues in the Valley of Mexico 322 17. 'Gama no Koshikake and 'Hegba Mboddo' 330 18. The Anatomy of Mycophobia 335 19. Mushrooms in Art 351 20. Unscientific Nomenclature 364 Vale 374 BIBLIOGRAPHICAL NOTES AND ACKNOWLEDGEMENTS 381 APPENDIX I: Mushrooms in Tolstoy's 'Anna Karenina 391 APPENDIX II: Aksakov's 'Remarks and Observations of a Mushroom Hunter' 394 APPENDIX III: Leuba's 'Hymn to the Morel' 400 APPENDIX IV: Hallucinogenic Mushrooms: Early Mexican Sources 404 INDEX OF FUNGAL METAPHORS AND SEMANTIC ASSOCIATIONS 411 INDEX OF MUSHROOM NAMES 414 INDEX OF PERSONS AND PLACES 421 VI LIST OF PLATES VOLUME I JEAN-HENRI FABRE. -
Automatic Correction of Real-Word Errors in Spanish Clinical Texts
sensors Article Automatic Correction of Real-Word Errors in Spanish Clinical Texts Daniel Bravo-Candel 1,Jésica López-Hernández 1, José Antonio García-Díaz 1 , Fernando Molina-Molina 2 and Francisco García-Sánchez 1,* 1 Department of Informatics and Systems, Faculty of Computer Science, Campus de Espinardo, University of Murcia, 30100 Murcia, Spain; [email protected] (D.B.-C.); [email protected] (J.L.-H.); [email protected] (J.A.G.-D.) 2 VÓCALI Sistemas Inteligentes S.L., 30100 Murcia, Spain; [email protected] * Correspondence: [email protected]; Tel.: +34-86888-8107 Abstract: Real-word errors are characterized by being actual terms in the dictionary. By providing context, real-word errors are detected. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus. Then, the probability of a word being a real-word error is computed. On the other hand, state-of-the-art approaches make use of deep learning models to learn context by extracting semantic features from text. In this work, a deep learning model were implemented for correcting real-word errors in clinical text. Specifically, a Seq2seq Neural Machine Translation Model mapped erroneous sentences to correct them. For that, different types of error were generated in correct sentences by using rules. Different Seq2seq models were trained and evaluated on two corpora: the Wikicorpus and a collection of three clinical datasets. The medicine corpus was much smaller than the Wikicorpus due to privacy issues when dealing Citation: Bravo-Candel, D.; López-Hernández, J.; García-Díaz, with patient information. -
Some Principles of the Use of Macro-Areas Language Dynamics &A
Online Appendix for Harald Hammarstr¨om& Mark Donohue (2014) Some Principles of the Use of Macro-Areas Language Dynamics & Change Harald Hammarstr¨om& Mark Donohue The following document lists the languages of the world and their as- signment to the macro-areas described in the main body of the paper as well as the WALS macro-area for languages featured in the WALS 2005 edi- tion. 7160 languages are included, which represent all languages for which we had coordinates available1. Every language is given with its ISO-639-3 code (if it has one) for proper identification. The mapping between WALS languages and ISO-codes was done by using the mapping downloadable from the 2011 online WALS edition2 (because a number of errors in the mapping were corrected for the 2011 edition). 38 WALS languages are not given an ISO-code in the 2011 mapping, 36 of these have been assigned their appropri- ate iso-code based on the sources the WALS lists for the respective language. This was not possible for Tasmanian (WALS-code: tsm) because the WALS mixes data from very different Tasmanian languages and for Kualan (WALS- code: kua) because no source is given. 17 WALS-languages were assigned ISO-codes which have subsequently been retired { these have been assigned their appropriate updated ISO-code. In many cases, a WALS-language is mapped to several ISO-codes. As this has no bearing for the assignment to macro-areas, multiple mappings have been retained. 1There are another couple of hundred languages which are attested but for which our database currently lacks coordinates. -
Prioritizing African Languages: Challenges to Macro-Level Planning for Resourcing and Capacity Building
Prioritizing African Languages: Challenges to macro-level planning for resourcing and capacity building Tristan M. Purvis Christopher R. Green Gregory K. Iverson University of Maryland Center for Advanced Study of Language Abstract This paper addresses key considerations and challenges involved in the process of prioritizing languages in an area of high linguistic di- versity like Africa alongside other world regions. The paper identifies general considerations that must be taken into account in this process and reviews the placement of African languages on priority lists over the years and across different agencies and organizations. An outline of factors is presented that is used when organizing resources and planning research on African languages that categorizes major or crit- ical languages within a framework that allows for broad coverage of the full linguistic diversity of the continent. Keywords: language prioritization, African languages, capacity building, language diversity, language documentation When building language capacity on an individual or localized level, the question of which languages matter most is relatively less complicated than it is for those planning and providing for language capabilities at the macro level. An American anthropology student working with Sierra Leonean refugees in Forecariah, Guinea, for ex- ample, will likely know how to address and balance needs for lan- guage skills in French, Susu, Krio, and a set of other languages such as Temne and Mandinka. An education official or activist in Mwanza, Tanzania, will be concerned primarily with English, Swahili, and Su- kuma. An administrator of a grant program for Less Commonly Taught Languages, or LCTLs, or a newly appointed language authori- ty for the United States Department of Education, Department of Commerce, or U.S. -
Welsh Language Technology Action Plan Progress Report 2020 Welsh Language Technology Action Plan: Progress Report 2020
Welsh language technology action plan Progress report 2020 Welsh language technology action plan: Progress report 2020 Audience All those interested in ensuring that the Welsh language thrives digitally. Overview This report reviews progress with work packages of the Welsh Government’s Welsh language technology action plan between its October 2018 publication and the end of 2020. The Welsh language technology action plan derives from the Welsh Government’s strategy Cymraeg 2050: A million Welsh speakers (2017). Its aim is to plan technological developments to ensure that the Welsh language can be used in a wide variety of contexts, be that by using voice, keyboard or other means of human-computer interaction. Action required For information. Further information Enquiries about this document should be directed to: Welsh Language Division Welsh Government Cathays Park Cardiff CF10 3NQ e-mail: [email protected] @cymraeg Facebook/Cymraeg Additional copies This document can be accessed from gov.wales Related documents Prosperity for All: the national strategy (2017); Education in Wales: Our national mission, Action plan 2017–21 (2017); Cymraeg 2050: A million Welsh speakers (2017); Cymraeg 2050: A million Welsh speakers, Work programme 2017–21 (2017); Welsh language technology action plan (2018); Welsh-language Technology and Digital Media Action Plan (2013); Technology, Websites and Software: Welsh Language Considerations (Welsh Language Commissioner, 2016) Mae’r ddogfen yma hefyd ar gael yn Gymraeg. This document is also available in Welsh. -
Introduction
Introduction Mahagama Sekera (1929–76) was a Sinhalese lyricist and poet from Sri Lanka. In 1966 Sekera gave a lecture in which he argued that a test of a good song was to take away the music and see whether the lyric could stand on its own as a piece of literature.1 Here I have translated the Sinhala-language song Sekera presented as one that aced the test.2 The subject of this composition, like the themes of many songs broadcast on Sri Lanka’s radio since the late 1930s, was related to Buddhism, the religion of the country’s majority. The Niranjana River Flowed slowly along the sandy plains The day the Buddha reached enlightenment. The Chief of the Three Worlds attained samadhi in meditation. He was liberated at that moment. In the cool shade of the snowy mountain ranges The flowers’ fragrant pollen Wafted through the sandalwood trees Mixed with the soft wind And floated on. When the leaves and sprouts Of the great Bodhi tree shook slightly The seven musical notes rang out. A beautiful song came alive Moving to the tāla. 1 2 Introduction The day the Venerable Sanghamitta Brought the branch of the Bodhi tree to Mahamevuna Park The leaves of the Bodhi tree danced As if there was such a thing as a “Mahabō Vannama.”3 The writer of this song is Chandrarathna Manawasinghe (1913–64).4 In the Sinhala language he is credited as the gīta racakayā (lyricist). Manawasinghe alludes in the text to two Buddhist legends and a Sinhalese style of dance. -
Learning to Read by Spelling Towards Unsupervised Text Recognition
Learning to Read by Spelling Towards Unsupervised Text Recognition Ankush Gupta Andrea Vedaldi Andrew Zisserman Visual Geometry Group Visual Geometry Group Visual Geometry Group University of Oxford University of Oxford University of Oxford [email protected] [email protected] [email protected] tttttttttttttttttttttttttttttttttttttttttttrssssss ttttttt ny nytt nr nrttttttt ny ntt nrttttttt zzzz iterations iterations bcorote to whol th ticunthss tidio tiostolonzzzz trougfht to ferr oy disectins it has dicomered Training brought to view by dissection it was discovered Figure 1: Text recognition from unaligned data. We present a method for recognising text in images without using any labelled data. This is achieved by learning to align the statistics of the predicted text strings, against the statistics of valid text strings sampled from a corpus. The figure above visualises the transcriptions as various characters are learnt through the training iterations. The model firstlearns the concept of {space}, and hence, learns to segment the string into words; followed by common words like {to, it}, and only later learns to correctly map the less frequent characters like {v, w}. The last transcription also corresponds to the ground-truth (punctuations are not modelled). The colour bar on the right indicates the accuracy (darker means higher accuracy). ABSTRACT 1 INTRODUCTION This work presents a method for visual text recognition without read (ri:d) verb • Look at and comprehend the meaning of using any paired supervisory data. We formulate the text recogni- (written or printed matter) by interpreting the characters or tion task as one of aligning the conditional distribution of strings symbols of which it is composed. -
Modern Contours: Sinhala Poetry in Sri Lanka, 1913-56
South Asia: Journal of South Asian Studies ISSN: 0085-6401 (Print) 1479-0270 (Online) Journal homepage: http://www.tandfonline.com/loi/csas20 Modern Contours: Sinhala Poetry in Sri Lanka, 1913–56 Garrett M. Field To cite this article: Garrett M. Field (2016): Modern Contours: Sinhala Poetry in Sri Lanka, 1913–56, South Asia: Journal of South Asian Studies, DOI: 10.1080/00856401.2016.1152436 To link to this article: http://dx.doi.org/10.1080/00856401.2016.1152436 Published online: 12 Apr 2016. Submit your article to this journal View related articles View Crossmark data Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=csas20 Download by: [Garrett Field] Date: 13 April 2016, At: 04:41 SOUTH ASIA: JOURNAL OF SOUTH ASIAN STUDIES, 2016 http://dx.doi.org/10.1080/00856401.2016.1152436 ARTICLE Modern Contours: Sinhala Poetry in Sri Lanka, 1913À56 Garrett M. Field Ohio University, Athens, OH, USA ABSTRACT KEYWORDS A consensus is growing among scholars of modern Indian literature Modernist realism; that the thematic development of Hindi, Urdu and Bangla poetry Rabindranath Tagore; was consistent to a considerable extent. I use the term ‘consistent’ romanticism; Sinhala poetry; to refer to the transitions between 1900 and 1960 from didacticism Siri Gunasinghe; South Asian literary history; Sri Lanka; to romanticism to modernist realism. The purpose of this article is to superposition build upon this consensus by revealing that as far south as Sri Lanka, Sinhala-language poetry developed along the same trajectory. To bear out this argument, I explore the works of four Sri Lankan poets, analysing the didacticism of Ananda Rajakaruna, the romanticism of P.B. -
The Malayic-Speaking Orang Laut Dialects and Directions for Research
KARLWacana ANDERBECK Vol. 14 No., The 2 Malayic-speaking(October 2012): 265–312Orang Laut 265 The Malayic-speaking Orang Laut Dialects and directions for research KARL ANDERBECK Abstract Southeast Asia is home to many distinct groups of sea nomads, some of which are known collectively as Orang (Suku) Laut. Those located between Sumatra and the Malay Peninsula are all Malayic-speaking. Information about their speech is paltry and scattered; while starting points are provided in publications such as Skeat and Blagden (1906), Kähler (1946a, b, 1960), Sopher (1977: 178–180), Kadir et al. (1986), Stokhof (1987), and Collins (1988, 1995), a comprehensive account and description of Malayic Sea Tribe lects has not been provided to date. This study brings together disparate sources, including a bit of original research, to sketch a unified linguistic picture and point the way for further investigation. While much is still unknown, this paper demonstrates relationships within and between individual Sea Tribe varieties and neighbouring canonical Malay lects. It is proposed that Sea Tribe lects can be assigned to four groupings: Kedah, Riau Islands, Duano, and Sekak. Keywords Malay, Malayic, Orang Laut, Suku Laut, Sea Tribes, sea nomads, dialectology, historical linguistics, language vitality, endangerment, Skeat and Blagden, Holle. 1 Introduction Sometime in the tenth century AD, a pair of ships follows the monsoons to the southeast coast of Sumatra. Their desire: to trade for its famed aromatic resins and gold. Threading their way through the numerous straits, the ships’ path is a dangerous one, filled with rocky shoals and lurking raiders. Only one vessel reaches its destination. -
Corpus Collection for Under-Resourced Languages with More Than One Million Speakers
Corpus collection for under-resourced languages with more than one million speakers Dirk Goldhahn1, Maciej Sumalvico1, Uwe Quasthoff1,2 Natural Language Processing Group, University of Leipzig, Germany Department of African Languages, University of South Africa, South Africa Email: { dgoldhahn, janicki, quasthoff, }@informatik.uni-leipzig.de Abstract For only 40 of about 350 languages with more than one million speakers, the situation concerning text resources is comfortable. For the remaining languages, the number of speakers indicates a need for both corpora and tools. This paper describes a corpus collection initiative for these languages. While random Web crawling has serious limitations, native speakers with knowledge of web pages in their language are of invaluable help. The aim is to give interested scholars, language enthusiasts the possibility to contribute to corpus creation or extension by simply entering a URL into the Web Interface. Using a Web portal URLs of interest are collected with the help of the respective communities. A standardized corpus processing chain for daily newspaper corpora creation is adapted to append newly added web pages to an increasing corpus. As a result we will be able to collect larger corpora for under-resourced languages by a community effort. These corpora will be made publicly available. Keywords: corpora, under-resourced languages, Web portal, community 1. Introduction links require the execution of JavaScript code by the There are about 350 languages with more than one million crawler which often produces errors. So, if a website speakers1. For about 40 of them, the situation concerning heavily uses JavaScript links and there are no other links text resources is comfortable: there are corpora of pointing to special pages (coming from another website, reasonable size and also tools like POS taggers adapted to for instance), then all but the main page might be these languages. -
Digital Dictionaries of South Asia Funded by the U.S Department of Education Under Title VI, Section 605, October 1999 Through September 2002
Digital Dictionaries of South Asia Funded by the U.S Department of Education under Title VI, Section 605, October 1999 through September 2002 PROGRAM TITLE OF PROJECT International Research and Studies Program, Digital Dictionaries of South Asia International Education and Graduate Programs Service, U.S. Department of APPLICANT Education, CFDA No. 84.017A The University of Chicago 970 East 58th Street FUNDING REQUESTED Chicago, Illinois 60637 $453,071 for three years 773-702-8602 PROJECT DATES PROJECT DIRECTOR Oct. 1, 1999 - Sept. 30, 2002 James H. Nye Director, South Asia Language and Area Center 5848 University Avenue, Kelly 313 Project Director’s Signature Chicago, Illinois 60637 773-702-8430 SUMMARY OF PROPOSED PROJECT For language learning and instruction, few resources are more crucial than dictionaries. This project aims to make high-quality dictionaries in each of the twenty-six modern literary languages of South Asia universally available in digital formats. At least thirty-two dictionaries will be converted from printed books, often multi-volume, to electronic resources. A wide variety of users will benefit from access to electronic dictionaries via global media such as the World Wide Web. Not only the academics whose study of Indic languages has long been supported by the Department of Education, but also American-born learners of South Asian heritage, and individuals around the world will profit. A well-developed plan and the considerable experience of key personnel ensure that the project's objectives will be met. The Project Director and two Co-Directors have been at the forefront of recent initiatives to improve global access to South Asian materials through deployment of current technologies.