World Languages Using Latin Script
Recommended publications
Compilation of a Swiss German Dialect Corpus and Its Application to PoS Tagging
Nora Hollenstein and Noëmi Aepli, University of Zurich. Abstract: Swiss German is a dialect continuum whose dialects are very different from Standard German, the official language of the German part of Switzerland. However, dealing with Swiss German in natural language processing, usually the detour through Standard German is taken. As writing in Swiss German has become more and more popular in recent years, we would like to provide data to serve as a stepping stone to automatically process the dialects. We compiled NOAH's Corpus of Swiss German Dialects consisting of various text genres, manually annotated with Part-of-Speech tags. Furthermore, we applied this corpus as training set to a statistical Part-of-Speech tagger and achieved an accuracy of 90.62%. 1 Introduction: Swiss German is not an official language of Switzerland, rather it includes dialects of Standard German, which is one of the four official languages. However, it is different from Standard German in terms of phonetics, lexicon, morphology and syntax. Swiss German is not dividable into a few dialects, in fact it is a dialect continuum with a huge variety. Swiss German is not only a spoken dialect but increasingly used in written form, especially in less formal text types. Often, Swiss German speakers write text messages, emails and blogs in Swiss German. However, in recent years it has become more and more popular and authors are publishing in their own dialect. Nonetheless, there is neither a writing standard nor an official orthography, which increases the variations dramatically due to the fact that people write as they please with their own style.
Romanian Language and Its Dialects
Social Sciences. Romanian Language and Its Dialects. Ana-Maria Dudău. Abstract: The Romanian language, the continuance of the Latin language spoken in the eastern parts of the former Roman Empire, comes with its four dialects (Daco-Romanian, Aromanian, Megleno-Romanian and Istro-Romanian) to complete the European linguistic palette. Romanian linguists have always shown a permanent concern for both the identity and the status of the Romanian language and its dialects, thus supporting the existence of the ethnic, linguistic and cultural particularities of the minorities and firmly rejecting any attempt to assimilate them by force. Keywords: multilingualism, dialect, assimilation, official language, spoken language. The Romanian language, the only Romance language in Eastern Europe, is an "island" of Latinity in a mainly "Slavic sea", including its dialects from the south of the Danube: Aromanian, Megleno-Romanian and Istro-Romanian. Multilingualism is defined narrowly as the alternative use of several languages; widely, it is the use of several alternative language systems, regardless of their status: different languages, dialects of the same language or even varieties of the same idiom, being a natural consequence of linguistic contact. Multilingualism is a European value and a shared commitment, with particular importance for initial education, lifelong learning, employment, justice, freedom and security. The Romanian language, with its four dialects (Daco-Romanian, Aromanian, Megleno-Romanian and Istro-Romanian), is the continuance of the Latin language spoken in the eastern parts of the former Roman Empire. Together with the Dalmatian language (now extinct) and central and southern Italian dialects, it is part of the Apenino-Balkan group of Romance languages, different from the Alpine–Pyrenean group.
Linguapax Review 2010
LINGUAPAX REVIEW 2010. Col·lecció Materials, 6. First edition: February 2011. Editorial coordination: Josep Cru and Lachman Khubchandani. Translations into English: Kari Friedenson and Victoria Pounce. Revision of the original English texts: Kari Friedenson. Revision of the original French texts: Alain Hidoine. Design and layout: Monflorit Eddicions i Assessoraments, sl. ISBN: 978-84-15057-12-3. The contents of this publication are subject to a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 licence. Copying, distribution and public communication are permitted for non-commercial use, provided that authorship is credited and that any derivative works are distributed under a licence identical to the one governing the original work. The full licence can be consulted at: http://creativecommons.org/licenses/by-nc-sa/2.5/es/deed.ca. Centre UNESCO de Catalunya, Barcelona, 2011. CONTENTS. Presentation (Miquel Àngel Essomba). Foreword (Josep Cru). 1. The History of Linguapax: 1.1 Materials for a history of Linguapax (Fèlix Martí); 1.2 The beginnings of Linguapax (Miquel Siguan); 1.3 Les débuts du projet Linguapax et sa mise en place au siège de l'UNESCO (Joseph Poth); 1.4 FIPLV and Linguapax: A Quasi-autobiographical Account (Denis Cunningham); 1.5 Defending linguistic and cultural diversity / La defensa de la diversitat lingüística i cultural (Fèlix Martí). 2. Glimpses into the World's Languages Today: 2.1 Living together in a multilingual world.
Ebook Download a Reference Grammar of Modern Italian
A REFERENCE GRAMMAR OF MODERN ITALIAN PDF, EPUB, EBOOK. Martin Maiden, Cecilia Robustelli | 512 pages | 01 Jun 2009 | Taylor & Francis Ltd | 9780340913390 | Italian | London, United Kingdom. This Italian reference grammar provides a comprehensive, accessible and jargon-free guide to the forms and structures of Italian. This rule is not absolute, and some exceptions do exist. Parli inglese? Italian is an official language of Italy and San Marino and is spoken fluently by the majority of the countries' populations. The rediscovery of Dante's De vulgari eloquentia, as well as a renewed interest in linguistics in the 16th century, sparked a debate that raged throughout Italy concerning the criteria that should govern the establishment of a modern Italian literary and spoken language. Compared with most other Romance languages, Italian has many inconsistent outcomes, where the same underlying sound produces different results in different words, e. An instance of neuter gender also exists in pronouns of the third person singular. Italian immigrants to South America have also brought a presence of the language to that continent. Italian is widely taught in many schools around the world, but rarely as the first foreign language. In linguistic terms, the writing system is close to being a phonemic orthography. For a group composed of boys and girls, ragazzi is the plural, suggesting that -i is a general plural. It formerly had official status in Albania, Malta, Monaco, Montenegro (Kotor), Greece (Ionian Islands and Dodecanese) and is generally understood in Corsica due to its close relation with the Tuscan-influenced local language and Savoie.
J. Collins Malay Dialect Research in Malysia: the Issue of Perspective
J. Collins, Malay dialect research in Malaysia: The issue of perspective. In: Bijdragen tot de Taal-, Land- en Volkenkunde 145 (1989), no: 2/3, Leiden, 235-264. James T. Collins. Introduction: When European travellers and adventurers began to explore the coasts and islands of Southeast Asia almost five hundred years ago, they found Malay spoken in many of the ports and entrepots of the region. Indeed, today Malay remains an important indigenous language in Malaysia, Indonesia, Brunei, Thailand and Singapore. It should not be a surprise, then, that such a widespread and ancient language is characterized by a wealth of diverse [...] Footnote 1: Earlier versions of this paper were presented to the English Department of the National University of Singapore (July 22, 1987) and to the Persatuan Linguistik Malaysia (July 23, 1987). I would like to thank those who attended those presentations and provided valuable insights that have contributed to improving the paper. I am especially grateful to Dr. Anne Pakir of Singapore and to Dr. Nik Safiah Karim of Malaysia, who invited me to present a paper. I am also grateful to Dr. Azhar M. Simin and En. Awang Sariyan, who considerably enlivened the presentation in Kuala Lumpur. Professor George Grace and Professor Albert Schütz read earlier drafts of this paper. I thank them for their advice and encouragement. Footnote 2: Writing in 1881, Maxwell (1907:2) observed that: 'Malay is the language not of a nation, but of tribes and communities widely scattered in the East..
Some Principles of the Use of Macro-Areas Language Dynamics &A
Online Appendix for Harald Hammarström & Mark Donohue (2014), Some Principles of the Use of Macro-Areas, Language Dynamics & Change. The following document lists the languages of the world and their assignment to the macro-areas described in the main body of the paper, as well as the WALS macro-area for languages featured in the WALS 2005 edition. 7160 languages are included, which represent all languages for which we had coordinates available. Every language is given with its ISO-639-3 code (if it has one) for proper identification. The mapping between WALS languages and ISO-codes was done by using the mapping downloadable from the 2011 online WALS edition (because a number of errors in the mapping were corrected for the 2011 edition). 38 WALS languages are not given an ISO-code in the 2011 mapping; 36 of these have been assigned their appropriate ISO-code based on the sources the WALS lists for the respective language. This was not possible for Tasmanian (WALS-code: tsm) because the WALS mixes data from very different Tasmanian languages, and for Kualan (WALS-code: kua) because no source is given. 17 WALS-languages were assigned ISO-codes which have subsequently been retired; these have been assigned their appropriate updated ISO-code. In many cases, a WALS-language is mapped to several ISO-codes. As this has no bearing for the assignment to macro-areas, multiple mappings have been retained. Footnote 1: There are another couple of hundred languages which are attested but for which our database currently lacks coordinates.
The Status of the Least Documented Language Families in the World
Vol. 4 (2010), pp. 177-212. http://nflrc.hawaii.edu/ldc/ http://hdl.handle.net/10125/4478. The status of the least documented language families in the world. Harald Hammarström, Radboud Universiteit, Nijmegen and Max Planck Institute for Evolutionary Anthropology, Leipzig. This paper aims to list all known language families that are not yet extinct and all of whose member languages are very poorly documented, i.e., less than a sketch grammar's worth of data has been collected. It explains what constitutes a valid family, what amount and kinds of documentary data are sufficient, when a language is considered extinct, and more. It is hoped that the survey will be useful in setting priorities for documentation fieldwork, in particular for those documentation efforts whose underlying goal is to understand linguistic diversity. 1. Introduction. There are several legitimate reasons for pursuing language documentation (cf. Krauss 2007 for a fuller discussion). Perhaps the most important reason is for the benefit of the speaker community itself (see Voort 2007 for some clear examples). Another reason is that it contributes to linguistic theory: if we understand the limits and distribution of diversity of the world's languages, we can formulate and provide evidence for statements about the nature of language (Brenzinger 2007; Hyman 2003; Evans 2009; Harrison 2007). From the latter perspective, it is especially interesting to document languages that are the most divergent from ones that are well-documented, in other words, those that belong to unrelated families. I have conducted a survey of the documentation of the language families of the world, and in this paper, I will list the least-documented ones.
Why Speak Quechua? : a Study of Language Attitudes Among Native Quechua Speakers In
Why Speak Quechua?: A Study of Language Attitudes among Native Quechua Speakers in Lima, Peru. A Senior Honors Thesis presented in partial fulfillment of the requirements for graduation with research distinction in Linguistics in the undergraduate colleges of The Ohio State University, by Nicole Holliday. The Ohio State University, June 2010. Project Advisor: Professor Donald Winford, Department of Linguistics. I. Introduction. According to the U.S. State Department, Bureau of Western Hemisphere Affairs, there are presently 3.2 million Quechua speakers in Peru, which constitute approximately 16.5% of the total Peruvian population. As a result of the existence of a numerically prominent Quechua speaking population, the language is not presently classified as endangered in Peru. The 32 documented dialects of Quechua are considered as part of both an official language of Peru and a "lingua franca" in most regions of the Andes (Sherzer & Urban 1988, Lewis 2009). While the Peruvian government is supportive of the Quechua macrolanguage, "The State promotes the study and the knowledge of indigenous languages" (Article 83 of the Constitutional Assembly of Peru qtd. in Von Gleich 1994), many believe that with the advent of new technology and heavy cultural pressure to learn Spanish, Quechua will begin to fade into obscurity, just as the languages of Aymara and Kura have "lost their potency" in many parts of South America (Amastae 1989). At this point in time, there exists a great deal of data about how Quechua is used in Peru, but there is little data about language attitudes there, and even less about how native Quechua speakers view both their own language and how it relates to the more widely spoken Spanish.
The Malayic-Speaking Orang Laut Dialects and Directions for Research
Wacana Vol. 14 No. 2 (October 2012): 265–312. The Malayic-speaking Orang Laut: Dialects and directions for research. Karl Anderbeck. Abstract: Southeast Asia is home to many distinct groups of sea nomads, some of which are known collectively as Orang (Suku) Laut. Those located between Sumatra and the Malay Peninsula are all Malayic-speaking. Information about their speech is paltry and scattered; while starting points are provided in publications such as Skeat and Blagden (1906), Kähler (1946a, b, 1960), Sopher (1977: 178–180), Kadir et al. (1986), Stokhof (1987), and Collins (1988, 1995), a comprehensive account and description of Malayic Sea Tribe lects has not been provided to date. This study brings together disparate sources, including a bit of original research, to sketch a unified linguistic picture and point the way for further investigation. While much is still unknown, this paper demonstrates relationships within and between individual Sea Tribe varieties and neighbouring canonical Malay lects. It is proposed that Sea Tribe lects can be assigned to four groupings: Kedah, Riau Islands, Duano, and Sekak. Keywords: Malay, Malayic, Orang Laut, Suku Laut, Sea Tribes, sea nomads, dialectology, historical linguistics, language vitality, endangerment, Skeat and Blagden, Holle. 1 Introduction. Sometime in the tenth century AD, a pair of ships follows the monsoons to the southeast coast of Sumatra. Their desire: to trade for its famed aromatic resins and gold. Threading their way through the numerous straits, the ships' path is a dangerous one, filled with rocky shoals and lurking raiders. Only one vessel reaches its destination.
Chinese Organized Crime in Latin America
[Photo: Weapons and money seized by U.S. Drug Enforcement Administration. Credit: Department of Justice] Chinese Organized Crime in Latin America, by R. Evan Ellis. In June 2010, the sacking of Secretary of Justice Romeu Tuma Júnior for allegedly being an agent of the Chinese mafia rocked Brazilian politics. Three years earlier, in July 2007, the head of the Colombian national police, General Oscar Naranjo, made the striking proclamation that "the arrival of the Chinese and Russian mafias in Mexico and all of the countries in the Americas is more than just speculation." Although, to date, the expansion of criminal ties between the People's Republic of China (PRC) and Latin America has lagged behind the exponential growth of trade and investment between the two regions, the incidents mentioned above highlight that criminal ties between the regions are becoming an increasingly problematic by-product of expanding China–Latin America interactions, with troubling implications for both regions. Although data to quantify the character and extent of such ties are lacking, public evidence suggests that criminal activity spanning the two regions is principally concentrated in four current domains and two potentially emerging areas. The four groupings of current criminal activity between China and Latin America are extortion of Chinese communities in Latin America by groups with ties to China, trafficking in persons from China through Latin America into the United States or Canada, trafficking in narcotics and precursor chemicals, and trafficking in contraband goods. The two emerging areas are arms trafficking and money laundering. It is important to note that this analysis neither implicates the Chinese government in such ties nor absolves it, although a consideration of incentives suggests that it is highly unlikely that the government would be involved in any systematic fashion.
Chapter 1 Introduction As a Chinese Buddhist in Malaysia, I Have Been
Chapter 1. Introduction. As a Chinese Buddhist in Malaysia, I have been unconsciously entangled in a historical process of the making of modern Buddhism. There was a Chinese temple beside my house in Penang, Malaysia. The main deity was likely a deified imperial court officer, though no historical record documented his origin. A mosque serenely resided along the main street approximately 50 meters from my house. At the end of the street was a Hindu temple decorated with colorful statues. Less than five minutes' walk from my house was a Buddhist association in a two-storey terrace. During my childhood, the Chinese temple was a playground. My friends and I respected the deities worshipped there but sometimes innocently stole sweets and fruits donated by worshippers as offerings. Each year, three major religious events were organized by the temple committee: the end of the first lunar month marked the spring celebration of a deity in the temple; the seventh lunar month was the Hungry Ghost Festival; and the eighth month honored the birthday of She Fu Da Ren, the temple deity. The temple was busy throughout the year. Neighbors gathered there to chat about national politics and local gossip. The traditional Chinese temple was thus deeply rooted in the community. In terms of religious intimacy with different nearby temples, the Chinese temple ranked first, followed by the Hindu temple and finally the mosque, which had a psychological distance demarcated by racial boundaries. I accompanied my mother several times to the Hindu temple. Once, I asked her why she prayed to a Hindu deity.
Semantic Approaches in Natural Language Processing
10th Conference on Natural Language Processing (KONVENS). Semantic Approaches in Natural Language Processing: Proceedings of the Conference on Natural Language Processing 2010. Edited by Manfred Pinkal, Ines Rehbein, Sabine Schulte im Walde and Angelika Storrer. This book contains state-of-the-art contributions to the 10th conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing. The KONVENS in general aims at offering a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention towards addressing linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology. The articles demonstrate the importance of semantic processing, and present novel and creative approaches to natural language processing in general. Some contributions put their focus on developing and improving NLP systems for tasks like Named Entity Recognition or Word Sense Disambiguation, or focus on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation and Information Retrieval.