Cultural and Linguistic Guidelines for Language Evaluation of Arab-American
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Automatic Identification of Arabic Language Varieties and Dialects in Social Media
Fatiha Sadat (University of Quebec in Montreal, 201 President Kennedy, Montreal, QC, Canada), Farnazeh Kazemi (NLP Technologies Inc., 52 Le Royer Street W., Montreal, QC, Canada), Atefeh Farzindar (NLP Technologies Inc., 52 Le Royer Street W., Montreal, QC, Canada). Abstract: Modern Standard Arabic (MSA) is the formal language in most Arabic countries. Arabic Dialects (AD), or daily language, differ from MSA, especially in social media communication. However, most Arabic social media texts have mixed forms and many variations, especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for AD classification using probabilistic models across social media datasets. We present a set of experiments using the character n-gram Markov language model and Naive Bayes classifiers, with detailed examination of which models perform best under different conditions in a social media context. Experimental results show that a Naive Bayes classifier based on a character bi-gram model can identify the 18 different Arabic dialects with a considerable overall accuracy of 98%. 1 Introduction. Arabic is a morphologically rich and complex language, which presents significant challenges for natural language processing and its applications. It is the official language in 22 countries, spoken by more than 350 million people around the world. Moreover, the Arabic language exists in a state of diglossia where the standard form of the language, Modern Standard Arabic (MSA), and the regional dialects (AD) live side-by-side and are closely related (Elfardy and Diab, 2013). -
Christians and Jews in Muslim Societies
Arabic and its Alternatives. Christians and Jews in Muslim Societies. Editorial Board: Phillip Ackerman-Lieberman (Vanderbilt University, Nashville, USA), Bernard Heyberger (EHESS, Paris, France). Volume 5. The titles published in this series are listed at brill.com/cjms. Arabic and its Alternatives: Religious Minorities and Their Languages in the Emerging Nation States of the Middle East (1920–1950). Edited by Heleen Murre-van den Berg, Karène Sanchez Summerer, and Tijmen C. Baarda. Leiden | Boston. Cover illustration: Assyrian School of Mosul, 1920s–1930s; courtesy Dr. Robin Beth Shamuel, Iraq. This is an open access title distributed under the terms of the CC BY-NC 4.0 license, which permits any non-commercial use, distribution, and reproduction in any medium, provided no alterations are made and the original author(s) and source are credited. Further information and the complete license text can be found at https://creativecommons.org/licenses/by-nc/4.0/ The terms of the CC license apply only to the original material. The use of material from other sources (indicated by a reference) such as diagrams, illustrations, photos and text samples may require further permission from the respective copyright holder. Library of Congress Cataloging-in-Publication Data. Names: Murre-van den Berg, H. L. (Hendrika Lena), 1964– illustrator. | Sanchez-Summerer, Karene, editor. | Baarda, Tijmen C., editor. Title: Arabic and its alternatives : religious minorities and their languages in the emerging nation states of the Middle East (1920–1950) / edited by Heleen Murre-van den Berg, Karène Sanchez, Tijmen C. Baarda. Description: Leiden ; Boston : Brill, 2020. | Series: Christians and Jews in Muslim societies, 2212–5523 ; vol. -
Saudi Dialects: Are They Endangered?
Academic Research Publishing Group. English Literature and Language Review, ISSN(e): 2412-1703, ISSN(p): 2413-8827, Vol. 2, No. 12, pp. 131-141, 2016. URL: http://arpgweb.com/?ic=journal&journal=9&info=aims Saudi Dialects: Are They Endangered? Salih Alzahrani, Taif University, Saudi Arabia. Abstract: Krauss, among others, claims that languages will face death in the coming centuries (Krauss, 1992). Austin (2010a) lists 7,000 languages as existing and spoken in the world today. Krauss estimates that this figure could come down to 600. That is, most of the world's languages are endangered. An endangered language is therefore a language that loses its speakers within a few generations. According to Dorian (1981), there is what is called a “tip” in language endangerment. Dorian argues that a language's decline can start slowly but then suddenly go through a rapid decline toward extinction. Thus, languages must be protected at a much earlier stage. Arabic dialects such as Zahrani Spoken Arabic (ZSA) and Faifi Spoken Arabic (henceforth, FSA), which are spoken in the southern region of Saudi Arabia, have not yet been studied. Few people speak these dialects, among many other dialects in the same region. However, the problem is that most of these dialects' native speakers are moving to other regions in Saudi Arabia where they use other, different dialects. Therefore, are these dialects endangered? What other factors may cause their endangerment? Have they been documented before? What shall we do? This paper discusses three main points regarding this issue: language and endangerment, language documentation and description, and the Arabic language and its family, giving a brief history of Saudi dialects and comparing their situation with that of the existing dialects as a whole. -
The Status of the Least Documented Language Families in the World
Vol. 4 (2010), pp. 177-212. http://nflrc.hawaii.edu/ldc/ http://hdl.handle.net/10125/4478 The status of the least documented language families in the world. Harald Hammarström, Radboud Universiteit, Nijmegen and Max Planck Institute for Evolutionary Anthropology, Leipzig. This paper aims to list all known language families that are not yet extinct and all of whose member languages are very poorly documented, i.e., less than a sketch grammar’s worth of data has been collected. It explains what constitutes a valid family, what amount and kinds of documentary data are sufficient, when a language is considered extinct, and more. It is hoped that the survey will be useful in setting priorities for documentation fieldwork, in particular for those documentation efforts whose underlying goal is to understand linguistic diversity. 1. Introduction. There are several legitimate reasons for pursuing language documentation (cf. Krauss 2007 for a fuller discussion). Perhaps the most important reason is for the benefit of the speaker community itself (see Voort 2007 for some clear examples). Another reason is that it contributes to linguistic theory: if we understand the limits and distribution of diversity of the world’s languages, we can formulate and provide evidence for statements about the nature of language (Brenzinger 2007; Hyman 2003; Evans 2009; Harrison 2007). From the latter perspective, it is especially interesting to document languages that are the most divergent from ones that are well-documented; in other words, those that belong to unrelated families. I have conducted a survey of the documentation of the language families of the world, and in this paper, I will list the least-documented ones. -
Arabic and Contact-Induced Change Christopher Lucas, Stefano Manfredi
To cite this version: Christopher Lucas, Stefano Manfredi. Arabic and Contact-Induced Change. 2020. halshs-03094950. HAL Id: halshs-03094950, https://halshs.archives-ouvertes.fr/halshs-03094950 Submitted on 15 Jan 2021. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Arabic and contact-induced change. Edited by Christopher Lucas and Stefano Manfredi. Contact and Multilingualism 1. Language Science Press. Contact and Multilingualism. Editors: Isabelle Léglise (CNRS SeDyL), Stefano Manfredi (CNRS SeDyL). In this series: 1. Lucas, Christopher & Stefano Manfredi (eds.). Arabic and contact-induced change. Lucas, Christopher & Stefano Manfredi (eds.). 2020. Arabic and contact-induced change (Contact and Multilingualism 1). Berlin: Language Science Press. This title can be downloaded at: http://langsci-press.org/catalog/book/235 © 2020, the authors. Published under the Creative Commons Attribution -
Why Speak Quechua? : a Study of Language Attitudes Among Native Quechua Speakers In
Why Speak Quechua? A Study of Language Attitudes among Native Quechua Speakers in Lima, Peru. A Senior Honors Thesis presented in partial fulfillment of the requirements for graduation with research distinction in Linguistics in the undergraduate colleges of The Ohio State University, by Nicole Holliday. The Ohio State University, June 2010. Project Advisor: Professor Donald Winford, Department of Linguistics. I. Introduction. According to the U.S. State Department, Bureau of Western Hemisphere Affairs, there are presently 3.2 million Quechua speakers in Peru, who constitute approximately 16.5% of the total Peruvian population. As a result of the existence of a numerically prominent Quechua-speaking population, the language is not presently classified as endangered in Peru. The 32 documented dialects of Quechua are considered part of both an official language of Peru and a “lingua franca” in most regions of the Andes (Sherzer & Urban 1988, Lewis 2009). While the Peruvian government is supportive of the Quechua macrolanguage, “The State promotes the study and the knowledge of indigenous languages” (Article 83 of the Constitutional Assembly of Peru qtd. in Von Gleich 1994), many believe that with the advent of new technology and heavy cultural pressure to learn Spanish, Quechua will begin to fade into obscurity, just as the languages of Aymara and Kura have “lost their potency” in many parts of South America (Amastae 1989). At this point in time, there exists a great deal of data about how Quechua is used in Peru, but there is little data about language attitudes there, and even less about how native Quechua speakers view both their own language and how it relates to the more widely spoken Spanish. -
Phonetics and Phonology Paradox in Levantine Arabic: an Analytical Evaluation of Arabic Geminates’ Hypocrisy
ISSN 1799-2591. Theory and Practice in Language Studies, Vol. 9, No. 7, pp. 854-864, July 2019. DOI: http://dx.doi.org/10.17507/tpls.0907.16 Phonetics and Phonology Paradox in Levantine Arabic: An Analytical Evaluation of Arabic Geminates’ Hypocrisy. Ahmad Mahmoud Saidat, Department of English Language and Literature, Al-Hussein Bin Talal University, Jordan. Jamal A. Khlifat, Department of Linguistics, University of Colorado, Boulder, USA. Abstract: This paper explores the phonetic and phonological paradox between two categories of Levantine Arabic long consonants (known as geminates) by looking closely at the hypocrite Arabic geminates. Hypocrite geminates are phonetically long segments in a sequence that are not contrastive. The paper seeks to demonstrate that Arabic geminates can be classified into two categories (true vs. fake geminates) based on the phonological process of inseparability and the Obligatory Contour Principle (OCP). Thirty Levantine Arabic speakers took part in this case study. Fifteen participants were asked to utter a group of stimuli where the two types of geminates interact with the surrounding phonological environment. The other fifteen participants were recorded while reading target word lists that contained geminate consonants and medial singletons preceded by short and long consonants, and while engaging in naturalistic conversations. Auditory and acoustic analyses of long consonants were made. Results from the word lists indicated that while Arabic true geminates embrace the phonological process of inseparability, Arabic fake geminates do not. The case study also shows that the OCP seems to bridge the contradiction between these two categories of Arabic geminates. Index Terms: Arabic geminates, epenthesis, inseparability, obligatory contour principle, CV phonology. I. -
Integrating Dialects Into the Modern Standard Arabic
Integrating Dialects into the Modern Standard Arabic High School Classroom, by Bridget J. Hirsch, B.A., The George Washington University, 2003. Thesis presented to the Faculty of Concordia College, Moorhead, Minnesota, in partial fulfillment of the requirements for the degree of Master of Education in World Language Instruction. Concordia College, May 2009. This thesis submitted by Bridget J. Hirsch in partial fulfillment of the requirements for the Degree of Master of Education: World Language Instruction from Concordia College has been read by the Examining Committee under whom the work has been done and is hereby approved. As the Committee Chairperson, I hereby certify that this thesis is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made. This thesis meets the standards for appearance, conforms to the style and format requirements of the Office of Graduate Programs and Continuing Studies of Concordia College, and is hereby approved. Copyright page: It is the policy of Concordia College to allow students to retain ownership of the copyright to the thesis after deposit. However, as a condition of accepting the degree, the student grants the College the non-exclusive right to retain, use and distribute a limited number of copies of the thesis, together with the right to require its publication for archival use. Acknowledgements: First and foremost, thank you to all high school Arabic teachers that filled out my survey. You have an incredible task at hand, and your time and feedback is greatly appreciated. Thank you to those that assisted in the development and distribution of the survey, specifically Donna Clementi, Karin Ryding, Salah Ayari, the American Association of Teachers of Arabic, the National Capitol Language Research Center, Brigham Young’s Arabic Listserv, and Dora Johnson at the Center for Advanced Linguistics. -
Language Classification in the Ethnologue and Its Consequences (Paper)
Sarah E. Cornwell, The University of Western Ontario, London, Ontario, Canada. Language Classification in The Ethnologue and its Consequences (Paper). Abstract: The Ethnologue is a widely used classificatory standard for the world’s 7000+ natural languages. However, the motives and processes used by The Ethnologue’s governing body, SIL International, have come under criticism by linguists. This paper investigates how The Ethnologue answers the question “What is a language?” through the theoretical lens presented by Bowker and Star in Sorting Things Out (1999) and presents some consequences of those classificatory decisions. 1. Introduction. Language is central to human culture and experience. It is the central core of our communicative capacity, running through all media of communication. Due to this essential role, there is a need to classify and count languages and their speakers. Governments need to communicate with citizens, libraries need to catalogue books by language, and search engines need to return webpages in the appropriate language for each searcher. For these reasons among others, there is a strong institutional need for a standardized classification structure and labelling system for languages. The most widespread current classificatory infrastructure for languages is based on The Ethnologue. This essay will critically evaluate The Ethnologue using the Foucauldian theory developed for investigating classification structures by Bowker and Star (1999) in Sorting Things Out: Classification and its Consequences. 2. The Ethnologue. The Ethnologue is a catalogue of all the world’s languages. First compiled in 1951, The Ethnologue is currently in its 20th edition and contains descriptions of 7099 living languages (2017). Since 1997, The Ethnologue has been available freely online and widely accessible (Simons & Fennig, 2017). -
Writing Arabizi: Orthographic Variation in Romanized
Writing Arabizi: Orthographic Variation in Romanized Lebanese Arabic on Twitter. Natalie Sullivan. TC 660H, Plan II Honors Program, The University of Texas at Austin. May 4, 2017. Barbara Bullock, Ph.D., Department of French & Italian, Supervising Professor. John Huehnergard, Ph.D., Department of Middle Eastern Studies, Second Reader. Abstract. Author: Natalie Sullivan. Title: Writing Arabizi: Orthographic Variation in Romanized Lebanese Arabic on Twitter. Supervising Professors: Dr. Barbara Bullock, Dr. John Huehnergard. How does technology influence the script in which a language is written? Over the past few decades, a new form of writing has emerged across the Arab world. Known as Arabizi, it is a type of Romanized Arabic that uses Latin characters instead of Arabic script. It is mainly used by youth in technology-related contexts such as social media and texting, and has made many older Arabic speakers fear that more standard forms of Arabic may be in danger because of its use. Prior work on Arabizi suggests that although it is used frequently on social media, its orthography is not yet standardized (Palfreyman and Khalil, 2003; Abdel-Ghaffar et al., 2011). Therefore, this thesis aimed to examine orthographic variation in Romanized Lebanese Arabic, which has rarely been studied as a Romanized dialect. It was interested in how often Arabizi is used on Twitter in Lebanon and the extent of its orthographic variation. Using Twitter data collected from Beirut, tweets were analyzed to discover the most common orthographic variants in Arabizi for each Arabic letter, as well as the overall rate of Arabizi use. Results show that Arabizi was not used as frequently as hypothesized on Twitter, probably because of its low prestige and increased globalization. -
Ethnologue: Languages of Honduras Twentieth Edition Data
Ethnologue: Languages of Honduras. Twentieth edition data. Gary F. Simons and Charles D. Fennig, Editors. Based on information from the Ethnologue, 20th edition: Simons, Gary F. and Charles D. Fennig (eds.). 2017. Ethnologue: Languages of the World, Twentieth edition. Dallas, Texas: SIL International. Online: http://www.ethnologue.com. For personal use only. Permission to distribute or reuse this work (in whole or in part) may be obtained through the Copyright Clearance Center at http://www.copyright.com. SIL International, 7500 West Camp Wisdom Road, Dallas, Texas 75236-5699 USA. Web: www.sil.org, Phone: +1 972 708 7404. Contents: List of Abbreviations; How to Use This Digest; Country Overview; Language Status Profile; Statistical Summaries; Alphabetical Listing of Languages; Language Map; Languages by Population; Languages by Status; Languages by Department; Languages by Family; Language Code Index; Language Name Index; Bibliography. Copyright © 2017 by SIL International. All rights reserved. No part of this publication may be reproduced, redistributed, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise) without the prior written permission of SIL International, with the exception of brief excerpts in articles or reviews. List of Abbreviations. A: Agent in constituent word order; alt.: alternate name for; alt. dial.: alternate dialect name for; C: Consonant in canonical syllable patterns; CDE: Convention against Discrimination in Education (1960); Class: Language classification; CPPDCE: Convention on the Protection and Promotion of the Diversity of Cultural Expressions (2005); CSICH: Convention for the Safeguarding of Intangible Cultural Heritage (2003); dial. -
00. the Realization of Negation in the Syrian Arabic Clause, Phrase, And
The realization of negation in the Syrian Arabic clause, phrase, and word. Isa Wayne Murphy. M.Phil in Applied Linguistics, Trinity College Dublin, 2014. Supervisor: Dr. Brian Nolan. Declaration: I declare that this dissertation has not been submitted as an exercise for a degree at this or any other university and that it is entirely my own work. I agree that the Library may lend or copy this dissertation on request. Abstract: Syrian Arabic realizes negation in broadly the same way as other dialects of Arabic, but it does so utilizing varied and at times unique means. This dissertation provides a Role and Reference Grammar account of the full spectrum of lexical, morphological, and analytical means employed by Syrian Arabic to encode negation on the layered structures of the verb, the clause, the noun, and the noun phrase. The scope negation takes within the LSC and the LSNP is identified and illustrated. The study found that Syrian Arabic employs separate negative particles to encode wide-scope negation on clauses and narrow-scope negation on constituents, and utilizes varied and interesting means to express emphatic negation. It also found that while Syrian Arabic belongs in most respects to the broader Levantine family of Arabic dialects, its negation strategy is more closely aligned with the Arabic dialects of Iraq and the Arab Gulf states. Table of Contents: Declaration