Classical and Modern Standard Arabic Marijn Van Putten University of Leiden
Recommended publications
Automatic Identification of Arabic Language Varieties and Dialects in Social Media
Fatiha Sadat, University of Quebec in Montreal, 201 President Kennedy, Montreal, QC, Canada; Farnazeh Kazemi and Atefeh Farzindar, NLP Technologies Inc., 52 Le Royer Street W., Montreal, QC, Canada.
Abstract: Modern Standard Arabic (MSA) is the formal language in most Arabic countries. Arabic Dialects (AD) or daily language differs from MSA especially in social media communication. However, most Arabic social media texts have mixed forms and many variations especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for AD classification using probabilistic models across social media datasets. We present a set of experiments using the character n-gram Markov language model and Naive Bayes classifiers with detailed examination of what models perform best under different conditions in social media context. Experimental results show that Naive Bayes classifier based on character bi-gram model can identify the 18 different Arabic dialects with a considerable overall accuracy of 98%.
1 Introduction: Arabic is a morphologically rich and complex language, which presents significant challenges for natural language processing and its applications. It is the official language in 22 countries, spoken by more than 350 million people around the world. Moreover, the Arabic language exists in a state of diglossia where the standard form of the language, Modern Standard Arabic (MSA) and the regional dialects (AD) live side-by-side and are closely related (Elfardy and Diab, 2013).
(Northwest) Semitic Sg. *CVCC-, Pl. *Cvcac-Ū-: Broken Plural Or Regular Reflex?*
Bulletin of SOAS, 84, 1 (2021), 1–17. © The Author(s), 2021. Published by Cambridge University Press. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited. doi:10.1017/S0041977X2100001X
Benjamin D. Suchard, KU Leuven, Belgium and Leiden University, The Netherlands; Jorik (F.J.) Groen, Leiden University, The Netherlands.
Abstract: This paper provides a new explanation for the insertion of *a in plural forms of *CVCC- nouns also formed with an external plural suffix, e.g. *ʕabd- : *ʕabad-ū- ‘servant(s)’, in various Semitic languages. This *CVCaC-ū- pattern is usually considered to be a remnant of the Proto-Semitic broken plural system in Northwest Semitic, but we show that it goes back to Proto-Semitic in this form. Internal evidence from Semitic as well as comparative evidence from Afroasiatic points towards a pre-Proto-Semitic plural suffix *-w- underlying the external plural suffixes. This suffix created a consonant cluster in the plural of *CVCC- nouns, triggering epenthesis of *a. As the prime example of broken plural formation in Northwest Semitic thus seems to be purely suffixal in origin, we conclude by briefly considering the implications for the history of nominal pluralization in Semitic.
Keywords: Semitic, Historical linguistics, Northwest Semitic, Broken plural, Subclassification
Language of the Old Testament: Biblical Hebrew “The Holy Tongue”
Academic Journal of Interdisciplinary Studies, Vol 4 No 1, March 2015. MCSER Publishing, Rome-Italy. E-ISSN 2281-4612, ISSN 2281-3993. Doi:10.5901/ajis.2015.v4n1p129
Associate Professor Luke Emeka Ugwueye, Department of Religion & Human Relations, Faculty of Arts, Nnamdi Azikiwe University, PMB 5025, Awka, Anambra State, Nigeria. Phone: 08067674763
Abstract: Some kind of familiarity with the structure and thought pattern of biblical Hebrew language enhances translation and improved ways of working with the language needed by students of Old Testament. That what the authors of the Scripture say also has meaning for us today is not in doubt, but they did not express themselves primarily for us or in our language, and so it requires training on our part to understand them in their own language. The features of biblical Hebrew as combined in the language’s use of imagery and picturesque description of things are of huge assistance in this training exercise for a better operational knowledge of the language and meaning of Hebrew Scripture.
Keywords: Language, Old Testament, Biblical Hebrew, Holy Tongue
1. Introduction: Hebrew language is the language of the culture, religion and civilization of the Jewish people since ancient times. It belongs to the northwest ancient Semitic family of languages. The word Semitic, according to Kitchen (1992), is formed from the name Shem, Noah’s eldest son (Genesis 5:32). It is an adjective derived from ‘Shem’ meaning a member of any of the group of people speaking Akkadian, Phoenician, Punic, Aramaic, and especially Hebrew, Modern Hebrew and Arabic language.
Saudi Dialects: Are They Endangered?
Academic Research Publishing Group. English Literature and Language Review, ISSN(e): 2412-1703, ISSN(p): 2413-8827, Vol. 2, No. 12, pp: 131-141, 2016. URL: http://arpgweb.com/?ic=journal&journal=9&info=aims
Salih Alzahrani, Taif University, Saudi Arabia.
Abstract: Krauss, among others, claims that languages will face death in the coming centuries (Krauss, 1992). Austin (2010a) lists 7,000 languages as existing and spoken in the world today. Krauss estimates that this figure could come down to 600. That is, most of the world's languages are endangered. An endangered language, then, is a language that loses its speakers within a few generations. According to Dorian (1981), there is what is called a “tip” in language endangerment: a language's decline can start slowly but then suddenly accelerate towards extinction. Thus, languages must be protected at a much earlier stage. Arabic dialects such as Zahrani Spoken Arabic (ZSA) and Faifi Spoken Arabic (henceforth, FSA), which are spoken in the southern region of Saudi Arabia, have not been studied yet. Few people speak these dialects, among many other dialects in the same region. However, the problem is that most of these dialects' native speakers are moving to other regions in Saudi Arabia where they use other dialects. Therefore, are these dialects endangered? What other factors may cause their endangerment? Have they been documented before? What shall we do? This paper discusses three main points regarding this issue: language and endangerment, language documentation and description, and the Arabic language and its family, giving a brief history of Saudi dialects and comparing their situation with that of the existing dialects as a whole.
A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew
Morphological analysis, lemmatization, vocalization, disambiguation and text-to-speech
Dror Kamir, Naama Soreq, Yoni Neeman. Melingo Ltd., 16 Totseret Haaretz St., Tel-Aviv, Israel.
Abstract: This paper presents a comprehensive NLP system by Melingo that has been recently developed for Arabic, based on Morfix™ – an operational formerly developed highly successful comprehensive Hebrew NLP system. The system discussed includes modules for morphological analysis, context sensitive lemmatization, vocalization, text-to-phoneme conversion, and syntactic-analysis-based prosody (intonation) model. It is employed in applications such as full text search, information retrieval, text categorization, textual data mining, online contextual dictionaries, filtering, and text-to-speech applications in the fields of telephony and accessibility and could serve as a handy accessory for non-fluent …
1 Introduction. 1.1 The common Semitic basis from an NLP standpoint: Modern Standard Arabic (MSA) and Modern Hebrew (MH) share the basic Semitic traits: rich morphology, based on consonantal roots (Jiðr / Šoreš), which depends on vowel changes and in some cases consonantal insertions and deletions to create inflections and derivations. For example, in MSA: the consonantal root /ktb/ combined with the vocalic pattern CaCaCa derives the verb kataba ‘to write’. This derivation is further inflected into forms that indicate semantic features, such as number, gender, tense etc.: katab-tu ‘I wrote’, katab-ta ‘you (sing. masc.) wrote’, katab-ti ‘you (sing. fem.) wrote’, ?a-ktubu ‘I write/will write’, etc.
Delateralisation in Arabic and Mehri
Dialectologia 23 (2019), 1-23. ISSN: 2013-2247. Received 22 May 2017. Accepted 3 October 2017.
Munira Al-Azraqi, Imam Abdulrahman Bin Faisal University, Saudi Arabia.
Abstract: The Arabic lateral ḍ was considered extinct. However, it was recently found in use in some Arabic dialects, such as Rijāl Almaʕ in southwest Saudi Arabia, and in some varieties of the Mehri language. Nevertheless, delateralisation is apparent. To assess the extent of delateralisation, this study investigates the current usage of the lateral *ḍ among Rijāl Almaʕ speakers in Abha city, southwest Saudi Arabia, and Mehri speakers in Dammam, east Saudi Arabia. 74 speakers, 38 speakers of Rijāl Almaʕ in Abha city and 36 Mehri speakers in Dammam city, participated in this study. Their ages range from 15 to 75 years. The data comprise almost 56 hours of audio recordings captured during informal interviews. The findings show that delateralisation is occurring among some younger and educated speakers of Mehri and Rijāl Almaʕ.
Keywords: delateralisation, Arabic, Mehri, Rijāl Almaʕ
Arabic and Contact-Induced Change Christopher Lucas, Stefano Manfredi
Christopher Lucas, Stefano Manfredi (eds.)
To cite this version: Christopher Lucas, Stefano Manfredi. Arabic and Contact-Induced Change. 2020. halshs-03094950. HAL Id: halshs-03094950, https://halshs.archives-ouvertes.fr/halshs-03094950. Submitted on 15 Jan 2021.
Series: Contact and Multilingualism 1, Language Science Press. Series editors: Isabelle Léglise (CNRS SeDyL), Stefano Manfredi (CNRS SeDyL).
Lucas, Christopher & Stefano Manfredi (eds.). 2020. Arabic and contact-induced change (Contact and Multilingualism 1). Berlin: Language Science Press. This title can be downloaded at: http://langsci-press.org/catalog/book/235 © 2020, the authors. Published under the Creative Commons Attribution
Language, Ideology and Politics in Croatia
Mate Kapović, University of Zagreb, Department of Linguistics, Faculty of Humanities and Social Sciences, Ivana Lučića 3, HR – 10 000 Zagreb. SCN IV/2 [2011], 45–56
Abstract: Based in part on his recent book Čiji je jezik? (Who does Language Belong to?), the author reviews the intricate relation of language, ideology, and politics in Croatia in the last 20 years, including new examples and analyses. The article emphasizes problems related to Croatia specifically, which might be of interest to foreign Slavists and linguists, while the monograph (in Croatian) deals with the problems of language, society, politics, ideology, and sociolinguistics in general.
Key words: language politics, language planning, purism, Croatian language, language in former Yugoslavia
Introduction: The aim of this article is to provide a general and brief overview of some problems concerning the intricate relation of language, ideology, and politics in Croatia in the last 20 years. The bulk of the article consists of some of the …
Footnote 1: I would like to thank Marko Kapović for reading the first draft of the article carefully.
Identifying Semitic Roots: Machine Learning with Linguistic Constraints
Ezra Daya, University of Haifa; Dan Roth, University of Illinois; Shuly Wintner, University of Haifa.
Abstract: Words in Semitic languages are formed by combining two morphemes: a root and a pattern. The root consists of consonants only, by default three, and the pattern is a combination of vowels and consonants, with non-consecutive “slots” into which the root consonants are inserted. Identifying the root of a given word is an important task, considered to be an essential part of the morphological analysis of Semitic languages, and information on roots is important for linguistics research as well as for practical applications. We present a machine learning approach, augmented by limited linguistic knowledge, to the problem of identifying the roots of Semitic words. Although programs exist which can extract the root of words in Arabic and Hebrew, they are all dependent on labor-intensive construction of large-scale lexicons which are components of full-scale morphological analyzers. The advantage of our method is an automation of this process, avoiding the bottleneck of having to laboriously list the root and pattern of each lexeme in the language. To the best of our knowledge, this is the first application of machine learning to this problem, and one of the few attempts to directly address non-concatenative morphology using machine learning. More generally, our results shed light on the problem of combining classifiers under (linguistically motivated) constraints.
1. Introduction: The standard account of word-formation processes in Semitic languages describes words as combinations of two morphemes: a root and a pattern. The root consists of consonants only, by default three (although longer roots are known), called radicals.
Cognate Words in Mehri and Hadhrami Arabic
Hassan Obeid Alfadly, Khaled Awadh Bin Mukhashin. Received: 18/3/2019. Accepted: 2/5/2019.
Abstract: The lexicon is one important source of information to establish genealogical relations between languages. This paper is an attempt to describe the lexical similarities between Mehri and Hadhrami Arabic and to show the extent of relatedness between them, a very little explored and described topic. The researchers are native speakers of Hadhrami Arabic and they paid many field visits to the area where Mehri is spoken. They used the Swadesh list to elicit their data from more than 20 Mehri informants and from Johnston's (1987) dictionary "The Mehri Lexicon and English-Mehri Word-list". The researchers employed lexicostatistical techniques to analyse their data and found that Mehri and Hadhrami Arabic have many cognate words. This finding confirms Watson's (2011) claim that Arabic may not have replaced all the ancient languages in the South-Western Arabian Peninsula and that dialects of Arabic in this area, including Hadhrami Arabic, are tinged, to a greater or lesser degree, with substrate features of the Pre-Islamic Ancient and Modern South Arabian languages.
Introduction: Historically speaking, the Semitic language family, from which both Arabic and Mehri descend, belongs to a larger family of languages called Afro-Asiatic or Hamito-Semitic that includes Semitic, Egyptian, Cushitic, Omotic, Berber and Chadic (Rubin, 2010). … three branches including Central Semitic, Ethiopian and Modern South Arabian languages (henceforth MSAL). Though Arabic and Mehri belong to the West Semitic, Arabic descends from the Central Semitic and Mehri from (MSAL) which consists of two branches; the …
Persian, Farsi, Dari, Tajiki: Language Names and Language Policies
University of Pennsylvania ScholarlyCommons, Department of Anthropology Papers, 2012.
Brian Spooner, University of Pennsylvania.
Recommended Citation: Spooner, B. (2012). Persian, Farsi, Dari, Tajiki: Language Names and Language Policies. In H. Schiffman (Ed.), Language Policy and Language Conflict in Afghanistan and Its Neighbors: The Changing Politics of Language Choice (pp. 89-117). Leiden, Boston: Brill. This paper is posted at ScholarlyCommons: https://repository.upenn.edu/anthro_papers/91
Abstract: Persian is an important language today in a number of countries of west, south and central Asia. But its status in each is different. In Iran its unique status as the only official or national language continues to be jealously guarded, even though half (probably more) of the population use a different language (mainly Azari/Azeri Turkish) at home, and on the streets, though not in formal public situations, and not in writing. Attempts to broach this exclusive status of Persian in Iran have increased in recent decades, but are still relatively minor. Persian (called tajiki) is also the official language of Tajikistan, but here it shares that status informally with Russian, while in the west of the country Uzbek is also widely used and in the more isolated eastern part of the country other local Iranian languages are now dominant.
The Biradical Origin of Semitic Roots
Copyright by Bernice Varjick Hecker, 2007.
The Dissertation Committee for Bernice Varjick Hecker certifies that this is the approved version of the following dissertation: The Biradical Origin of Semitic Roots. Committee: Robert D. King (Supervisor), Robert T. Harms, Richard P. Meier, Esther L. Raizen, Peter F. Abboud.
By Bernice Varjick Hecker, M.A., M.A. Dissertation presented to the Faculty of the Graduate School of The University of Texas at Austin in partial fulfillment of the requirements for the degree of Doctor of Philosophy. The University of Texas at Austin, May 2007.
Dedication: To Mark Southern, who awakened and sustained my interest in the Ancient Near East.
Acknowledgments: I would first like to thank Prof. Harms, who supervised my earlier paper, for teaching me that there is no way to conclusively prove a theory about an early stage of a prehistoric language but that it was possible to demonstrate its likelihood. His comments at an early stage of this work were invaluable in showing me how to go about doing so. I would also like to thank Prof. King, my dissertation supervisor, who was an unfailing font of support and who gave me excellent advice and direction. My husband, Ran Moran, was the sine qua non of this project. There is no way that I could have completed it without his help, both in accommodating to my schedule and in expending all the resources that I brought to bear on writing this dissertation.