Guida Dello Studente
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Automatic Identification of Arabic Language Varieties and Dialects in Social Media
Automatic Identification of Arabic Language Varieties and Dialects in Social Media Fatiha Sadat Farnazeh Kazemi Atefeh Farzindar University of Quebec in NLP Technologies Inc. NLP Technologies Inc. Montreal, 201 President Ken- 52 Le Royer Street W., 52 Le Royer Street W., nedy, Montreal, QC, Canada Montreal, QC, Canada Montreal, QC, Canada [email protected] [email protected] [email protected] Abstract Modern Standard Arabic (MSA) is the formal language in most Arabic countries. Arabic Dia- lects (AD) or daily language differs from MSA especially in social media communication. However, most Arabic social media texts have mixed forms and many variations especially be- tween MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for AD classification using probabilistic models across social media datasets. We present a set of experiments using the character n-gram Markov language model and Naive Bayes classifiers with detailed examination of what models perform best under different condi- tions in social media context. Experimental results show that Naive Bayes classifier based on character bi-gram model can identify the 18 different Arabic dialects with a considerable over- all accuracy of 98%. 1 Introduction Arabic is a morphologically rich and complex language, which presents significant challenges for nat- ural language processing and its applications. It is the official language in 22 countries spoken by more than 350 million people around the world1. Moreover, the Arabic language exists in a state of diglossia where the standard form of the language, Modern Standard Arabic (MSA) and the regional dialects (AD) live side-by-side and are closely related (Elfardy and Diab, 2013). -
Saudi Dialects: Are They Endangered?
Academic Research Publishing Group English Literature and Language Review ISSN(e): 2412-1703, ISSN(p): 2413-8827 Vol. 2, No. 12, pp: 131-141, 2016 URL: http://arpgweb.com/?ic=journal&journal=9&info=aims Saudi Dialects: Are They Endangered? Salih Alzahrani Taif University, Saudi Arabia Abstract: Krauss, among others, claims that languages will face death in the coming centuries (Krauss, 1992). Austin (2010a) lists 7,000 languages as existing and spoken in the world today. Krauss estimates that this figure could come down to 600. That is, most the world's languages are endangered. Therefore, an endangered language is a language that loses her speakers within a few generations. According to Dorian (1981), there is what is called ―tip‖ in language endangerment. He argues that a language's decline can start slowly but suddenly goes through a rapid decline towards the extinction. Thus, languages must be protected at much earlier stage. Arabic dialects such as Zahrani Spoken Arabic (ZSA), and Faifi Spoken Arabic (henceforth, FSA), which are spoken in the southern region of Saudi Arabia, have not been studied, yet. Few people speak these dialects, among many other dialects in the same region. However, the problem is that most these dialects' native speakers are moving to other regions in Saudi Arabia where they use other different dialects. Therefore, are these dialects endangered? What other factors may cause its endangerment? Have they been documented before? What shall we do? This paper discusses three main different points regarding this issue: language and endangerment, languages documentation and description and Arabic language and its family, giving a brief history of Saudi dialects comparing their situation with the whole existing dialects. -
A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew
A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew Morphological analysis, lemmatization, vocalization, disambiguation and text-to-speech Dror Kamir Naama Soreq Yoni Neeman Melingo Ltd. Melingo Ltd. Melingo Ltd. 16 Totseret Haaretz st. 16 Totseret Haaretz st. 16 Totseret Haaretz st. Tel-Aviv, Israel Tel-Aviv, Israel Tel-Aviv, Israel [email protected] [email protected] [email protected] Abstract 1 Introduction This paper presents a comprehensive NLP sys- 1.1 The common Semitic basis from an NLP tem by Melingo that has been recently developed standpoint for Arabic, based on MorfixTM – an operational formerly developed highly successful comprehen- Modern Standard Arabic (MSA) and Modern Hebrew (MH) share the basic Semitic traits: rich sive Hebrew NLP system. morphology, based on consonantal roots (Jiðr / The system discussed includes modules for Šoreš)1, which depends on vowel changes and in morphological analysis, context sensitive lemmati- some cases consonantal insertions and deletions to zation, vocalization, text-to-phoneme conversion, create inflections and derivations.2 and syntactic-analysis-based prosody (intonation) For example, in MSA: the consonantal root model. It is employed in applications such as full /ktb/ combined with the vocalic pattern CaCaCa text search, information retrieval, text categoriza- derives the verb kataba ‘to write’. This derivation tion, textual data mining, online contextual dic- is further inflected into forms that indicate seman- tionaries, filtering, and text-to-speech applications tic features, such as number, gender, tense etc.: katab-tu ‘I wrote’, katab-ta ‘you (sing. masc.) in the fields of telephony and accessibility and wrote’, katab-ti ‘you (sing. fem.) wrote, ?a-ktubu could serve as a handy accessory for non-fluent ‘I write/will write’, etc. -
Classical and Modern Standard Arabic Marijn Van Putten University of Leiden
Chapter 3 Classical and Modern Standard Arabic Marijn van Putten University of Leiden The highly archaic Classical Arabic language and its modern iteration Modern Standard Arabic must to a large extent be seen as highly artificial archaizing reg- isters that are the High variety of a diglossic situation. The contact phenomena found in Classical Arabic and Modern Standard Arabic are therefore often the re- sult of imposition. Cases of borrowing are significantly rarer, and mainly found in the lexical sphere of the language. 1 Current state and historical development Classical Arabic (CA) is the highly archaic variety of Arabic that, after its cod- ification by the Arab Grammarians around the beginning of the ninth century, becomes the most dominant written register of Arabic. While forms of Middle Arabic, a style somewhat intermediate between CA and spoken dialects, gain some traction in the Middle Ages, CA remains the most important written regis- ter for official, religious and scientific purposes. From the moment of CA’s rise to dominance as a written language, the whole of the Arabic-speaking world can be thought of as having transitioned into a state of diglossia (Ferguson 1959; 1996), where CA takes up the High register and the spoken dialects the Low register.1 Representation in writing of these spoken dia- lects is (almost) completely absent in the written record for much of the Middle Ages. Eventually, CA came to be largely replaced for administrative purposes by Ottoman Turkish, and at the beginning of the nineteenth century, it was function- ally limited to religious domains (Glaß 2011: 836). -
Arabic and Contact-Induced Change Christopher Lucas, Stefano Manfredi
Arabic and Contact-Induced Change Christopher Lucas, Stefano Manfredi To cite this version: Christopher Lucas, Stefano Manfredi. Arabic and Contact-Induced Change. 2020. halshs-03094950 HAL Id: halshs-03094950 https://halshs.archives-ouvertes.fr/halshs-03094950 Submitted on 15 Jan 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Arabic and contact-induced change Edited by Christopher Lucas Stefano Manfredi language Contact and Multilingualism 1 science press Contact and Multilingualism Editors: Isabelle Léglise (CNRS SeDyL), Stefano Manfredi (CNRS SeDyL) In this series: 1. Lucas, Christopher & Stefano Manfredi (eds.). Arabic and contact-induced change. Arabic and contact-induced change Edited by Christopher Lucas Stefano Manfredi language science press Lucas, Christopher & Stefano Manfredi (eds.). 2020. Arabic and contact-induced change (Contact and Multilingualism 1). Berlin: Language Science Press. This title can be downloaded at: http://langsci-press.org/catalog/book/235 © 2020, the authors Published under the Creative Commons Attribution -
The Arabic Language: a Latin of Modernity? Tomasz Kamusella University of St Andrews
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by St Andrews Research Repository Journal of Nationalism, Memory & Language Politics Volume 11 Issue 2 DOI 10.1515/jnmlp-2017-0006 The Arabic Language: A Latin of Modernity? Tomasz Kamusella University of St Andrews Abstract Standard Arabic is directly derived from the language of the Quran. The Ara- bic language of the holy book of Islam is seen as the prescriptive benchmark of correctness for the use and standardization of Arabic. As such, this standard language is removed from the vernaculars over a millennium years, which Arabic-speakers employ nowadays in everyday life. Furthermore, standard Arabic is used for written purposes but very rarely spoken, which implies that there are no native speakers of this language. As a result, no speech com- munity of standard Arabic exists. Depending on the region or state, Arabs (understood here as Arabic speakers) belong to over 20 different vernacular speech communities centered around Arabic dialects. This feature is unique among the so-called “large languages” of the modern world. However, from a historical perspective, it can be likened to the functioning of Latin as the sole (written) language in Western Europe until the Reformation and in Central Europe until the mid-19th century. After the seventh to ninth century, there was no Latin-speaking community, while in day-to-day life, people who em- ployed Latin for written use spoke vernaculars. Afterward these vernaculars replaced Latin in written use also, so that now each recognized European lan- guage corresponds to a speech community. -
A Reply to the New Arabia Theory by Ahmad Al-Jallad
City University of New York (CUNY) CUNY Academic Works Publications and Research CUNY Central Office 2020 The Case for Early Arabia and Arabic Language: A Reply to the New Arabia Theory by Ahmad al-Jallad Saad D. Abulhab CUNY Central Office How does access to this work benefit ou?y Let us know! More information about this work at: https://academicworks.cuny.edu/oaa_pubs/16 Discover additional works at: https://academicworks.cuny.edu This work is made publicly available by the City University of New York (CUNY). Contact: [email protected] The Case for Early Arabia and Arabic Language: A Reply to the New Arabia Theory by Ahmad al-Jallad Saad D. Abulhab (City University of New York) April, 2020 The key aspect of my readings of the texts of ancient Near East languages stems from my evidence-backed conclusion that these languages should be classified and read as early Arabic. I will explore here this central point by replying to a new theory with an opposite understanding of early Arabia and the Arabic language, put forth by Ahmad al-Jallad, a scholar of ancient Near East languages and scripts. In a recent debate with al-Jallad, a self-described Semitic linguist, he proclaimed that exchanging the term 'Semitic' for ‘early Arabic’ or ‘early fuṣḥā’ is “simply a matter of nomenclature.”1 While his interpretation of the term Semitic sounds far more moderate than that of most Western philologists and epigraphists, it is not only fundamentally flawed and misleading, but also counterproductive. Most scholars, unfortunately, continue to misinform their students and the scholarly community by alluding to a so-called Semitic mother language, as a scientific fact. -
Modern Standard Arabic 3
® Modern Standard Arabic 3 “I have completed the entire Pimsleur Spanish series. I have always wanted to learn, but failed on numerous occasions. Shockingly, this method worked beautifully. ” R. Rydzewsk (Burlington, NC) “The thing is, Pimsleur is PHENOMENALLY EFFICIENT at advancing your oral skills wherever you are, and you don’t have to make an appointment or be at your computer or deal with other students. ” Ellen Jovin (NY, NY) “I looked at a number of different online and self-taught courses before settling on the Pimsleur courses. I could not have made a better choice. ” M. Jaffe (Mesa, AZ) Modern Standard Arabic 3 Travelers should always check with their nation's State Department for current advisories on local conditions before traveling abroad. Booklet Design: Maia Kennedy © and ‰ Recorded Program 2015 Simon & Schuster, Inc. © Reading Booklet 2015 Simon & Schuster, Inc. Pimsleur® is an imprint of Simon & Schuster Audio, a division of Simon & Schuster, Inc. Mfg. in USA. All rights reserved. ii Modern Standard Arabic 3 ACKNOWLEDGMENTS VOICES English-Speaking Instructor . Ray Brown Arabic-Speaking Instructor. Husam Karzoun Female Arabic Speaker . Maya Asmar Male Arabic Speaker . .Bayhas Kana COURSE WRITERS Dr. Mahdi Alosh ♦ Shannon Rossi EDITORS Beverly D. Heinle ♦ Mary E. Green REVIEWERS Ibtisam Alama ♦ Duha Shamaileh Bayhas Kana PRODUCER Sarah H. McInnis RECORDING ENGINEER Peter S. Turpin Simon & Schuster Studios, Concord, MA iii Modern Standard Arabic 3 Table of Contents Introduction. 1 The Arabic Alphabet . 3 Reading Lessons . 7 Arabic Alphabet Chart . 8 Diacritical Marks . 10 Lesson One . 11 Lesson Two . 14 Lesson Three . 17 Lesson Four . 20 Lesson Five . -
The Evolution of the Arabic Language Through Online Writing: the Explosion of 2011
- 1 - The evolution of the Arabic language through online writing: the explosion of 2011 Saussan Khalil Abstract The role of the internet in the popular protests of 2011 cannot be overestimated. Most importantly, the internet allowed online activists to escape censorship and communicate to thousands if not millions of people in real time. What is interesting about this form of communication is the language of choice particularly in Egypt – for centuries Classical (CA) or Modern Standard (MSA) Arabic have been the accepted forms of writing; however, the form of language being used online leans more towards colloquial Arabic, which has up until now only been accepted as a spoken form. The relationship between the written and spoken forms of Arabic in Egypt has been detailed by Haeri (2003), but the use of spoken Arabic in online writing is yet to be explored. This paper looks at the relationship between the form of the language used in online writing and the messages being conveyed. The suggestion is that away from the censorship of state media and the press, writers are free to use dialectal forms of the language for a freer, more direct approach to their readers, which has been more effective in communicating their message than the use of CA or MSA would have been. - 2 - Introduction Ferguson (1959) first described Arabic as a ‘diglossic’ language, meaning it has distinct written and spoken forms. This premise has been generally accepted with Classical Arabic (CA) and later Modern Standard Arabic (MSA) constituting the written form, and the numerous dialects of Arabic as its spoken forms. -
Arabic Linguistics a Historiographic Overview
ROCZNIK ORIENTALISTYCZNY, T. LXV, Z. 2, 2012, (s. 21–47) EDWARD LIPIŃSKI Arabic Linguistics A Historiographic Overview Abstract The study of Arabic language seems to have started under the driving need to establish a correct reading and interpretation of the Qur’ān. Notwithstanding the opinions of some writers about its origins one should stress that the script and spelling of the Holy Writ derives directly from the Nabataean cursive. Aramaic Nabataean script was used to write Old Arabian since the first century A.D., also at Taymā’ and Madā’in Ṣaliḥ, in the northern part of the Arabian Peninsula. Variant readings and divergent interpretations of Qur’ānic sentences, based on ancient Arabic dialects, are not expected to disturb the Arabic grammatical tradition, which was possibly influenced to some extent by Indian theories and Aristotelian concepts. It served as foundation to modern European studies and was then expanded to Middle Arabic, written mainly by Jews and Christians, and to the numerous modern dialects. From the mid-19th century onwards, attention was given also to pre-classical North-Arabian, attested by Ṣafaitic, Ṯamūdic, Liḥyanite, and Ḥasaean inscriptions, without forgetting the North-Arabian background and the loanwords of Nabataean Aramaic, as well as the dialectal information from the 7th–8th centuries, preserved in Arabic sources. Keywords: Arabic language, Linguistics, Grammar, Qur’ān, North-Arabian The study of Semitic grammar, either Arabic, Syriac or Hebrew, started under the driving need to establish a correct reading and a proper interpretation of the Holy Scriptures, the Qur’ān and the Bible, both in their formal and semantic dimensions. -
Arabic and Aramaic in Iraq: Language and Syriac Christian Commitment to the Arab Nationalist Project (1920-1950) Issue Date: 2020-01-08
Cover Page The handle http://hdl.handle.net/1887/82480 holds various files of this Leiden University dissertation. Author: Baarda, T.C. Title: Arabic and Aramaic in Iraq: Language and Syriac Christian Commitment to the Arab Nationalist Project (1920-1950) Issue Date: 2020-01-08 Introduction There is nothing in the custom of patriotism named Mus- lim, Christian or Israelite, but there is something called Iraq. —Faisal i, King of Iraq (1920–1933) These words were reportedly uttered by King Faisal in 1921 during a visit to leaders of the Jewish community of Baghdad.1 The quota- tion has become a famous symbol of Faisal’s ideals for the state of Iraq, which had been established under a British mandate a year earlier. In the new country of Iraq, all citizens regardless of religion were sup- posed to be equal under the umbrella of Iraq as an Arab state, which Faisal embodied because of his major role during the Arab revolt in the Hijaz. This included the small but significant two to four percent of Christians, the great majority of whom belonged to one of the four churches of the Syriac tradition. After Faisal’s death, Chaldean Chris- tians proudly repeated the words together with a number of other quotations in an obituary in the Chaldean Catholic journal al-Najm (The Star).2 The Chaldean Catholic Church, which had its patriar- chate in Mosul, was the largest of the four Syriac churches in Iraq and staunchly supported the fact that Iraq was an Arab state. Al-Najm was the Chaldean patriarchate’s official mouthpiece in the years 1928–1938 and throughout its years of publication we find words of support for the new state and its king, as well as expressions of belonging to the Arab nation. -
The Phoenician Alphabet Reassessed in Light of Its Descendant Scripts and the Language of the Modern Lebanese
1 The Phoenician Alphabet Reassessed in Light of its Descendant Scripts and the Language of the Modern Lebanese Joseph Elias [email protected] 27th of May, 2010 The Phoenician Alphabet Reassessed in Light of its Descendant Scripts and the Language of the Modern Lebanese 2 The Phoenician Alphabet Reassessed in Light of its Descendant Scripts and the Language of the Modern Lebanese DISCLAIMER: THE MOTIVATION BEHIND THIS WORK IS PURELY SCIENTIFIC. HENCE, ALL ALLEGATIONS IMPOSING THIS WORK OF PROMOTING A: POLITICAL, RELIGIOUS OR ETHNIC AGENDA WILL BE REPUDIATED. Abstract The contemporary beliefs regarding the Phoenician alphabet are reviewed and challenged, in light of the characteristics found in the ancient alphabets of Phoenicia's neighbours and the language of the modern Lebanese. 3 The Phoenician Alphabet Reassessed in Light of its Descendant Scripts and the Language of the Modern Lebanese Introduction The Phoenician alphabet, as it is understood today, is a 22 letter abjad with a one-to-one letter to phoneme relationship [see Table 1].1 Credited for being the world's first alphabet and mother of all modern alphabets, it is believed to have been inspired by the older hieroglyphics system of nearby Egypt and/or the syllabaries of Cyprus, Crete, and/or the Byblos syllabary - to which the Phoenician alphabet appears to be a graphical subset of.2 3 Table 1: The contemporary decipherment of the Phoenician alphabet. Letter Name Glyph Phonetic Value (in IPA) a aleph [ʔ] b beth [b] g gamil [g] d daleth [d] h he [h] w waw [w] z zayin [z] H heth [ħ] T teth [tʕ] y yodh [j] k kaph [k] l lamedh [l] m mem [m] n nun [n] Z samekh [s] o ayin [ʕ] p pe [p] S tsade [sʕ] q qoph [q] r resh [r] s shin [ʃ] t tau [t] 1 F.