Dialectal Arabic Processing Using Deep Learning
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
“The Overuse of Italian Loanwords in the Daily Speech of Tripoli University Students: the Impact of Gender and Residential Place”
العدد العشرون تاريخ اﻹصدار: 2 – ُحزيران – 2020 م ISSN: 2663-5798 www.ajsp.net “The Overuse of Italian Loanwords in the Daily Speech of Tripoli University Students: the Impact of Gender and Residential Place” Prepared by Jalal Al Dain Y. Abidah, English Dept., Faculty of Education - Janzour, Tripoli University 19 Arab Journal for Scientific Publishing (AJSP) ISSN: 2663-5798 العدد العشرون تاريخ اﻹصدار: 2 – ُحزيران – 2020 م ISSN: 2663-5798 www.ajsp.net Abstract The study tries to explore the impact of social factors of gender and residential place on the use of Italian loanwords by Libyan university students (using Tripoli University as an example) and how the mentioned social factors affect their daily speech. To answer the questions of the study, a sample of 60 Tripoli university students are selected randomly in the campus (A) of University of Tripoli. They were divided into two groups according to their Gender and residential place. In order to collect data, a questionnaire was developed for this purpose. It generated data regarding the use of 150 Italian loanwords by both groups. The mean of using Italian loanwords in both groups was analyzed and computed using SPSS. However, the study reveals the impact of residential area where Italian loanwords were more incorporated by rural students than urbanites. The results of the study revealed that there was a significant statistical difference at (α≤0.05) among the means of both groups regarding the use of Italian loanwords in daily speech due to residential area. In contrasts, gender emerges as insignificant. Keyword Italian loanwords, Colloquial Arabic, Gender, Libya. -
The Historical Development of the Maltese Plural Suffixes -Iet and -(I)Jiet Having Spent the Last Millennium in Close Contact Wi
The historical development of the Maltese plural suffixes -iet and -(i)jiet Having spent the last millennium in close contact with several Indo-European languages, Maltese, the modern descendant of Siculo-Arabic (Brincat 2011), possesses various pluralization strategies. In this paper I explore the historical development of two such strategies of Semitic- origin: the suffixes -iet and -(i)jiet (e.g. papa ‘pope’, pl. papiet; omm ‘mother’, pl. ommijiet). Specialists agree that both suffixes originated from the Arabic plural suffix -āt (Borg 1976; Mifsud 1995); however, no research has explained the development of -(i)jiet, nor connected its development to that of -iet. I argue that the development of -(i)jiet was driven by the influx of i- final words which resulted from contact with Italian: Maltese speakers affixed -iet to such words, triggering a glide-epenthesis that occurs elsewhere in Maltese (e.g. Mifsud 1996: 34) and in other varieties of Arabic (e.g. Erwin 1963; Cowell 1964; Owens 1984; Harrell 2004). With a large number of plurals now ending in ijiet, speakers reanalyzed this sequence as a unique plural suffix and began applying it to new non-i-final words as well. Since only Maltese experienced this influx of i-final words, it was only in Maltese that speakers reanalyzed this sequence as a separate suffix. Additionally, I argue against an explanation of the development of -(i)jiet that does not rely on epenthesis. With regard to the suffix -iet, I argue that two properties unique to it – the obligatory omission of stem-final vowels upon pluralization, and the near-universal tendency to pluralize only a-final words – emerged from a separate reanalysis of the pluralization of collective nouns that reflects a general weakening of the Semitic element in Maltese under Indo-European contact. -
Different Dialects of Arabic Language
e-ISSN : 2347 - 9671, p- ISSN : 2349 - 0187 EPRA International Journal of Economic and Business Review Vol - 3, Issue- 9, September 2015 Inno Space (SJIF) Impact Factor : 4.618(Morocco) ISI Impact Factor : 1.259 (Dubai, UAE) DIFFERENT DIALECTS OF ARABIC LANGUAGE ABSTRACT ifferent dialects of Arabic language have been an Dattraction of students of linguistics. Many studies have 1 Ali Akbar.P been done in this regard. Arabic language is one of the fastest growing languages in the world. It is the mother tongue of 420 million in people 1 Research scholar, across the world. And it is the official language of 23 countries spread Department of Arabic, over Asia and Africa. Arabic has gained the status of world languages Farook College, recognized by the UN. The economic significance of the region where Calicut, Kerala, Arabic is being spoken makes the language more acceptable in the India world political and economical arena. The geopolitical significance of the region and its language cannot be ignored by the economic super powers and political stakeholders. KEY WORDS: Arabic, Dialect, Moroccan, Egyptian, Gulf, Kabael, world economy, super powers INTRODUCTION DISCUSSION The importance of Arabic language has been Within the non-Gulf Arabic varieties, the largest multiplied with the emergence of globalization process in difference is between the non-Egyptian North African the nineties of the last century thank to the oil reservoirs dialects and the others. Moroccan Arabic in particular is in the region, because petrol plays an important role in nearly incomprehensible to Arabic speakers east of Algeria. propelling world economy and politics. -
Arabic and Contact-Induced Change Christopher Lucas, Stefano Manfredi
Arabic and Contact-Induced Change Christopher Lucas, Stefano Manfredi To cite this version: Christopher Lucas, Stefano Manfredi. Arabic and Contact-Induced Change. 2020. halshs-03094950 HAL Id: halshs-03094950 https://halshs.archives-ouvertes.fr/halshs-03094950 Submitted on 15 Jan 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Arabic and contact-induced change Edited by Christopher Lucas Stefano Manfredi language Contact and Multilingualism 1 science press Contact and Multilingualism Editors: Isabelle Léglise (CNRS SeDyL), Stefano Manfredi (CNRS SeDyL) In this series: 1. Lucas, Christopher & Stefano Manfredi (eds.). Arabic and contact-induced change. Arabic and contact-induced change Edited by Christopher Lucas Stefano Manfredi language science press Lucas, Christopher & Stefano Manfredi (eds.). 2020. Arabic and contact-induced change (Contact and Multilingualism 1). Berlin: Language Science Press. This title can be downloaded at: http://langsci-press.org/catalog/book/235 © 2020, the authors Published under the Creative Commons Attribution -
Expression of Attributive Possession in Tunisian Arabic: the Role of Language Contact
Expression of Attributive Possession in Tunisian Arabic: The Role of Language Contact Lotfi Sayahi 1 Introduction Possession is a universal concept that describes a particular link between two entities. It can denote a set of relationships that range from strict ownership to more loose connections that are open to interpretations. At the semantic level, a major distinction has been established between alienable possession and inalienable possession. Alienable possession refers to relationships that are temporary or which can be ended freely such as ownership of material objects. In the case of inalienable possession the possessed entity is intrinsi- cally linked to the possessor, as in the case of part-whole relations (e.g., body parts) or kinship relations. Languages, however, vary when it comes to what counts as alienable vs. inalienable possession (Heine 1997; Payne and Barshi 1999; Herslund and Baron 2001). At the structural level, two major types of pos- sessive structures exist. The first type, which is not the object of the current study, is predicative possession which uses verb forms to express the possessive link. The second type is attributive possession, also referred to as adnominal possession, which marks the possession in the noun phrase through different morphosyntactic structures. Despite being such a common category, only a few studies have described change in the expression of possession in cases of language contact. Weinreich (1963) in his seminal work mentions the case of Estonian, Amharic, and Modern Hebrew which developed or increased the use of analytic possession constructions as a result of contact with other languages (Weinreich 1963: 41–42). -
Berber Etymologies in Maltese1
Berber E tymologies in Maltese 1 Lameen Souag LACITO (CNRS – Paris Sorbonne – INALCO) France ﻣﻠﺧص : ﻻ ﯾزال ﻣدى ﺗﺄﺛﯾر اﻷﻣﺎزﯾﻐﯾﺔ ﻓﻲ اﻟﻣﺎﻟطﯾﺔ ﻏﺎﻣﺿﺎ، واﻟﻛﺛﯾر ﻣن اﻻﻗﺗراﺣﺎت اﻟﻣﻧﺷورة ﻓﻲ ھذا اﻟﻣﺟﺎل ﻟﯾﺳت ﻣؤﻛدة . ھذه اﻟﻣﻘﺎﻟﺔ ﺗﻘﯾم ﻛل ﻛﻠﻣﺔ ﻣﺎﻟطﯾﺔ اﻗ ُﺗرح ﻟﮭﺎ أﺻل أﻣﺎزﯾﻐﻲ ﻣن ﻗﺑل، ﻓﺗﺳﺗﺑﻌد ﺧﻣﺳﯾن ﻛﻠﻣﺔ وﺗﻘﺑل ﻋﺷرﯾن . ﻛﻣﺎ أﻧﮭﺎ ﺗﻘﺗرح ﻟﻠﻣرة اﻷوﻟﻰ أﺻوﻻ أﻣﺎزﯾﻐﯾﺔ ﻟﺳﺗﺔ ﻛﻠﻣﺎت ﻣﺎﻟطﯾﺔ ﻏﯾر ھذه اﻟﺳﺑﻌﯾن، وھﻲ " أﻛﻠﺔ ﻟزﺟﺔ " ، " ﺻﻐﯾر " ، " طﯾر اﻟوﻗواق " ، " ﻧﺑﺎت اﻟدﯾس " ، " أﻧﺛﻰ اﻟﺣﺑ ﺎر " ، واﻟﺗﺻﻐﯾر ﺑﻌﻼﻣﺔ اﻟﺗﺄﻧﯾث . ﺑﻧﺎء ﻋﻠﻰ ھذه اﻟﻧﺗﺎﺋﺞ، ﺗﻔﺣص اﻟﻣﻘﺎﻟﺔ اﻟﺗوزﯾﻊ اﻟدﻻﻟﻲ ﻟدى اﻟﻣﻔردات اﻟدﺧﯾﻠﺔ ﻣن اﻷﻣﺎزﯾﻐﯾﺔ ﻟﺗﻠﻘﻲ ﻧظرة ﻋﻠﻰ ﻧوﻋﯾﺔ اﻻﺣﺗﻛﺎك ﺑﯾن اﻟﻌرﺑﯾﺔ واﻷﻣﺎزﯾﻐﯾﺔ ﻓﻲ إﻓرﯾﻘﯾﺔ ﻗﺑل وﺻول ﺑﻧﻲ ھﻼل إﻟﻰ اﻟﻣﻧطﻘﺔ Abstract : The extent of Berber lexical influence on Maltese remains unclear, and many of the published etymological proposals are problematic. This article evaluates the reliability of existing proposed Maltese etymologies involving Berber, excluding 50 but accepting 20, and proposes new Berbe r etymologies for another six Maltese words ( bażina “overcooked, sticky food”, ċkejken “small”, daqquqa “cuckoo”, dis “dis - grass, sparto”, tmilla “female cuttlefish used as bait”), as well as a diminutive formation strategy using - a . Based on the results, it examines the distribution of the loans found, in the hope of casting light on the context of Arabic - Berber contact in Ifrīqiyah before the arrival of the Banū Hilāl. Keywords: Maltese, Berber, Amazigh, etymology, loanword s 1 The author thanks Marijn van Putten, Karim Bensoukas, and two anonymous reviewers for their helpful comments on earlier drafts of this paper, and Abdessalem Saied for providing Nabuel data. © The International Journal of Arabic Linguistics (IJAL) Vol. -
The Historical Development of the Maltese Plural Suffixes -Iet and -(I)Jiet JONATHAN GEARY, UNIVERSITY of ARIZONA ICHL23, SAN ANTONIO, TX AUGUST 4, 2017
7/20/2018 The Historical Development of the Maltese plural suffixes -iet and -(i)jiet JONATHAN GEARY, UNIVERSITY OF ARIZONA ICHL23, SAN ANTONIO, TX AUGUST 4, 2017 Introduction • This paper explores the development of two Maltese suffixes of Semitic origin: -iet (e.g. papa ~ papiet “pope(s)”) and -(i)jiet (e.g. omm ~ ommijiet “mother(s)”). • -(i)jiet developed from -iet due to extensive borrowing of i-final Italian words. ◦ Affixation triggers glide-epenthesis, producing ijiet (i.e. i+iet > ijiet). ◦ With many ijiet-final plurals, speakers reanalyze this as a suffix and extend it to new words. ◦ Glide-epenthesis occurs elsewhere in Maltese and in other varieties of Arabic, but without the contact-induced conditions for reanalysis similar reanalyses did not occur. • -iet acquired two unique properties – restriction to words having a-final singulars; obligatory omission of stem-final vowels upon suffixation – due to a subsequent reanalysis of the pluralization of collective nouns. ◦ This reanalysis reflects a more general weakening of the Semitic element in Maltese. 2 1 7/20/2018 History of Maltese • Maltese is a Semitic language which has been shaped by extensive contact with Indo-European languages over the last millennium (Brincat 2011). ◦ 870 – Arab invasion depopulates the island of Malta. ◦ 1048 – Individuals from Muslim Sicily migrate to Malta, bringing with them Siculo-Arabic (Agius 1996), which would ultimately develop into Maltese. ◦ 1090 – Norman Conquest brings Siculo-Arabic into contact with Sicilian and Latin in Malta, cuts the language off from Classical Arabic and other Arabic varieties. ◦ 1530 – Malta is given to the Knights of St. -
Tekerlemeli Ve Sözlü Oyunlarin KKTC'deki Okul Öncesi Çocukların Türkçe Dil Becerilerine Kazanımları
Ghezewi N., Akbarov A., Socıal Influences On The Choıce Of A Lınguıstıc Varıant In Lıbyan Arabıc Dıalects Motif Akademi Halkbilimi Dergisi / Cilt:9, Sayı:17 / 2016 (Ocak – Haziran), s.105-112 SOCIAL INFLUENCES ON THE CHOICE OF A LINGUISTIC VARIANT IN LIBYAN ARABIC DIALECTS Libya Arap Lehçelerinde Dil Variant Seçimi Üzerindeki Sosyal Etkileri Nesrin GHEZEWI* & Azamat AKBAROV** Öz: Bu çalışmanın amacı, Arap dilinin farklı dil çeşitleri ve Libya’daki ikidillilik durumunu incelemektir . Yüksek ve düşük agizlari belirlemek icin Libya ( Trablusgarp ve Bingazi ) farklı bölgelerden gelen iki farklı düşük agiz benzerlikleri ve farklılıkları karsilastirildi. Bu şehirler, Tripoli doğu tarafında be Bingazi batı tarafında olmak uzere, Libya haritasında bir birine ters tarafta bulunmaktadır. Benzerlikler ve farklılıklar, örnekler ve tablolarin yanı sıra , Modern Standart Arapça aracılığıyla da gösterilmektedir . Kolonizasyon ve göç gibi farklılıklara katkıda bulunan etkiler da açıklanmıştır. Libya Arapça ikidillilik kavramı oldukça karmaşık oldugundan, çok daha fazla çalışma gerektirmektedir . Anahtar Kelimeler : Ikidillilik , diyalekt , Yüksek Lehçe , Düşük Lehçe, Libya , Klasik Arapça , Modern Standart Arapça , Kolonizasyon . Abstract: The purpose of this paper is to look at the different linguistic varieties of the Arabic language, and the diglossic situation in Libya. In order to determine the high and low variety/ varieties, two different low varieties from different regions in Libya (Tripoli and Benghazi) were compared and contrasted. These cities are on opposite sides of the Libyan map, Tripoli being on the western side and Benghazi on the eastern side. The similarities and differences are shown through examples and tables as well as how those varieties measure up to the Modern Standard Arabic. The influences that contributed to the differences such as colonization and migration are explained. -
Automatic Detection of Arabicized Berber and Arabic Varieties
Automatic Detection of Arabicized Berber and Arabic Varieties Wafia Adouane1, Nasredine Semmar2, Richard Johansson3, Victoria Bobicev4 Department of FLoV, University of Gothenburg, Sweden1 CEA Saclay – Nano-INNOV, Institut CARNOT CEA LIST, France2 Department of CSE, University of Gothenburg, Sweden3 Technical University of Moldova4 [email protected], [email protected] [email protected], [email protected] Abstract Automatic Language Identification (ALI) is the detection of the natural language of an input text by a machine. It is the first necessary step to do any language-dependent natural language pro- cessing task. Various methods have been successfully applied to a wide range of languages, and the state-of-the-art automatic language identifiers are mainly based on character n-gram models trained on huge corpora. However, there are many languages which are not yet automatically pro- cessed, for instance minority and informal languages. Many of these languages are only spoken and do not exist in a written format. Social media platforms and new technologies have facili- tated the emergence of written format for these spoken languages based on pronunciation. The latter are not well represented on the Web, commonly referred to as under-resourced languages, and the current available ALI tools fail to properly recognize them. In this paper, we revisit the problem of ALI with the focus on Arabicized Berber and dialectal Arabic short texts. We intro- duce new resources and evaluate the existing methods. The results show that machine learning models combined with lexicons are well suited for detecting Arabicized Berber and different Arabic varieties and distinguishing between them, giving a macro-average F-score of 92.94%. -
Mapping Linguistic Variations in Colloquial Arabic Through Twitter a Centroid-Based Lexical Clustering Approach
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 11, 2020 Mapping Linguistic Variations in Colloquial Arabic through Twitter A Centroid-based Lexical Clustering Approach 1 Abdulfattah Omar * Hamza Ethleb2 Mohamed Elarabawy Hashem3 Department of English Translation Department Department of English College of Sciences and Humanities Faculty of Languages College of Science and Arts in Tabarjal, Prince Sattam Bin Abdulaziz University University of Tripoli Jouf University, KSA Department of English, Faculty of Arts Tripoli, Libya Faculty of Languages and Translation Port Said University Cairo, Al-Azhar University, Egypt Abstract—The recent years have witnessed the development regional dialectology is so far based on traditional methods; of different computational approaches to the study of linguistic therefore, it is difficult to provide a comprehensive mapping variations and regional dialectology in different languages of the dialect variation of all the colloquial dialects of Arabic. including English, German, Spanish and Chinese. These As thus, this study is concerned with proposing a approaches have proved effective in dealing with large corpora computational model for mapping the linguistic variation and and making reliable generalizations about the data. In Arabic, regional dialectology in Colloquial Arabic through Twitter however, much of the work on regional dialectology is so far based on the lexical choices of speakers. The purpose of the based on traditional methods; therefore, it is difficult to provide study is to explore the lexical patterns for generating regional a comprehensive mapping of the dialectal variations of all the dialect maps as derived from Twitter users. In order to map colloquial dialects of Arabic. -
Tunisian and Libyan Arabic Dialects Common Trends – Recent Developments – Diachronic Aspects
Veronika Ritt-Benmimoun (ed.) Tunisian and Libyan Arabic Dialects Common Trends – Recent Developments – Diachronic Aspects PRENSAS DE LA UNIVERSIDAD DE ZARAGOZA TUNISIAN and Libyan Arabic Dialects : Common Trends – Recent Developments – Diachronic Aspects / Veronika Ritt-Benmimoun (ed.). — Zaragoza : Prensas de la Universidad de Zaragoza, 2017 389 p. ; 24 cm. — (Estudios de Dialectología Árabe ; 13) ISBN 978-84-16933-98-3 1. Lengua árabe–Dialectos. 2. Lengua árabe–Túnez. 3. Lengua árabe–Libia RITT-BENMIMOUN, Veronika 811.411.21’282(611) 811.411.21’282(612) Cualquier forma de reproducción, distribución, comunicación pública o transformación de esta obra solo puede ser realizada con la autorización de sus titulares, salvo excepción prevista por la ley. Diríjase a CEDRO (Centro Español de Derechos Reprográficos, www.cedro.org) si necesita fotocopiar o escanear algún fragmento de esta obra. © Veronika Ritt-Benmimoun © De la presente edición, Prensas de la Universidad de Zaragoza (Vicerrectorado de Cultura y Política Social) 1.ª edición, 2017 Diseño gráfico: Victor M. Lahuerta Colección Estudios de Dialectología Árabe, n.º 13 Prensas de la Universidad de Zaragoza. Edificio de Ciencias Geológicas, c/ Pedro Cerbuna, 12, 50009 Zaragoza, España. Tel.: 976 761 330. Fax: 976 761 063 [email protected] http://puz.unizar.es Esta editorial es miembro de la UNE, lo que garantiza la difusión y comercialización de sus publicaciones a nivel nacional e internacional. Impreso en España Imprime: Servicio de Publicaciones. Universidad de Zaragoza D.L.: Z xxx-2017 Fī (‘in’) as a Marker of the Progressive Aspect in Tunisian Arabic ∗ Karen MCNEIL 1. Introduction In this work I will be discussing the preposition fī and its use as an aspectual marker in Tunisian Arabic. -
RESOURCED LANGUAGES Dialectal Arabic Short Texts
DEPARTMENT OF PHILOSOPHY, LINGUISTICS AND THEORY OF SCIENCE AUTOMATIC DETECTION OF UNDER- RESOURCED LANGUAGES Dialectal Arabic Short Texts Wafia Adouane Master’s Thesis: 30 credits Programme: Master’s Programme in Language Technology Level: Advanced level Semester and year: Spring, 2016 Supervisor: Richard Johansson, Nasredine Semmar and Alan Said Examiner: Staffan Larsson Report number: (number will be provided by the administrators) Keywords: Under-resourced languages, Discrimination between similar languages and language varieties, Arabic varieties, Linguistic resource building Abstract Automatic Language Identification (ALI) is the first necessary step to do any language-dependent natural language processing task. It is the identification of the natural language of the input content by a machine. Being a well-established task in computational linguistics since early 1960's, various methods have been successfully applied to a wide range of languages. The state-of-the-art automatic language identifiers are based on character n-gram models trained on huge corpora. However, there are many natural languages which are not yet automatically processed. For instance, minority languages or informal forms of standard languages (general purpose languages used only in media/administration and taught at schools). Some of these languages are only spoken and do not exist in a written format. The use of social media platforms and new technologies have facilitated the emergence of written format for these spoken languages based on pronunciation. These new written languages are under- resourced, hence the current ALI tools fail to properly recognize them. In this study, we revisit the problem of ALI with the focus on discriminating under-resourced similar languages.