Machine Translation of Arabic Dialects Wael Salloum

Total Page:16

File Type:pdf, Size:1020Kb

Machine Translation of Arabic Dialects Wael Salloum Machine Translation of Arabic Dialects Wael Salloum Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2018 © 2018 Wael Salloum All rights reserved ABSTRACT Machine Translation of Arabic Dialects Wael Salloum This thesis discusses different approaches to machine translation (MT) from Dialectal Arabic (DA) to English. These approaches handle the varying stages of Arabic dialects in terms of types of available resources and amounts of training data. The overall theme of this work revolves around building dialectal resources and MT systems or enriching existing ones using the currently available resources (dialectal or standard) in order to quickly and cheaply scale to more dialects without the need to spend years and millions of dollars to create such resources for every dialect. Unlike Modern Standard Arabic (MSA), DA-English parallel corpora is scarcely avail- able for few dialects only. Dialects differ from each other and from MSA in orthography, morphology, phonology, and to some lesser degree syntax. This means that combining all available parallel data, from dialects and MSA, to train DA-to-English statistical machine translation (SMT) systems might not provide the desired results. Similarly, translating di- alectal sentences with an SMT system trained on that dialect only is also challenging due to different factors that affect the sentence word choices against that of the SMT training data. Such factors include the level of dialectness (e.g., code switching to MSA versus dialectal training data), topic (sports versus politics), genre (tweets versus newspaper), script (Ara- bizi versus Arabic), and timespan of test against training. The work we present utilizes any available Arabic resource such as a preprocessing tool or a parallel corpus, whether MSA or DA, to improve DA-to-English translation and expand to more dialects and sub-dialects. The majority of Arabic dialects have no parallel data to English or to any other foreign language. They also have no preprocessing tools such as normalizers, morphological ana- lyzers, or tokenizers. For such dialects, we present an MSA-pivoting approach where DA sentences are translated to MSA first, then the MSA output is translated to English using the wealth of MSA-English parallel data. Since there is virtually no DA-MSA parallel data to train an SMT system, we build a rule-based DA-to-MSA MT system, ELISSA, that uses morpho-syntactic translation rules along with dialect identification and language modeling components. We also present a rule-based approach to quickly and cheaply build a dialectal morphological analyzer, ADAM, which provides ELISSA with dialectal word analyses. Other Arabic dialects have a relatively small-sized DA-English parallel data amount- ing to a few million words on the DA side. Some of these dialects have dialect-dependent preprocessing tools that can be used to prepare the DA data for SMT systems. We present techniques to generate synthetic parallel data from the available DA-English and MSA- English data. We use this synthetic data to build statistical and hybrid versions of ELISSA as well as improve our rule-based ELISSA-based MSA-pivoting approach. We evaluate our best MSA-pivoting MT pipeline against three direct SMT baselines trained on these three parallel corpora: DA-English data only, MSA-English data only, and the combination of DA-English and MSA-English data. Furthermore, we leverage the use of these four MT systems (the three baselines along with our MSA-pivoting system) in two system combi- nation approaches that benefit from their strengths while avoiding their weaknesses. Finally, we propose an approach to model dialects from monolingual data and limited DA-English parallel data without the need for any language-dependent preprocessing tools. We learn DA preprocessing rules using word embedding and expectation maximization. We test this approach by building a morphological segmentation system and we evaluate its performance on MT against the state-of-the-art dialectal tokenization tool. Contents List of Figures vii List of Tables xi 1 Introduction 1 1.1 Introduction to Machine Translation . 2 1.2 Introduction to Arabic and its Challenges for NLP . 4 1.2.1 Arabic as a Prototypical Diglossic Language . 4 1.2.2 Modern Standard Arabic Challenges . 5 1.2.3 Dialectal Arabic Challenges . 5 1.2.4 Dialectness, Domain, Genre, and Timespan . 7 1.2.5 Overview and Challenges of Dialect-Foreign Parallel Data . 7 1.3 Contributions . 8 1.3.1 Thesis Contributions . 9 1.3.2 Released Tools . 19 1.3.3 Note on Data Sets . 19 1.4 Thesis Outline . 20 2 Related Work 21 2.1 Introduction to Machine Translation . 21 2.1.1 Rule-Based versus Statistical Machine Translation . 21 i 2.1.2 Neural Machine Translation (NMT) . 22 2.2 Introduction to Arabic and its Challenges for NLP . 23 2.2.1 A History Review of Arabic and its Dialects . 23 2.2.2 Arabic as a Prototypical Diglossic Language . 26 2.2.3 Modern Standard Arabic Challenges . 26 2.2.4 Dialectal Arabic Challenges . 28 2.2.5 Dialectness, Domain, Genre, and Timespan . 29 2.2.6 Overview and Challenges of Dialect-Foreign Parallel Data . 30 2.3 Dialectal Arabic Natural Language Processing . 31 2.3.1 Extending Modern Standard Arabic Resources . 31 2.3.2 Dialectal Arabic Morphological Analysis . 32 2.3.3 Dialect Identification . 34 2.4 Machine Translation of Dialects . 34 2.4.1 Machine Translation for Closely Related Languages. 35 2.4.2 DA-to-English Machine Translation . 35 2.5 Machine Translation System Combination . 37 2.6 Morphological Segmentation . 38 2.6.1 Supervised Learning Approaches to Morphological Segmentation . 38 2.6.2 Unsupervised Learning Approaches to Morphological Segmentation 38 I Translating Dialects with No Dialectal Resources 41 3 Analyzer for Dialectal Arabic Morphology (ADAM) 43 3.1 Introduction . 43 3.2 Motivation . 44 3.3 Approach . 45 3.3.1 Databases . 45 ii 3.3.2 SADA Rules . 46 3.4 Intrinsic Evaluation . 49 3.4.1 Evaluation of Coverage . 50 3.4.2 Evaluation of In-context Part-of-Speech Recall . 51 3.5 Extrinsic Evaluation . 52 3.5.1 Experimental Setup . 53 3.5.2 The Dev and Test Sets . 53 3.5.3 Machine Translation Results . 54 3.6 Conclusion and Future Work . 55 4 Pivoting with Rule-Based DA-to-MSA Machine Translation System (ELISSA) 59 4.1 Introduction . 59 4.2 Motivation . 60 4.3 The ELISSA Approach . 62 4.4 Selection . 64 4.4.1 Word-based selection . 64 4.4.2 Phrase-based selection . 65 4.5 Translation . 66 4.5.1 Word-based translation . 66 4.5.2 Phrase-based translation . 68 4.6 Language Modeling . 70 4.7 Intrinsic Evaluation: DA-to-MSA Translation Quality . 71 4.7.1 Revisiting our Motivating Example . 71 4.7.2 Manual Error Analysis . 72 4.8 Extrinsic Evaluation: DA-English MT . 73 4.8.1 The MSA-Pivoting Approach . 73 4.8.2 Experimental Setup . 74 iii 4.8.3 Machine Translation Results . 76 4.8.4 A Case Study . 78 4.9 Conclusion and Future Work . 79 II Translating Dialects with Dialectal Resources 81 5 Pivoting with Statistical and Hybrid DA-to-MSA Machine Translation 83 5.1 Introduction . 83 5.2 Dialectal Data and Preprocessing Tools . 84 5.3 Synthesizing Parallel Corpora . 84 5.4 The MSA-Pivoting Approach . 85 5.4.1 Improving MSA-Pivoting with Rule-Based ELISSA . 88 5.4.2 MSA-Pivoting with Statistical DA-to-MSA MT . 89 5.4.3 MSA-Pivoting with Hybrid DA-to-MSA MT . 89 5.5 Evaluation . 90 5.5.1 Experimental Setup . 90 5.5.2 Experiments . 91 5.6 Conclusion and Future Work . 93 6 System Combination 95 6.1 Introduction . 95 6.2 Related Work . 96 6.3 Baseline Experiments and Motivation . 96 6.3.1 Experimental Settings . 97 6.3.2 Baseline MT Systems . 99 6.3.3 Oracle System Combination . 100 6.4 Machine Translation System Combination . 100 iv 6.4.1 Dialect ID Binary Classification . 102 6.4.2 Feature-based Four-Class Classification . 102 6.4.3 System Combination Evaluation . 104 6.5 Discussion of Dev Set Subsets . 106 6.5.1 DA versus MSA Performance . 107 6.5.2 Analysis of Different Dialects . 108 6.6 Error Analysis . 110 6.6.1 Manual Error Analysis . 110 6.6.2 Example . 111 6.7 Conclusion and Future Work . 111 III Scaling to More Dialects 113 7 Unsupervised Morphological Segmentation for Machine Translation 115 7.1 Introduction . 115 7.2 Related Work . 116 7.2.1 Supervised Learning Approaches to Morphological Segmentation . 117 7.2.2 Unsupervised Learning Approaches to Morphological Segmentation 117 7.3 Approach . 118 7.4 Monolingual Identification of Segmentation Rules . 122 7.4.1 Clustering based on Word Embeddings . 122 7.4.2 Rule Extraction and Expansion . 123 7.4.3 Learning Rule Scores . 127 7.4.4 Experiments . 129 7.5 Alignment Guided Segmentation Choice . 130 7.5.1 Approach . 130 7.5.2 The Alignment Model . ..
Recommended publications
  • Phonological Variation and Change in Mesopotamiaː
    Phonological variation and change in Mesopotamiaː A study of accent levelling in the Arabic dialect of Mosul. Abdulkareem Yaseen Ahmed Thesis submitted to the Faculty of Humanities, Arts and Social Sciences in accordance with the requirements for the degree of Doctor of Philosophy (Linguistics) School of Education, Communication and Language Sciences Newcastle University October 2018 Dedication To My Heart, soul & life Hussein, Yaseen & Yousif Acknowledgements Firstly, I would like to express my sincere gratitude to my supervisors Dr Ghada Khattab and Dr Damien Hall for their continuous support of my PhD study and related research, for their patience, honesty and immense knowledge. Their guidance over the last few years helped me in all the time of research and writing of this thesis. I would like to thank the following people for their kind support and help throughout my study: Dr Jalal Al-Tamimi and Dr Danielle Turton for their very helpful comments and suggestions on various things of the study. I would also like to thank Daniel Ezra Johnson for his support in conducting the statistics in this study. My sincere thanks to my colleague Maha Jasim who helped in many things especially checking the segmentation of the data. Very special ‘Merci’ goes to Maelle Amand for her immense help. I would also to thank all the people of Mosul and others who helped in various capacities in this study, particularly Ahmed Salama, Khalid Ibrahim Alahmed and Ekhlas Muhsin and Dhiaa Kareem. An everlasting ‘Thank You’ goes to Rosalie Maggio, Janet Atwill and Annabelle Lukin. I would also like to acknowledge the support of HCED (Iraq) for sponsoring my studies, without which this work would not have been possible.
    [Show full text]
  • “The Overuse of Italian Loanwords in the Daily Speech of Tripoli University Students: the Impact of Gender and Residential Place”
    العدد العشرون تاريخ اﻹصدار: 2 – ُحزيران – 2020 م ISSN: 2663-5798 www.ajsp.net “The Overuse of Italian Loanwords in the Daily Speech of Tripoli University Students: the Impact of Gender and Residential Place” Prepared by Jalal Al Dain Y. Abidah, English Dept., Faculty of Education - Janzour, Tripoli University 19 Arab Journal for Scientific Publishing (AJSP) ISSN: 2663-5798 العدد العشرون تاريخ اﻹصدار: 2 – ُحزيران – 2020 م ISSN: 2663-5798 www.ajsp.net Abstract The study tries to explore the impact of social factors of gender and residential place on the use of Italian loanwords by Libyan university students (using Tripoli University as an example) and how the mentioned social factors affect their daily speech. To answer the questions of the study, a sample of 60 Tripoli university students are selected randomly in the campus (A) of University of Tripoli. They were divided into two groups according to their Gender and residential place. In order to collect data, a questionnaire was developed for this purpose. It generated data regarding the use of 150 Italian loanwords by both groups. The mean of using Italian loanwords in both groups was analyzed and computed using SPSS. However, the study reveals the impact of residential area where Italian loanwords were more incorporated by rural students than urbanites. The results of the study revealed that there was a significant statistical difference at (α≤0.05) among the means of both groups regarding the use of Italian loanwords in daily speech due to residential area. In contrasts, gender emerges as insignificant. Keyword Italian loanwords, Colloquial Arabic, Gender, Libya.
    [Show full text]
  • Christians and Jews in Muslim Societies
    Arabic and its Alternatives Christians and Jews in Muslim Societies Editorial Board Phillip Ackerman-Lieberman (Vanderbilt University, Nashville, USA) Bernard Heyberger (EHESS, Paris, France) VOLUME 5 The titles published in this series are listed at brill.com/cjms Arabic and its Alternatives Religious Minorities and Their Languages in the Emerging Nation States of the Middle East (1920–1950) Edited by Heleen Murre-van den Berg Karène Sanchez Summerer Tijmen C. Baarda LEIDEN | BOSTON Cover illustration: Assyrian School of Mosul, 1920s–1930s; courtesy Dr. Robin Beth Shamuel, Iraq. This is an open access title distributed under the terms of the CC BY-NC 4.0 license, which permits any non-commercial use, distribution, and reproduction in any medium, provided no alterations are made and the original author(s) and source are credited. Further information and the complete license text can be found at https://creativecommons.org/licenses/by-nc/4.0/ The terms of the CC license apply only to the original material. The use of material from other sources (indicated by a reference) such as diagrams, illustrations, photos and text samples may require further permission from the respective copyright holder. Library of Congress Cataloging-in-Publication Data Names: Murre-van den Berg, H. L. (Hendrika Lena), 1964– illustrator. | Sanchez-Summerer, Karene, editor. | Baarda, Tijmen C., editor. Title: Arabic and its alternatives : religious minorities and their languages in the emerging nation states of the Middle East (1920–1950) / edited by Heleen Murre-van den Berg, Karène Sanchez, Tijmen C. Baarda. Description: Leiden ; Boston : Brill, 2020. | Series: Christians and Jews in Muslim societies, 2212–5523 ; vol.
    [Show full text]
  • Some Principles of the Use of Macro-Areas Language Dynamics &A
    Online Appendix for Harald Hammarstr¨om& Mark Donohue (2014) Some Principles of the Use of Macro-Areas Language Dynamics & Change Harald Hammarstr¨om& Mark Donohue The following document lists the languages of the world and their as- signment to the macro-areas described in the main body of the paper as well as the WALS macro-area for languages featured in the WALS 2005 edi- tion. 7160 languages are included, which represent all languages for which we had coordinates available1. Every language is given with its ISO-639-3 code (if it has one) for proper identification. The mapping between WALS languages and ISO-codes was done by using the mapping downloadable from the 2011 online WALS edition2 (because a number of errors in the mapping were corrected for the 2011 edition). 38 WALS languages are not given an ISO-code in the 2011 mapping, 36 of these have been assigned their appropri- ate iso-code based on the sources the WALS lists for the respective language. This was not possible for Tasmanian (WALS-code: tsm) because the WALS mixes data from very different Tasmanian languages and for Kualan (WALS- code: kua) because no source is given. 17 WALS-languages were assigned ISO-codes which have subsequently been retired { these have been assigned their appropriate updated ISO-code. In many cases, a WALS-language is mapped to several ISO-codes. As this has no bearing for the assignment to macro-areas, multiple mappings have been retained. 1There are another couple of hundred languages which are attested but for which our database currently lacks coordinates.
    [Show full text]
  • The Historical Development of the Maltese Plural Suffixes -Iet and -(I)Jiet Having Spent the Last Millennium in Close Contact Wi
    The historical development of the Maltese plural suffixes -iet and -(i)jiet Having spent the last millennium in close contact with several Indo-European languages, Maltese, the modern descendant of Siculo-Arabic (Brincat 2011), possesses various pluralization strategies. In this paper I explore the historical development of two such strategies of Semitic- origin: the suffixes -iet and -(i)jiet (e.g. papa ‘pope’, pl. papiet; omm ‘mother’, pl. ommijiet). Specialists agree that both suffixes originated from the Arabic plural suffix -āt (Borg 1976; Mifsud 1995); however, no research has explained the development of -(i)jiet, nor connected its development to that of -iet. I argue that the development of -(i)jiet was driven by the influx of i- final words which resulted from contact with Italian: Maltese speakers affixed -iet to such words, triggering a glide-epenthesis that occurs elsewhere in Maltese (e.g. Mifsud 1996: 34) and in other varieties of Arabic (e.g. Erwin 1963; Cowell 1964; Owens 1984; Harrell 2004). With a large number of plurals now ending in ijiet, speakers reanalyzed this sequence as a unique plural suffix and began applying it to new non-i-final words as well. Since only Maltese experienced this influx of i-final words, it was only in Maltese that speakers reanalyzed this sequence as a separate suffix. Additionally, I argue against an explanation of the development of -(i)jiet that does not rely on epenthesis. With regard to the suffix -iet, I argue that two properties unique to it – the obligatory omission of stem-final vowels upon pluralization, and the near-universal tendency to pluralize only a-final words – emerged from a separate reanalysis of the pluralization of collective nouns that reflects a general weakening of the Semitic element in Maltese under Indo-European contact.
    [Show full text]
  • Different Dialects of Arabic Language
    e-ISSN : 2347 - 9671, p- ISSN : 2349 - 0187 EPRA International Journal of Economic and Business Review Vol - 3, Issue- 9, September 2015 Inno Space (SJIF) Impact Factor : 4.618(Morocco) ISI Impact Factor : 1.259 (Dubai, UAE) DIFFERENT DIALECTS OF ARABIC LANGUAGE ABSTRACT ifferent dialects of Arabic language have been an Dattraction of students of linguistics. Many studies have 1 Ali Akbar.P been done in this regard. Arabic language is one of the fastest growing languages in the world. It is the mother tongue of 420 million in people 1 Research scholar, across the world. And it is the official language of 23 countries spread Department of Arabic, over Asia and Africa. Arabic has gained the status of world languages Farook College, recognized by the UN. The economic significance of the region where Calicut, Kerala, Arabic is being spoken makes the language more acceptable in the India world political and economical arena. The geopolitical significance of the region and its language cannot be ignored by the economic super powers and political stakeholders. KEY WORDS: Arabic, Dialect, Moroccan, Egyptian, Gulf, Kabael, world economy, super powers INTRODUCTION DISCUSSION The importance of Arabic language has been Within the non-Gulf Arabic varieties, the largest multiplied with the emergence of globalization process in difference is between the non-Egyptian North African the nineties of the last century thank to the oil reservoirs dialects and the others. Moroccan Arabic in particular is in the region, because petrol plays an important role in nearly incomprehensible to Arabic speakers east of Algeria. propelling world economy and politics.
    [Show full text]
  • Arabic Language and Linguistics Certificate Revised: 05/2018
    Arabic Language and Linguistics Certificate www.Linguistics.Pitt.edu Revised: 05/2018 Overview This certificate offers undergraduates a way to become highly proficient in the Arabic language and linguistic structure and to develop an understanding of important issues in Arabic life and culture. The certificate will conveniently accompany any undergraduate major in the Dietrich School and in other schools at the University of Pittsburgh. The program draws on the academic strengths and resources of the Department of Linguistics and on faculty from the School of Education. Enrollment in this program is limited to 20 students per academic year. Therefore, students must complete an evaluation process and be accepted into the program before declaring this certificate. For information, please contact the Amani Attia, the Arabic Coordinator, at [email protected]. Requirements Category 3: Culture of the Arabic-Speaking This certificate requires completion of 22-23 credits, World detailed as follows. Credit requirements do not include Choose one of the following courses. prerequisite courses. ARABIC 1615 Arabic Life and Thought ARABIC 1635 Introduction to Modern Arabic Literature Prerequisite courses Choose one dialect pair. Category 4: Electives ARABIC 0101 MSA Egyptian 1 Choose one of the following courses. Dialect courses ARABIC 0102 MSA Egyptian 2 must follow previous coursework. ARABIC 0105 MSA Egyptian 5 ARABIC 0121 MSA Levantine 1 ARABIC 0106 MSA Egyptian 6 ARABIC 0122 MSA Levantine 2 ARABIC 0125 MSA Levantine 5 ARABIC 0126 MSA Levantine 6 Category 1: Language instruction ARABIC 0211 Iraqi Arabic 1 Chose the same dialect pair as the prerequisite ARABIC 1615 Arabic Life and Thought language courses.
    [Show full text]
  • Kiraz 2019 a Functional Approach to Garshunography
    Intellectual History of the Islamicate World 7 (2019) 264–277 brill.com/ihiw A Functional Approach to Garshunography A Case Study of Syro-X and X-Syriac Writing Systems George A. Kiraz Institute for Advanced Study, Princeton and Beth Mardutho: The Syriac Institute, Piscataway [email protected] Abstract It is argued here that functionalism lies at the heart of garshunographic writing systems (where one language is written in a script that is sociolinguistically associated with another language). Giving historical accounts of such systems that began as early as the eighth century, it will be demonstrated that garshunographic systems grew organ- ically because of necessity and that they offered a certain degree of simplicity rather than complexity.While the paper discusses mostly Syriac-based systems, its arguments can probably be expanded to other garshunographic systems. Keywords Garshuni – garshunography – allography – writing systems It has long been suggested that cultural identity may have been the cause for the emergence of Garshuni systems. (In the strictest sense of the term, ‘Garshuni’ refers to Arabic texts written in the Syriac script but the term’s semantics were drastically extended to other systems, sometimes ones that have little to do with Syriac—for which see below.) This paper argues for an alterna- tive origin, one that is rooted in functional theory. At its most fundamental level, Garshuni—as a system—is nothing but a tool and as such it ought to be understood with respect to the function it performs. To achieve this, one must take into consideration the social contexts—plural, as there are many—under which each Garshuni system appeared.
    [Show full text]
  • ARAB - Arabic (ARAB) 1
    ARAB - Arabic (ARAB) 1 ARAB 301 Reading and Composition ARAB - ARABIC (ARAB) Credits 3. 3 Lecture Hours. Advanced Arabic grammar and readings of average difficulty and of ARAB 101 Beginning Arabic I different genres, including literary and journalistic texts and other Credits 4. 4 Lecture Hours. culturally-enriched materials in order to develop awareness of cultural (ARAB 1411) Beginning Arabic I. Introduction to Modern Standard Arabic products, perspectives, and practices found in the Arab world. in its written and spoken forms; emphasis on conversation, rudimentary Prerequisites: ARAB 202 or ARAB 204, or equivalent; junior or senior vocabulary, simple grammar, and reading. classification or approval of instructor. ARAB 102 Beginning Arabic II ARAB 302 Reading and Composition II Credits 4. 4 Lecture Hours. Credits 3. 3 Lecture Hours. (ARAB 1412) Beginning Arabic II. Introduction of more complex Readings of average difficulty and of different genres, including grammatical constructions; vocabulary building; emphasis on putting literary and journalistic texts and other culturally-enriched materials; acquired vocabulary and grammar to conversational use. development of writing skills with emphasis on grammatical Prerequisite: ARAB 101 or equivalent. constructions; expansion of vocabulary and oral expression. ARAB 104 Intensive Beginning Arabic Prerequisites: ARAB 301; junior or senior classification or approval of Credits 8. 8 Lecture Hours. instructor. Accelerated elementary language study, with oral, listening, reading and ARAB 321 Business Arabic writing practice. Equivalent to ARAB 101 and ARAB 102. Credits 3. 3 Lecture Hours. ARAB 201 Intermediate Arabic I Business and financial terminologies useful in the Arab World; cultural Credits 3. 3 Lecture Hours. etiquette for effective communication in Arabic business settings; (ARAB 2311) Intermediate Arabic I.
    [Show full text]
  • Arabic Sociolinguistics: Topics in Diglossia, Gender, Identity, And
    Arabic Sociolinguistics Arabic Sociolinguistics Reem Bassiouney Edinburgh University Press © Reem Bassiouney, 2009 Edinburgh University Press Ltd 22 George Square, Edinburgh Typeset in ll/13pt Ehrhardt by Servis Filmsetting Ltd, Stockport, Cheshire, and printed and bound in Great Britain by CPI Antony Rowe, Chippenham and East bourne A CIP record for this book is available from the British Library ISBN 978 0 7486 2373 0 (hardback) ISBN 978 0 7486 2374 7 (paperback) The right ofReem Bassiouney to be identified as author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988. Contents Acknowledgements viii List of charts, maps and tables x List of abbreviations xii Conventions used in this book xiv Introduction 1 1. Diglossia and dialect groups in the Arab world 9 1.1 Diglossia 10 1.1.1 Anoverviewofthestudyofdiglossia 10 1.1.2 Theories that explain diglossia in terms oflevels 14 1.1.3 The idea ofEducated Spoken Arabic 16 1.2 Dialects/varieties in the Arab world 18 1.2. 1 The concept ofprestige as different from that ofstandard 18 1.2.2 Groups ofdialects in the Arab world 19 1.3 Conclusion 26 2. Code-switching 28 2.1 Introduction 29 2.2 Problem of terminology: code-switching and code-mixing 30 2.3 Code-switching and diglossia 31 2.4 The study of constraints on code-switching in relation to the Arab world 31 2.4. 1 Structural constraints on classic code-switching 31 2.4.2 Structural constraints on diglossic switching 42 2.5 Motivations for code-switching 59 2.
    [Show full text]
  • Classic Poetry of Arab and Persian
    European Journal of Scientific Research ISSN 1450-216X / 1450-202X Vol. 139 No 3 May, 2016, pp.257-262 http://www.europeanjournalofscientificresearch.com Comparative Study on Bahariye in Neo –Classic Poetry of Arab and Persian Mohammad Shaygan Mehr Department of Arabic Language and Literature Kashmar Branch, Islamic Azad university, Kashmar, Iran Ali asghar Mansouri .Department of Arabic Language and Literature Kashmar Branch, Islamic Azad university, Kashmar, Iran Nabialehrajani Department of Arabic Language and Literature Kashmar Branch, Islamic Azad university, Kashmar, Iran Hassan Ghamari Department of Arabic Language and Literature Kashmar Branch, Islamic Azad University, Kashmar, Iran Abstract As we can see the subject of the study has been not studied and researched in the previous works, this study tries to provide regular collection of scattered material to overcome the shortcomings of the issue. The aim of this paper is to review and correct lexical definitions in both Arabic and Persian words of Bahariyeh, and also studies the similarities and differences of Bahariyeh in Persian and Arabic classical new poetry. Bahariyeh is one of the common themes in Persian literature. Also in the literature of Arab it has been composed some poems on the theme of spring as Robyyat. In both contemporary periods, because of familiarity of poets with European literature in one hand and social issues, philanthropy and patriotism remember the other hand, the themes and contents of Bahariyeh, had found significant differences compared to the previous periods. In this study the similarities and differences of Bahariye, in these two languages, will be examined in the term of structure and content.
    [Show full text]
  • Possessive Constructions in Najdi Arabic
    Possessive Constructions in Najdi Arabic Eisa Sneitan Alrasheedi A thesis submitted to the Faculty of Humanities, Arts and Social Sciences in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Theoretical Linguistics School of English Literature, Language and Linguistics Newcastle University July, 2019 ii Abstract This thesis investigates the syntax of possession and agreement in Najdi Arabic (NA, henceforth) with a particular focus on the possession expressed at the level of the DP (Determiner Phrase). Using the main assumptions of the Minimalist Program (Chomsky 1995, and subsequent work) and adopting Abney’s (1987) DP-hypothesis, this thesis shows that the various agreement patterns within the NA DP can be accounted for with the use of a probe/goal agreement operation (Chomsky 2000, 2001). Chapter two discusses the syntax of ‘synthetic’ possession in NA. Possession in NA, like other Arabic varieties, can be expressed synthetically using a Construct State (CS), e.g. kitaab al- walad (book the-boy) ‘the boy’s book’. Drawing on the (extensive) literature on the CS, I summarise its main characteristics and the different proposals for its derivation. However, the main focus of this chapter is on a lesser-investigated aspect of synthetic possession – that is, possessive suffixes, the so-called pronominal possessors, as in kitaab-ah (book-his) ‘his book’. Building on a previous analysis put forward by Shlonsky (1997), this study argues (contra Fassi Fehri 1993), that possessive suffixes should not be analysed as bound pronouns but rather as an agreement inflectional suffix (à la Shlonsky 1997), where the latter is derived by Agree between the Poss(essive) head and the null pronoun within NP.
    [Show full text]