Language Modeling with NMT Query Translation for Amharic-Arabic Cross-Language Information Retrieval

Ibrahim Gashaw, H L Shashirekha
Mangalore University, Mangalagangotri, Mangalore-574199
[email protected], [email protected]

D M Sharma, P Bhattacharyya and R Sangal. Proc. of the 16th Intl. Conference on Natural Language Processing, pages 56–64, Hyderabad, India, December 2019. ©2019 NLP Association of India (NLPAI)

Abstract

This paper describes our first experiment on Neural Machine Translation (NMT) based query translation for the Amharic-Arabic Cross-Language Information Retrieval (CLIR) task, which retrieves relevant documents from Amharic and Arabic text collections in response to a query expressed in the Amharic language. We used a pre-trained NMT model to map a query in the source language into an equivalent query in the target language. The relevant documents are then retrieved using a Language Modeling (LM) based retrieval algorithm. Experiments are conducted on four conventional IR models, namely Uni-gram and Bi-gram LM, Probabilistic model, and Vector Space Model (VSM). The results obtained illustrate that the proposed Uni-gram LM outperforms all other models for both Amharic and Arabic language document collections.

1 Introduction

Information Retrieval (IR) is the activity of retrieving relevant documents for information seekers from a collection of information resources such as text, images, videos, scanned documents, audio, and music. These resources can be structured, indexed, and navigated through Language Technology (LT), which includes computational methods that are specialized for analyzing, producing, modifying, and translating text and speech (Madankar et al., 2016). The increasing necessity for retrieval of multilingual documents in response to a query in any language opens up a new branch of IR called Cross-Language Information Retrieval (CLIR). Its goal is to accept the query in one language, transform it into a searchable format, and provide an interface that allows a user to search and retrieve information in different languages as per their information need (Sourabh, 2013).

The Amharic language is the official language of Ethiopia, spoken by 26.9% of Ethiopia's population as mother tongue and by many people in Israel, Egypt, and Sweden. Arabic is a natural language spoken by 250 million people in 21 countries as the first language, and it serves as a second language in some Islamic countries. Ethiopia is one of the nations in which more than 33.3% of the population follow Islam, and they use the Arabic language to teach religion and for communication purposes. Arabic and Amharic belong to the Semitic family of languages, in which words are formed by modifying the root itself internally and not simply by the concatenation of affixes to word roots (Shashirekha and Gashaw, 2016).

NMT is now widely used to solve CLIR problems for many language pairs. However, much of the research in this area has focused on European languages, even though these languages are already very rich in resources. This study therefore aims to develop an NMT query translation based Amharic-Arabic CLIR system.

An essential part of CLIR is mapping between query and document collections by translating either the queries into the target document language or the source documents into the target document language. We follow the first approach and translate the query words using a pre-trained NMT model. For the purpose of this translation, we have constructed a small parallel text corpus by modifying the existing monolingual Arabic text and its equivalent Amharic translation available on Tanzil (Tiedemann, 2012), as Amharic-Arabic parallel text corpora are not available for the MT task.

The rest of the paper is organized as follows. CLIR approaches are discussed in Section 2. Related works are reviewed in Section 3. The proposed CLIR approach based on LM is described in Section 4. Resources and configurations of experiments for evaluating the system and the results are detailed in Section 5, followed by a conclusion in Section 6.

2 CLIR Approaches

In CLIR, the query and the document collection need to be mapped into a common representation to enable users to search and retrieve relevant documents across language boundaries (Tune, 2015). Based on the resources used to map the query and the documents in different languages, CLIR approaches can be categorized as: Dictionary-based approach, Latent Semantic Indexing (LSI), Machine Translation (MT) approach, and Probabilistic-based approach (Raju et al., 2014).

2.1 Dictionary-based approaches

Dictionary-based approaches use automatically constructed bilingual Machine Readable Dictionaries (MRD), bilingual word lists, or other lexicon resources to translate the query terms to their target language equivalents. This approach offers a relatively cheap and easily applicable solution for large-scale document collections. However, some words in a query may be Out of Vocabulary (OOV) and therefore remain untranslated. Further, linguistic concepts such as polysemy and homonymy may introduce ambiguity in the translation of words (Shashirekha and Gashaw, 2016).

2.2 LSI approach

In the LSI approach, the documents of the source language are represented in the language-independent LSI space. Similarly, a user query can be treated as a pseudo-document and represented as a vector in the same LSI space. Even though the performance of the LSI model is on par with the traditional vector space model, the cost of computing the Singular Value Decomposition (SVD) of very large collections is high, and it makes a difference between different meanings of ambiguous terms according to their contexts of utilization (Nie, 2010).

2.3 Machine Translation approach

MT is the process of obtaining a target language text for a given source language text by using automatic techniques. MT can be used to translate the query, the document, or both into the same language, and the retrieval process can then be treated like that of a conventional IR system. However, MT systems require time and resources to develop and are still not widely or readily available for many language pairs (Madankar et al., 2016).

2.4 Probabilistic-based approaches

Probabilistic-based approaches include corpus-based methods, which translate queries, and language modeling, which avoids translation of queries.

2.4.1 Corpus-based methods

Corpus-based approaches use multilingual corpora, which can be parallel corpora or comparable corpora. In this approach, queries are translated on the basis of multilingual terms extracted from parallel or comparable document collections. While parallel corpora contain translation-equivalent texts, that is, direct translations of the same documents in different languages, comparable corpora contain texts on the same subject which are neither aligned nor direct translations of each other but are composed in their respective languages independently (Tesfaye, 2010). Such corpora are available only for a few languages and are expensive to construct.

2.4.2 Language modeling approaches

A language model is a probability distribution over all possible sentences or other linguistic units in a language. While the classification of LM is not exhaustive, and a specific language model may belong to several types, LM can be categorized as uniform, finite state, grammar-based, n-gram, and Neural Language Model (NLM) (or continuous space LM), which might be feed-forward or recurrent (SWLG, 1997). Uniform LM uses the same probability for all words of the vocabulary of the sentences if the number of sentences is limited. In finite-state LM, the set of legal word sequences is represented as a finite state network (or regular grammar) whose edges stand for the words that are assigned probabilities. Grammar-based LM is based on variants of stochastic context-free grammars or other phrase structure grammars.

Data scarcity is a significant problem in building language models, as most possible word sequences will not be observed in training. One solution to this problem is to make predictions from continuous representations, or embeddings, of words, which helps to alleviate the curse of dimensionality in LM. The main advantage of LM is that it estimates the distribution of various natural language phenomena for language technologies such as speech, machine translation, document classification and routing, optical character recognition, information retrieval, handwriting recognition, spelling correction, etc. (Kim et al., 2016). Over-fitting (modeling random error or noise instead of the underlying relationship, so that the test error is larger than the training error) is the main limitation of current LM for small datasets (Jozefowicz et al., 2016).

[...] in the dictionary. The lack of electronic resources such as morphological analyzers and large MRD forced A. Argaw (2005) to spend considerable time developing those resources themselves.

Solving the problem of word sense disambiguation will enhance the effectiveness of CLIR systems. Andres Duque et al. (2015) studied how to choose the best dictionary for Cross-Lingual Word Sense Disambiguation (CLWSD); their study focuses only on English-Spanish cross-lingual disambiguation, and the disambiguation task depends on dictionary coverage and corpus size. W. Gao et al. (2007) proposed query suggestion that exploits query logs and document collections by mapping an input query in French to English queries in the query log of a search engine. Their results showed a strong correspondence between the French input queries and the English queries in the log, but other language pairs, for example English and Amharic, may be more loosely correlated. M. Al-shuaili and M. Garvalho (2016) proposed a technique to map characters automatically from different languages into English, without human interference or prior knowledge of the language.