To cite this version: Karima Meftouh, Najette Bouchemal, Kamel Smaïli. A Study of a Non-Resourced Language: The Case of one of the Algerian Dialects. The third International Workshop on Spoken Languages Technologies for Under-resourced Languages - SLTU'12, May 2012, Cape Town, South Africa. pp.1-7. hal-00727042
HAL Id: hal-00727042
https://hal.archives-ouvertes.fr/hal-00727042
Submitted on 14 Sep 2017

A STUDY OF A NON-RESOURCED LANGUAGE: THE CASE OF ONE OF THE ALGERIAN DIALECTS

K. Meftouh, N. Bouchemal
UBMA, Badji Mokhtar University, Informatic Department, BP 12, 23000 Annaba, Algeria

K. Smaïli
LORIA, Campus scientifique, BP 139, 54500 Vandoeuvre-lès-Nancy Cedex, France

ABSTRACT

This paper presents a linguistic study of an Algerian Arabic dialect, namely the dialect of Annaba (AD). It also presents the methodology applied in the construction of an MSA-AD parallel corpus. This work is carried out with the longer-term goal of developing a machine translation system from standard Arabic (MSA) to Algerian Arabic dialects.

Index Terms: Machine translation system, Standard Arabic, Algerian Arabic dialect, parallel corpus, dialect of Annaba, cosine similarity measure

1. INTRODUCTION

Arabic is a Semitic language. It is used by around 250 million people, but is understood by up to four times more among Muslims around the world [1]. Arabic is divided into three separate groups: classical written Arabic, written modern standard Arabic and spoken Arabic.

Classical written Arabic is principally defined as the Arabic used in the Qur'an and in the earliest literature from the Arabian Peninsula, but it also forms the core of much literature up to our time. Written modern standard Arabic (or MSA, also called Alfus'ha) is the variety of Arabic most widely used in print media, official documents, correspondence, education, and as a liturgical language. It is essentially a modern variant of classical Arabic. Standard Arabic is not acquired as a mother tongue; rather, it is learned as a second language at school and through exposure to formal broadcast programs (such as the daily news), religious practice, and print media.

Spoken Arabic is often referred to as colloquial Arabic, dialects, or vernaculars. It is a mixed form, which has many variations and often a dominating influence from local languages (from before the introduction of Arabic). Differences between the various variants of spoken Arabic can be large enough to make them incomprehensible to one another. Hence, given the large differences between such spoken languages, we can consider them as disparate languages or, more exactly, as different dialects depending on the geographical place in which they are practiced: Morocco, Algeria, Egypt, ...

In this paper, we focus on the Algerian dialect. The concept of dialect here is different from what is accepted in the West. In fact, people in their daily life do not use standard Arabic but dialect, which is in most cases different from standard Arabic. Consequently, people who are not educated cannot understand standard Arabic, which is considered a foreign language.

This work is part of a project, TORJMAN¹, which is dedicated to translating standard Arabic to the Algerian Arabic dialect. Interest in such an extremely complicated problem can be very surprising. It is difficult to understand this issue, but when we analyze the spoken language in different places in Algeria, for instance, we notice that almost nobody speaks standard Arabic even though the official language of Algeria is standard Arabic. Furthermore, this spoken language is not written. The idea of this project is twofold: first, to understand the function and the underlying structure of Algerian dialects, and then to provide the population and socio-economic actors with a tool enabling the average user to understand standard Arabic. In the following section (section 2) we explain why we should be interested in Arabic dialects.

¹ TORJMAN is a national research project which is fully financed by the Algerian research ministry.

2. WHY ARE WE INTERESTED IN COLLOQUIAL ARABIC?

Since September 11, 2001, international conferences have shown a growing enthusiasm for machine translation from standard Arabic to Indo-European languages. These studies are important when it comes to translating official documents. However, if one wants to develop applications for the average citizen, it is necessary to take into account his mother tongue, that is, his dialect.

The main dialectal division is between the Maghreb dialects and those of the Middle East, followed by that between sedentary dialects and Bedouin ones.

Watson writes "Dialects of Arabic form a roughly continuous spectrum of variation, with the dialects spoken in the eastern and western extremes of the Arab-speaking world being mutually unintelligible" [2]. Indeed, while Middle Easterners can generally understand one another, they often have trouble understanding Maghrebis². The converse is not true, owing to the popularity of Middle Eastern, especially Egyptian, films and other media. In some cases people from these countries are unable to understand each other; at most, a few words are unknown to them [3]. In other cases, people from one of the countries concerned may find the grammatical structure of the neighboring country's dialect a bit understandable. Table 1 provides a simple, yet interesting, example of how spoken varieties of Arabic differ in intelligibility. The English sentence "I am going now" is given in the Syrian, Egyptian, Tunisian, Algerian and Moroccan dialects and in MSA, with their respective transliterations.

Table 1. Variants of Arabic dialects expressing the English sentence "I am going now"

Variety     Arabic script       Transliteration
MSA         أنا ذاهب الآن       ʾanā ḏāhibun al-ʾān
Egyptian    أنا رايح دلوءتي     ʾanā rāyiḥ dilwʾty
Syrian      راح روح هلّ          rāḥ rūḥ halla
Tunisian    باش نمشي توى        bāš nimšy tawā
Algerian    راح نروح درك        rāḥ nrūḥ durk
Moroccan    أنا غادي دب         ʾanā ġādy daba

These examples clearly reflect the distance between dialectal sentences expressing the same idea. If we consider only the word الآن al-ʾān (now) in MSA, we note that its equivalent in each of the considered dialects differs from that used in the others: dilwʾty in Egyptian, halla in Syrian, tawā in Tunisian, durk in Algerian and daba in Moroccan.

Now let us consider the Maghreb spoken languages. There are clearly two native languages in Morocco and Algeria: Algerian or Moroccan Arabic, and Berber³ (Berbers make up 40 to 50% of the population in Morocco and 25 to 30% in Algeria). In Tunisia, there are only a few Berbers (1 or 2%). In addition, the number of monolingual Berbers in rural areas is not negligible.

3. ALGERIAN ARABIC

In Algeria, as elsewhere, spoken Arabic differs from written Arabic. Algerian Arabic has a vocabulary derived from Arabic, but the original words have been altered phonologically, with significant Berber substrates and many new words and loanwords borrowed from French, Turkish and Spanish. Like all Arabic dialects, Algerian Arabic has dropped the case endings of the written language. It is not used in schools, television or newspapers, which usually use standard Arabic or French, but it is more likely to be heard in music, if not just in Algerian homes and on the street. Algerian Arabic is spoken daily by the vast majority of Algerians [5]. It is part of the Maghreb Arabic dialect continuum, and fades into Moroccan Arabic and Tunisian Arabic along the respective borders. Algerian Arabic vocabulary is largely similar throughout Algeria, although the easterners sound closer to Tunisians while the westerners speak an Arabic closer to that of the Moroccans.

We focus, in this paper, on one of the eastern dialects of Algeria: Annaba's dialect (AD). This choice is justified by the fact that this dialect is the one we know best. We present its peculiarities in section 4.

4. SPECIFICITIES OF ANNABA'S DIALECT

To develop any application based on a language, at least a basic linguistic study is necessary, even if we use a statistical model. In this section, we present the main features of the dialect of Annaba, with which we are concerned.

Annaba's dialect is spoken in the city of Annaba, located in the east of Algeria. It is spoken by more than one million people. As in other Maghreb Arabic dialects, the most notable feature of this dialect is the collapse of short vowels in some positions. The MSA word كتاب kitāb (book) corresponds to ktāb in the dialect: the short vowel i (kasra) on the first consonant k in MSA is deleted in the dialect and replaced by the sukūn.

In AD, the consonant q is generally pronounced v. For example, قال qāl (to say) is pronounced vāl. For some words both alternatives exist, like the word قطع qṭaʿ, which can also be pronounced vṭaʿ. We give in Table 2 a list of