Arabic Language Resources and Tools

Total Page:16

File Type:pdf, Size:1020Kb

Arabic Language Resources and Tools Lexicon-Driven Approach to the Recognition of Arabic Named Entities Jack Halpern (春遍雀來) The CJK Dictionary Institute (日中韓辭典研究所) 34-14, 2-chome, Tohoku, Niiza-shi, Saitama 352-0001, Japan [email protected] Abstract Various factors contribute to the difficulties in processing Arabic personal names and named entities, posing special challenges to developers of NLP applications in the areas of named entity recognition (NER), machine translation (MT), morphological analysis (MA) and information retrieval (IR). These include the complexity of the Arabic orthography, the high level of orthographical and morphological ambiguity, the multitude of highly irregular romanization systems, and the vast number of romanized variants that are difficult to detect and disambiguate. This paper focuses on the orthographic variation of Arabic personal names, with special emphasis on the ambiguity resulting from transcribing names into the Roman script (romanization). It describes the techniques used to compile the Database of Arabic Names (DAN), a large-scale lexical resource containing millions of Arabic names and their variants in both romanized and fully vocalized Arabic, and argues that linguistic knowledge in the form of a rule-driven lexicon can enhance the accuracy of statistical methods to achieve high accuracy in the recognition of Arabic names. Moohammad, Moohamad, Mohammad, Mohamad, 1 Concepts and Definitions etc. Much confusion surrounds the terms transliteration and transcription, with the former often misleadingly used in Transliteration, enclosed in back slashes, is a the sense of the latter even in academic papers representation of the script of a source language by using (AbdulJaleel and Larkey, 2003). To discuss these the characters of another script. Ideally, it unambiguously concepts in an unambiguous manner it is necessary to represents the graphemes, rather than the phonemes, of understand these and related terms correctly. the source language. For example, is transliterated as \mHmd\, in which each Arabic letter is unambiguously Romanization is the representation of a language written represented by one Roman letter, enabling round-trip in a non-Roman script using the Roman alphabet. This conversion. A transliteration system widely used in NLP includes both transliteration and various kinds of applications is the excellent Buckwalter system (Beesley, transcription. There are various official systems for 2003). Many academic papers misleadingly use the term romanizing Arabic. transliteration when what they actually mean is transcription (Halpern, 2007). Transcription is a representation of the source script of a language in the target script in a manner that reflects the To summarize, the name is transliterated as pronunciation of the original, often ignoring graphemic \mHm~d\, phonetically transcribed as [muħɛmm̈ ɛd̈ ], correspondence. This includes the following phonemically transcribed as /muHammad/, and romanized subcategories: in various popular transcriptions such as Mohammed, Muhammad and Mohamad, among many others. 1. A phonetic transcription, enclosed in square brackets, represents the actual speech sounds, including In this paper, the adjectives Arab and Arabic, especially in allophones. The best known systems are IPA and the context of Arab name and Arabic name, are not SAMPA. For example, is transcribed as interchangeable. Arab refers to names of Arabs (as [muħɛmm̈ ɛd̈ ]. opposed to names of non-Arabs) regardless of whether 2. A phonemic transcription, enclosed in slashes, they are written in the Arabic or the Roman script, represents the phonemes of the source language whereas Arabic is more general and refers to Arab names (ignoring allophones), ideally on a one-to-one basis. written in the Roman script or any name (Arab and non- For example, is transcribed as /muHammad/, in Arab) written in the Arabic script. which a represents the phoneme /a/, rather than the phone [ɛ].̈ Some well known systems include ICS 2 Why the Arabic script is ambiguous (Intelligence Community Standard) and ALC-LA The Arabic script is highly ambiguous. A distinguishing (American Library Association-Library of Congress feature of written Arabic is that words are represented as a romanization standard) (Library and Archives string of consonants with little or no indication of vowels, Canada, 2006). referred to as unvocalized Arabic. For example, the string 3. A popular transcription, indicated by italics, is a can theoretically represent 40 consonant-vowel conventionalized orthography that roughly represents permutations, such as mawa, mawwa, mawi, mawwi, etc. pronunciation. For example, is transcribed in Even fully vocalized Arabic is not phonemic because of over 100 ways, such as Mohammed, Mohamed, the many one-to-many grapheme-to-phoneme ambiguities (Halpern, 2007 and 2008). The principal factors 193 contributing to orthographical ambiguity are summarized and is romanized in some unusual ways as shown in the below. table below. 1. The omission of short vowels: e.g., the unvocalized Roman Example Frequency can represent the following seven wordforms: q Qaddafi 00077900 ك ات ب g Gaddafi 00219000 كَاتِبٌ ,/kaatibin/ كَاتِبٍ ,/kaataba/ كَاتَبَ ,/kaatib/ كَاتِب gh Ghaddafi 00013100 كَاتِبُ kaatibi/ and/ كَاتِبِ ,/kaatiba/ كَاتِبَ ,/kaatibun/ /kaatibu/. k Kaddafi 00068300 kh Khaddafi 00007380 ,آ ,ى ,ا ,.Long /aa/ can be represented various ways, i.e .2 c Caddafi 00000034 ,ال و سطي and آ س يا , سوري ا or zero, as illustrated by or even j Jaddafi 00000031 ,ة ,ه ,while short /a/ can be represented by zero especially in loanwords), as illustrated in the) ا ,اﻹ س ك ندري و :various Arabic variants for Alexandria romanizations ق Table 1: Examples of .اﻹ س ك ندري ا and ,اﻹ س ك ندري ت 3. Consonant gemination is ambiguous since shadda is The official romanization systems, which are used rather محمد normally omitted, e.g., the unvocalized infrequently, are just a fraction of the plethora of .(مُحَمَد muHammad/ has no shadda (vocalized as/ 4. The omission of nunation diacritics for case endings, romanizations which for lack of a better name we shall ."the fatHatayn lump together under the label "popular transcriptions ,(شُكْرًا shukran/ (vocalized/ ش كرا e.g., in is not written. These transcriptions are often inconsistent, irregular and alif maqSuura), unpredictable, sometimes leading to hundreds or more') ى Alternation between ً (yaa') and .5 especially in Egypt, causes confusion. For example, than a thousand variants for a single name. For example, -the names /muHammad/ and /al .اب و ظ بي Abu Dhabi' is often written as' ابو ظبي taa' marbuuTa), qadhaafii/ are touted as having more than 100 romanized) ة haa') and) ه Alternation between .6 uuda/ being also spelt as variants. As impressive as this number may seem, it pales`/ عودة resulting in names like /in comparison to names like /`abdurrahiim .عوده 7. The rules for determining the hamza seat are of and /`abdurrazzaaq/, which have over 1100 notorious complexity. variants. Those cases are extreme, but names with several 8. Phonological alternation processes such as dozen variants are very common. For example, the assimilation modify the phonetic realization, e.g. popular name /maHmuud/ has 69 variants, the is realized as /'arrajulu-TTawiilu/. most frequent of which are shown below along with their الرجل الطويل 9. Compound names are often written either solid or frequency of occurrence on the web. ع بدال رح يم open, e.g., /`abd-urrahiim/ is written either This is true of various other common Variant Frequency .عبد الرحيم or bin/, Mahmoud 0020400000/ ب ن abuu/ and'/ اب و name elements like 10. Long vowels are often neutralized, a fact which is Mahmud 0005770000 mostly ignored in the literature. Even such common Mahmood 0004050000 haadha/ 'this' are often Mahmut 0003780000/ هٰذَا ana/ 'I' and'/ أَنَا words as incorrectly transcribed as 'anaa and haadhaa. Mehmood 0000685000 Mahmod 0000138000 uuda/ Machmut 0000006640`/ عودة These ambiguities result in names such as and variation due to common Machmud 0000108000 ,عوده also being spelt as abuu/ being Mehmud 0000082400'/ اب و abd/ and`/ ع بد name elements such as detached from or attached to the rest of the name. A Mahmoed 0000052100 Mechmod 0000048000 عبد العزيز combination of these factors lead to a name like /`abd-'al-`aziiz/ having eight variants in Arabic, including Makhmud 0000042700 As a result, it is .ع بدال عزي ز and ,ع بدإل عزي ز ,عبدلعزيز difficult to identify such strings as variants of the same Table 2: Common romanized variants of underlying name, leading to lower recall in named entity extraction and recognition. The various ambiguity factors described above result in an 3 Why romanized Arabic is ambiguous immense number of romanized variants. Arabic also has a high level of romanization ambiguity. 4 Lexicon-Driven Approach That is, an Arabic name can be romanized in a multitude of ways. One reason for this is that many official and An important motivation for using statistical methods in unofficial romanization systems are used to transcribe NLP applications has been the poor availability and the Arabic sounds in a bewildering number of ways. For high cost of constructing large-scale lexical databases, but shuuluukh/ is transcribed in several statistical methods by themselves are inadequate for/ شول وخ ,example official systems as follows: shwlwkh in the ALA-LC dealing with the ambiguities of the Arabic script, system, Shulukh in ICS (Intelligence Community especially because of the difficulties in algorithmically Standard), šūlūḫ in DIN, Shūlūkh
Recommended publications
  • The Late Sheikh Abdullah Azzam's Books
    Combating Terrorism Center Guest Commentary The Late Sheikh Abdullah Azzam’s Books Part III: Radical Theories on Defending Muslim Land through Jihad LCDR Youssef Aboul‐Enein, MSC, USN The Combating Terrorism Center United States Military Academy West Point, NY http://www.ctc.usma.edu Please direct all inquiries to Brian Fishman [email protected] 845.938.2801 Introduction Sheikh Abdullah Azzam is a name that only gets attention among true students of Islamist militancy, yet he has had a tremendous impact on Usama Bin Laden and left him with the tools needed to establish a global jihadist network. Azzam was born in Jenin, Palestine in 1941, and was evicted from his hometown of Jenin in the 1967 Six‐Day War. He spent years pursuing his studies in Islamic jurisprudence attending university in Syria and graduating with a doctorate in Islamic studies from the prestigious Al‐Azhar University in Cairo, Egypt. He was nicknamed the fighting cleric for his obsession with jihadist ideology and the militant works of ibn Taymiyyah (1258 AD). Azzam believed the only way to reclaim his lost homeland was through violent jihad which later became his bsession. On or about 1980, Azzam realized that the Arab jihadists fighting the Soviets in Afghanistan required organization, safe house, and structure. He established Maktab al‐Khidmat lil Mujahideen (The Services Offices for Arab Jihadists) which attracted Usama Bin Laden, then graduating from King Abdul‐ Aziz University to join his new venture. Azzam convinced Bin Laden that his financial connections, business experience, and dedication would be of great use to his new organization in Pakistan.
    [Show full text]
  • Al-Qaida Sanctions List Last Updated on 12 December 2014
    Al-Qaida Sanctions List Last updated on 12 December 2014 The List established and maintained by the Al-Qaida Sanctions Committee with respect to individuals, groups, undertakings and other entities associated with Al-Qaida Last updated on: 12 December 2014 Composition of the List The list consists of the two sections specified below: A. Individuals associated with Al Qaida B. Entities and other groups and undertakings associated with Al Qaida Information about de-listing may be found on the Committee's website at: http://www.un.org/sc/committees/1267/delisting.shtml . A. Individuals associated with Al-Qaida QI.A.12.01. Name: 1: NASHWAN 2: ABD AL-RAZZAQ 3: ABD AL-BAQI 4: na وان د ارزاق د ا :(Name (original script Title: na Designation: na DOB: 1961 POB: Mosul, Iraq Good quality a.k.a.: a) Abdal Al-Hadi Al-Iraqi b) Abd Al-Hadi Al-Iraqi Low quality a.k.a.: Abu Abdallah Nationality: Iraqi Passport no.: na National identification no.: na Address: na Listed on: 6 Oct. 2001 (amended on 14 May 2007, 27 Jul. 2007) Other information: Al-Qaida senior official. In custody of the United States of America, as of July 2007. Review pursuant to Security Council resolution 1822 (2008) was concluded on 15 Jun. 2010. QI.A.157.04. Name: 1: ABD AL WAHAB 2: ABD AL HAFIZ 3: na 4: na د اوھب د اظ :(Name (original script Title: na Designation: na DOB: 7 Sep. 1967 POB: Algiers, Algeria Good quality a.k.a.: a) Ferdjani Mouloud b) Rabah Di Roma c) Abdel Wahab Abdelhafid, born 30 Oct.
    [Show full text]
  • The Ḥaram Al-Sharīf No. 302 (MSR XXII 2019)
    An Arabic Marriage Contract and Subsequent Divorce from Mamluk Jerusalem: The Ḥaram al-Sharīf No. 302 iruhammaMuhamMuhaMBruham A Marriage Contract and Divorce Muhammad N. Abdul-Rahman King Faisal University An Arabic Marriage Contract and Subsequent Divorce from Mamluk Jerusalem: The Ḥaram al-Sharīf No. 302 Introduction Documents from the Ḥaram al-Sharīf 1 are considered to be among the most im- portant historical sources from Mamluk Jerusalem because there are so few nar- 10.6082/pkrn-4q45 rative or documentary sources that chronicle the city in this period. In addition, URI they are the oldest extant documents concerning the affairs of the city’s Muslims in particular, and its Jews and Christians in general. Among the most important of the Ḥaram al-Sharīf documents are those related to the situation of dhimmīs in Jerusalem during the Mamluk era. 2 As is well known, the Jews and Christians shared their lives in Jerusalem with Muslims, and the mingling of customs and traditions was a defining characteristic of the society of medieval Jerusalem. This is a revised version of a paper given at Writing Semitic: Scripts, Documents, Languages in Historical Context: The Sixth International Society for Arabic Papyrology Conference, hosted by the Committee of Semitic Philology of the Bavarian Academy of Sciences and Humanities, 7–10 October 2014. I would like to thank Matt Malczycki and Lucian Reinfandt, as well as the anonymous reader, for their helpful remarks on my edition of the document and for their useful comments on this paper. I am also grateful to them for checking my English.
    [Show full text]
  • Knowledge Representation of the Quran Through Frame Semantics: a Corpus-Based Approach
    CORE Metadata, citation and similar papers at core.ac.uk Provided by White Rose Research Online This is a repository copy of Knowledge representation of the Quran through frame semantics: a corpus-based approach. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/81364/ Proceedings Paper: Abdul-Baquee, S and Atwell, ES (2009) Knowledge representation of the Quran through frame semantics: a corpus-based approach. In: Proceedings of the Fifth Corpus Linguistics Conference. The Fifth Corpus Linguistics Conference, 20-23 July 2009, University of Liverpool, UK. University of Liverpool . Reuse Unless indicated otherwise, fulltext items are protected by copyright with all rights reserved. The copyright exception in section 29 of the Copyright, Designs and Patents Act 1988 allows the making of a single copy solely for the purpose of non-commercial research or private study within the limits of fair dealing. The publisher or other rights-holder may allow further reproduction and re-use of this version - refer to the White Rose Research Online record for this item. Where records identify the publisher as the copyright holder, users can verify any specific terms of use on the publisher’s website. Takedown If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing [email protected] including the URL of the record and the reason for the withdrawal request. [email protected] https://eprints.whiterose.ac.uk/ Knowledge representation of the Quran through frame semantics A corpus-based approach Abdul-Baquee Sharaf Eric Atwell School of Computing University of Leeds Leeds, LS2 9JT United Kingdom {scsams,eric}@comp.leeds.ac.uk Abstract In this paper, we present our in-progress research tasks for building lexical database of the verb valences in the Arabic Quran using FrameNet frames.
    [Show full text]
  • The Dynamics of Inclusion and Exclusion of Settler Colonialism: the Case of Palestine
    NEAR EAST UNIVERSITY GRADUATE SCHOOL OF SOCIAL SCIENCES INTERNATIONAL RELATIONS PROGRAM WHO IS ELIGIBLE TO EXIST? THE DYNAMICS OF INCLUSION AND EXCLUSION OF SETTLER COLONIALISM: THE CASE OF PALESTINE WALEED SALEM PhD THESIS NICOSIA 2019 WHO IS ELIGIBLE TO EXIST? THE DYNAMICS OF INCLUSION AND EXCLUSION OF SETTLER COLONIALISM: THE CASE OF PALESTINE WALEED SALEM NEAREAST UNIVERSITY GRADUATE SCHOOL OF SOCIAL SCIENCES INTERNATIONAL RELATIONS PROGRAM PhD THESIS THESIS SUPERVISOR ASSOC. PROF. DR. UMUT KOLDAŞ NICOSIA 2019 ACCEPTANCE/APPROVAL We as the jury members certify that the PhD Thesis ‘Who is Eligible to Exist? The Dynamics of Inclusion and Exclusion of Settler Colonialism: The Case of Palestine’prepared by PhD Student Waleed Hasan Salem, defended on 26/4/2019 4 has been found satisfactory for the award of degree of Phd JURY MEMBERS ......................................................... Assoc. Prof. Dr. Umut Koldaş (Supervisor) Near East University Faculty of Economics and Administrative Sciences, Department of International Relations ......................................................... Assoc. Prof.Dr.Sait Ak şit(Head of Jury) Near East University Faculty of Economics and Administrative Sciences, Department of International Relations ......................................................... Assoc. Prof.Dr.Nur Köprülü Near East University Faculty of Economics and Administrative Sciences, Department of Political Science ......................................................... Assoc. Prof.Dr.Ali Dayıoğlu European University of Lefke
    [Show full text]
  • Abdul Halim Hasan and His Contributions in Quranic Exegesis in the Malay World
    International Journal of Academic Research in Business and Social Sciences 2017, Vol. 7, No. 8 ISSN: 2222-6990 Abdul Halim Hasan and His Contributions in Quranic Exegesis in the Malay World Abdul Qadir Umar Usman Al-Hamidy1*, Muhammad Mustaqim Mohd Zarif1 1Faculty of Quran and Sunnah Studies, Islamic Science University of Malaysia, Bandar Baru Nilai, 71800, Nilai, Negeri Sembilan. Email: [email protected] *Corresponding Author DOI: 10.6007/IJARBSS/v7-i8/3253 URL: http://dx.doi.org/10.6007/IJARBSS/v7-i8/3253 ABSTRACT The main objectives of this paper are to expound the exegetical methodologies and contributions of Syeikh Abdul Halim Hasan (d. 1969) in enriching the corpus of Quranic exegetical works in the Malay World. In particular, it focuses on his two main exegetical works entitled Tafsir al-Quran al-Karim which he authored with the assistance of his pupils from 1937 until his demise in 1969, and a work on legal verses of the Quran entitled Tafsir al-Ahkam. Due to its nature, appropriate qualitative methods are applied in this study comprising mainly historical, library research and content analysis. The findings highlight some of the salient features of the exegetical methods of the author and its primal role in influencing the style of exegetical composition by succeeding authors. These books also reflect the concern of the local scholars on the needs of the Muslims for credible religious works and references in languages accessible to them, which have also had its impact on the development in the field of Quranic exegesis in the region. Keywords: Abdul Halim Hasan, Quran, Exegesis, Religious Discourse, Malay World INTRODUCTION The Quran, as the ultimate source of Islam, has been the subject of studies and analysis by scholars since its revelation to Prophet Muhammad some 1,400 years ago.
    [Show full text]
  • Afghanistan INDIVIDUALS
    CONSOLIDATED LIST OF FINANCIAL SANCTIONS TARGETS IN THE UK Last Updated:01/02/2021 Status: Asset Freeze Targets REGIME: Afghanistan INDIVIDUALS 1. Name 6: ABBASIN 1: ABDUL AZIZ 2: n/a 3: n/a 4: n/a 5: n/a. DOB: --/--/1969. POB: Sheykhan village, Pirkowti Area, Orgun District, Paktika Province, Afghanistan a.k.a: MAHSUD, Abdul Aziz Other Information: (UK Sanctions List Ref):AFG0121 (UN Ref): TAi.155 (Further Identifiying Information):Key commander in the Haqqani Network (TAe.012) under Sirajuddin Jallaloudine Haqqani (TAi.144). Taliban Shadow Governor for Orgun District, Paktika Province as of early 2010. Operated a training camp for non Afghan fighters in Paktika Province. Has been involved in the transport of weapons to Afghanistan. INTERPOL-UN Security Council Special Notice web link: https://www.interpol.int/en/How-we- work/Notices/View-UN-Notices-Individuals click here. Listed on: 21/10/2011 Last Updated: 01/02/2021 Group ID: 12156. 2. Name 6: ABDUL AHAD 1: AZIZIRAHMAN 2: n/a 3: n/a 4: n/a 5: n/a. Title: Mr DOB: --/--/1972. POB: Shega District, Kandahar Province, Afghanistan Nationality: Afghan National Identification no: 44323 (Afghan) (tazkira) Position: Third Secretary, Taliban Embassy, Abu Dhabi, United Arab Emirates Other Information: (UK Sanctions List Ref):AFG0094 (UN Ref): TAi.121 (Further Identifiying Information): Belongs to Hotak tribe. Review pursuant to Security Council resolution 1822 (2008) was concluded on 29 Jul. 2010. INTERPOL-UN Security Council Special Notice web link: https://www.interpol.int/en/How-we-work/ Notices/View-UN-Notices-Individuals click here. Listed on: 23/02/2001 Last Updated: 01/02/2021 Group ID: 7055.
    [Show full text]
  • The Evolution of the Arabic Language Through Online Writing: the Explosion of 2011
    - 1 - The evolution of the Arabic language through online writing: the explosion of 2011 Saussan Khalil Abstract The role of the internet in the popular protests of 2011 cannot be overestimated. Most importantly, the internet allowed online activists to escape censorship and communicate to thousands if not millions of people in real time. What is interesting about this form of communication is the language of choice particularly in Egypt – for centuries Classical (CA) or Modern Standard (MSA) Arabic have been the accepted forms of writing; however, the form of language being used online leans more towards colloquial Arabic, which has up until now only been accepted as a spoken form. The relationship between the written and spoken forms of Arabic in Egypt has been detailed by Haeri (2003), but the use of spoken Arabic in online writing is yet to be explored. This paper looks at the relationship between the form of the language used in online writing and the messages being conveyed. The suggestion is that away from the censorship of state media and the press, writers are free to use dialectal forms of the language for a freer, more direct approach to their readers, which has been more effective in communicating their message than the use of CA or MSA would have been. - 2 - Introduction Ferguson (1959) first described Arabic as a ‘diglossic’ language, meaning it has distinct written and spoken forms. This premise has been generally accepted with Classical Arabic (CA) and later Modern Standard Arabic (MSA) constituting the written form, and the numerous dialects of Arabic as its spoken forms.
    [Show full text]
  • A Guide to Names and Naming Practices
    March 2006 AA GGUUIIDDEE TTOO NN AAMMEESS AANNDD NNAAMMIINNGG PPRRAACCTTIICCEESS This guide has been produced by the United Kingdom to aid with difficulties that are commonly encountered with names from around the globe. Interpol believes that member countries may find this guide useful when dealing with names from unfamiliar countries or regions. Interpol is keen to provide feedback to the authors and at the same time develop this guidance further for Interpol member countries to work towards standardisation for translation, data transmission and data entry. The General Secretariat encourages all member countries to take advantage of this document and provide feedback and, if necessary, updates or corrections in order to have the most up to date and accurate document possible. A GUIDE TO NAMES AND NAMING PRACTICES 1. Names are a valuable source of information. They can indicate gender, marital status, birthplace, nationality, ethnicity, religion, and position within a family or even within a society. However, naming practices vary enormously across the globe. The aim of this guide is to identify the knowledge that can be gained from names about their holders and to help overcome difficulties that are commonly encountered with names of foreign origin. 2. The sections of the guide are governed by nationality and/or ethnicity, depending on the influencing factor upon the naming practice, such as religion, language or geography. Inevitably, this guide is not exhaustive and any feedback or suggestions for additional sections will be welcomed. How to use this guide 4. Each section offers structured guidance on the following: a. typical components of a name: e.g.
    [Show full text]
  • Sentiment Analysis As a Case Study
    Can Modern Standard Arabic Approaches be used for Arabic Dialects? Sentiment Analysis as a Case Study Kathrein Abu kwaik Stergios Chatzikyriakidis Simon Dobnik CLASP and FLOV, University of Gothenburg, Sweden fkathrein.abu.kwaik,stergios.chatzikyriakidis,[email protected] Abstract where MSA is the official language used for ed- ucation, news, politics, religion and, in general, in We present the Shami-Senti corpus, the first any type of formal setting, but dialects are used in Levantine corpus for Sentiment Analysis (SA), and investigate the usage of off-the-shelf mod- everyday communication, as well as in informal els that have been built for Modern Standard writing (Versteegh, 2014). Arabic (MSA) on this corpus of Dialectal Ara- In this paper, we examine whether it is possi- bic (DA). We apply the models on DA data, ble to adapt classification models that have been showing that their accuracy does not exceed trained and built on MSA data for DA from the 60%. We then proceed to build our own mod- Levantine region, or whether we should build and els involving different feature combinations train specific models for the individual dialects, and machine learning methods for both MSA and DA and achieve an accuracy of 83% and therefore considering them as stand-alone lan- 75% respectively. guages. To answer this question we use Sentiment Analysis as a case study. Our contributions are the 1 Introduction following: There is a growing need for text mining and an- • We systematically evaluate how well the ML alytical tools for Social Media data, for exam- models on MSA for SA perform on DA of ple Sentiment Analysis (SA) tools which aim to Levantine; distinguish people’s views into positive and neg- • We construct and present a new sentiment ative, objective and subjective responses, or even corpus of Levantine DA; into neutral opinions.
    [Show full text]
  • SRO 1288 Dated 22 December 2015
    EXTRAORDINARY PUBLISHED BY AUTHORITY ______________________________________________________________________________ ISLAMABAD, TUESDAY, December 29, 2015 ______________________________________________________________________________ Part II Statutory Notifications (S.R.O.) Government of Paksitan MINISTRY OF FOREIGN AFFAIRS ORDER Islamabad the 22 December 2015 S.R.O.1288 (I)/2015. – WHEREAS the United Nations Security Council vide its Resolutions Nos. 1267(1999), 1333 (2000), 1373 (2001), 1390 (2002), 1455 (2003), 1526 (2004), 1617 (2005), 1735 (2006), 1822 (2008), 1904 (2009), 1988 (2011), 1989 (2011), 2082 (2012), 2083 (2012), 2133 (2014), 2160 (2014), 2161 (2014) 2170(2014), 2178(2014), 2199 (2015) and 2253 (2015) has directed to apply travel restrictions, arms embargo and to freeze the funds and other financial resources of certain individuals and entities; 2. AND WHEREAS through paragraph 1 of United Nations Security Council resolution 2253(2015) adopted on 17 December 2015 under Chapter VII of the United Nations Charter, the United Nations Security Council has decided that, from the date of adoption of this resolution, the 1267/1989 Al-Qaida Sanctions Committee shall henceforth be known as the “1267/1989/2253 ISIL (Da’esh) and Al-Qaida Sanctions Committee” and the Al-Qaida Sanctions List shall henceforth be known as the ISIL (Da’esh) and Al-Qaida Sanctions List; 3. AND WHEREAS through paragraph 2 of United Nations Security Council resolution 2253 (2015) adopted under Chapter VII of the United Nations Charter, the United Nations Secuirty
    [Show full text]
  • The Effect of Culture and Religion on Enforcement of International Arbitration Awards in Iran" (2018)
    Golden Gate University School of Law GGU Law Digital Commons Theses and Dissertations Student Scholarship 4-2018 The ffecE t of Culture and Religion on Enforcement of International Arbitration Awards in Iran Atoosa Zeinali Golden Gate University School of Law, [email protected] Follow this and additional works at: https://digitalcommons.law.ggu.edu/theses Part of the Comparative and Foreign Law Commons, and the International Law Commons Recommended Citation Zeinali, Atoosa, "The Effect of Culture and Religion on Enforcement of International Arbitration Awards in Iran" (2018). Theses and Dissertations. 74. https://digitalcommons.law.ggu.edu/theses/74 This Dissertation is brought to you for free and open access by the Student Scholarship at GGU Law Digital Commons. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of GGU Law Digital Commons. For more information, please contact [email protected]. THE EFFECT OF CULTURE AND RELIGION ON ENFORCEMENT OF INTERNATIONAL ARBITRATION A WARDS IN IRAN A Dissertation Submitted To The Committee of International Legal Studies In Candidacy for the Degree Of Scientiae Juridicae Doctor Department of International Legal Studies GOLDEN GATE UNIVERSITY By Atoosa Zeinali San Francisco, California April, 2018 Copyright By Atoosa Zeinali 2018 Golden Gate University The Dissertation Committee for Atoosa Zeinali Certifies that this is the Approved Version ofthe Following Dissertation: The Effect of Culture and Religion on Enforcement of International Arbitration Awards in Iran APPROVED BY SUPERVISING COMMITTEE CHAIR: Professor Dr. Arthur Gemmell II THE EFFECT OF CULTURE AND RELIGION ON ENFORCEMENT OF INTERNATIONAL ARBITRATION AWARDS IN IRAN A Dissertation Submitted To The Committee of International Legal Studies In Candidacy for the Degree Of Scientiae Juridicae Doctor Department of International Legal Studies GOLDEN GATE UNIVERSITY By Atoosa Zeinali San Francisco, California April, 2018 Ill Dissertation Committee Professor Dr.
    [Show full text]