What Is Modern Standard Arabic Nlp? Definition & Tools 4.2
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Automatic Identification of Arabic Language Varieties and Dialects in Social Media
Automatic Identification of Arabic Language Varieties and Dialects in Social Media Fatiha Sadat Farnazeh Kazemi Atefeh Farzindar University of Quebec in NLP Technologies Inc. NLP Technologies Inc. Montreal, 201 President Ken- 52 Le Royer Street W., 52 Le Royer Street W., nedy, Montreal, QC, Canada Montreal, QC, Canada Montreal, QC, Canada [email protected] [email protected] [email protected] Abstract Modern Standard Arabic (MSA) is the formal language in most Arabic countries. Arabic Dia- lects (AD) or daily language differs from MSA especially in social media communication. However, most Arabic social media texts have mixed forms and many variations especially be- tween MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for AD classification using probabilistic models across social media datasets. We present a set of experiments using the character n-gram Markov language model and Naive Bayes classifiers with detailed examination of what models perform best under different condi- tions in social media context. Experimental results show that Naive Bayes classifier based on character bi-gram model can identify the 18 different Arabic dialects with a considerable over- all accuracy of 98%. 1 Introduction Arabic is a morphologically rich and complex language, which presents significant challenges for nat- ural language processing and its applications. It is the official language in 22 countries spoken by more than 350 million people around the world1. Moreover, the Arabic language exists in a state of diglossia where the standard form of the language, Modern Standard Arabic (MSA) and the regional dialects (AD) live side-by-side and are closely related (Elfardy and Diab, 2013). -
Alif and Hamza Alif) Is One of the Simplest Letters of the Alphabet
’alif and hamza alif) is one of the simplest letters of the alphabet. Its isolated form is simply a vertical’) ﺍ stroke, written from top to bottom. In its final position it is written as the same vertical stroke, but joined at the base to the preceding letter. Because of this connecting line – and this is very important – it is written from bottom to top instead of top to bottom. Practise these to get the feel of the direction of the stroke. The letter 'alif is one of a number of non-connecting letters. This means that it is never connected to the letter that comes after it. Non-connecting letters therefore have no initial or medial forms. They can appear in only two ways: isolated or final, meaning connected to the preceding letter. Reminder about pronunciation The letter 'alif represents the long vowel aa. Usually this vowel sounds like a lengthened version of the a in pat. In some positions, however (we will explain this later), it sounds more like the a in father. One of the most important functions of 'alif is not as an independent sound but as the You can look back at what we said about .(ﺀ) carrier, or a ‘bearer’, of another letter: hamza hamza. Later we will discuss hamza in more detail. Here we will go through one of the most common uses of hamza: its combination with 'alif at the beginning or a word. One of the rules of the Arabic language is that no word can begin with a vowel. Many Arabic words may sound to the beginner as though they start with a vowel, but in fact they begin with a glottal stop: that little catch in the voice that is represented by hamza. -
CYP2D6 and CYP2C19 Genotyping in Psychiatry
CYP2D6 and CYP2C19 genotyping in psychiatry Citation for published version (APA): Koopmans, A. B. (2021). CYP2D6 and CYP2C19 genotyping in psychiatry: bridging the gap between practice and lab. ProefschriftMaken. https://doi.org/10.26481/dis.20210512ak Document status and date: Published: 01/01/2021 DOI: 10.26481/dis.20210512ak Document Version: Publisher's PDF, also known as Version of record Please check the document version of this publication: • A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website. • The final author version and the galley proof are versions of the publication after peer review. • The final published version features the final layout of the paper including the volume, issue and page numbers. Link to publication General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal. -
DSU Music Newsletter
Delaware State University Music Department Spring 2018 Music Department Schedule Tues., Mar. 20, 11am: Music Performance Seminar (Theater) Volume 2: Issue 1 Fall 2018 Tues., Mar. 27, 11am: Music Performance Seminar (Theater) Concert choir to perform with Philadelphia orchestra Friday, April 6, 7:00 PM: Junior Recital; Devin Davis, Tenor, Anyre’ Frazier, Alto, On March 28, 29, and 30 of 2019, Tommia Proctor, Soprano (Dover Presbyterian Church) the Delaware State University Concert Choir under the direction of Saturday, April 7, 5:00 PM: Senior Capstone Recital; William Wicks, Tenor (Dover Presbyterian Church) Dr. Lloyd Mallory, Jr. will once again be joining the Philadelphia Sunday, April 8, 4:00 PM: Orchestra. The choir will be Senior Capstone Recital; Michele Justice, Soprano (Dover Presbyterian Church) performing the world premiere of Healing Tones, by the Orchestra’s Tuesday, April 10, 11:00 AM: Percussion Studio Performance Seminar (EH Theater) composer-in-residence Hannibal Lokumbe. In November of 2015 the Sunday, April 15, 4:00 PM: Delaware State University Choir Senior Capstone Recital; Marquita Richardson, Soprano (Dover Presbyterian Church) joined the Philadelphia Orchestra to Tuesday, April 17, 11:00 AM: DSU – A place where dreams begin perform the world premiere of Guest Speaker, Dr. Adrian Barnes, Rowan University (Music Hannibal’s One Land, One River, One Education/Bands) (EH 138) People. About the performance, the Friday, April 20, 12:30 PM: Philadelphia Inquirer said “The massed voices of the Delaware State University Choir, the Lincoln Honors Day, Honors Recital (EH Theater) University Concert Choir, and Morgan State University Choir sang with spirit, accuracy and, near- Friday, April 20, 7:00 PM: More inside! Pg. -
The Origins, Evolution and Decline of the Khojki Script
The origins, evolution and decline of the Khojki script Juan Bruce The origins, evolution and decline of the Khojki script Juan Bruce Dissertation submitted in partial fulfilment of the requirements for the Master of Arts in Typeface Design, University of Reading, 2015. 5 Abstract The Khojki script is an Indian script whose origins are in Sindh (now southern Pakistan), a region that has witnessed the conflict between Islam and Hinduism for more than 1,200 years. After the gradual occupation of the region by Muslims from the 8th century onwards, the region underwent significant cultural changes. This dissertation reviews the history of the script and the different uses that it took on among the Khoja people since Muslim missionaries began their activities in Sindh communities in the 14th century. It questions the origins of the Khojas and exposes the impact that their transition from a Hindu merchant caste to a broader Muslim community had on the development of the script. During this process of transformation, a rich and complex creed, known as Satpanth, resulted from the blend of these cultures. The study also considers the roots of the Khojki writing system, especially the modernization that the script went through in order to suit more sophisticated means of expression. As a result, through recording the religious Satpanth literature, Khojki evolved and left behind its mercantile features, insufficient for this purpose. Through comparative analysis of printed Khojki texts, this dissertation examines the use of the script in Bombay at the beginning of the 20th century in the shape of Khoja Ismaili literature. -
Arabic Language and Linguistics Certificate Revised: 05/2018
Arabic Language and Linguistics Certificate www.Linguistics.Pitt.edu Revised: 05/2018 Overview This certificate offers undergraduates a way to become highly proficient in the Arabic language and linguistic structure and to develop an understanding of important issues in Arabic life and culture. The certificate will conveniently accompany any undergraduate major in the Dietrich School and in other schools at the University of Pittsburgh. The program draws on the academic strengths and resources of the Department of Linguistics and on faculty from the School of Education. Enrollment in this program is limited to 20 students per academic year. Therefore, students must complete an evaluation process and be accepted into the program before declaring this certificate. For information, please contact the Amani Attia, the Arabic Coordinator, at [email protected]. Requirements Category 3: Culture of the Arabic-Speaking This certificate requires completion of 22-23 credits, World detailed as follows. Credit requirements do not include Choose one of the following courses. prerequisite courses. ARABIC 1615 Arabic Life and Thought ARABIC 1635 Introduction to Modern Arabic Literature Prerequisite courses Choose one dialect pair. Category 4: Electives ARABIC 0101 MSA Egyptian 1 Choose one of the following courses. Dialect courses ARABIC 0102 MSA Egyptian 2 must follow previous coursework. ARABIC 0105 MSA Egyptian 5 ARABIC 0121 MSA Levantine 1 ARABIC 0106 MSA Egyptian 6 ARABIC 0122 MSA Levantine 2 ARABIC 0125 MSA Levantine 5 ARABIC 0126 MSA Levantine 6 Category 1: Language instruction ARABIC 0211 Iraqi Arabic 1 Chose the same dialect pair as the prerequisite ARABIC 1615 Arabic Life and Thought language courses. -
Kiraz 2019 a Functional Approach to Garshunography
Intellectual History of the Islamicate World 7 (2019) 264–277 brill.com/ihiw A Functional Approach to Garshunography A Case Study of Syro-X and X-Syriac Writing Systems George A. Kiraz Institute for Advanced Study, Princeton and Beth Mardutho: The Syriac Institute, Piscataway [email protected] Abstract It is argued here that functionalism lies at the heart of garshunographic writing systems (where one language is written in a script that is sociolinguistically associated with another language). Giving historical accounts of such systems that began as early as the eighth century, it will be demonstrated that garshunographic systems grew organ- ically because of necessity and that they offered a certain degree of simplicity rather than complexity.While the paper discusses mostly Syriac-based systems, its arguments can probably be expanded to other garshunographic systems. Keywords Garshuni – garshunography – allography – writing systems It has long been suggested that cultural identity may have been the cause for the emergence of Garshuni systems. (In the strictest sense of the term, ‘Garshuni’ refers to Arabic texts written in the Syriac script but the term’s semantics were drastically extended to other systems, sometimes ones that have little to do with Syriac—for which see below.) This paper argues for an alterna- tive origin, one that is rooted in functional theory. At its most fundamental level, Garshuni—as a system—is nothing but a tool and as such it ought to be understood with respect to the function it performs. To achieve this, one must take into consideration the social contexts—plural, as there are many—under which each Garshuni system appeared. -
ARAB - Arabic (ARAB) 1
ARAB - Arabic (ARAB) 1 ARAB 301 Reading and Composition ARAB - ARABIC (ARAB) Credits 3. 3 Lecture Hours. Advanced Arabic grammar and readings of average difficulty and of ARAB 101 Beginning Arabic I different genres, including literary and journalistic texts and other Credits 4. 4 Lecture Hours. culturally-enriched materials in order to develop awareness of cultural (ARAB 1411) Beginning Arabic I. Introduction to Modern Standard Arabic products, perspectives, and practices found in the Arab world. in its written and spoken forms; emphasis on conversation, rudimentary Prerequisites: ARAB 202 or ARAB 204, or equivalent; junior or senior vocabulary, simple grammar, and reading. classification or approval of instructor. ARAB 102 Beginning Arabic II ARAB 302 Reading and Composition II Credits 4. 4 Lecture Hours. Credits 3. 3 Lecture Hours. (ARAB 1412) Beginning Arabic II. Introduction of more complex Readings of average difficulty and of different genres, including grammatical constructions; vocabulary building; emphasis on putting literary and journalistic texts and other culturally-enriched materials; acquired vocabulary and grammar to conversational use. development of writing skills with emphasis on grammatical Prerequisite: ARAB 101 or equivalent. constructions; expansion of vocabulary and oral expression. ARAB 104 Intensive Beginning Arabic Prerequisites: ARAB 301; junior or senior classification or approval of Credits 8. 8 Lecture Hours. instructor. Accelerated elementary language study, with oral, listening, reading and ARAB 321 Business Arabic writing practice. Equivalent to ARAB 101 and ARAB 102. Credits 3. 3 Lecture Hours. ARAB 201 Intermediate Arabic I Business and financial terminologies useful in the Arab World; cultural Credits 3. 3 Lecture Hours. etiquette for effective communication in Arabic business settings; (ARAB 2311) Intermediate Arabic I. -
Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only. -
Saudi Dialects: Are They Endangered?
Academic Research Publishing Group English Literature and Language Review ISSN(e): 2412-1703, ISSN(p): 2413-8827 Vol. 2, No. 12, pp: 131-141, 2016 URL: http://arpgweb.com/?ic=journal&journal=9&info=aims Saudi Dialects: Are They Endangered? Salih Alzahrani Taif University, Saudi Arabia Abstract: Krauss, among others, claims that languages will face death in the coming centuries (Krauss, 1992). Austin (2010a) lists 7,000 languages as existing and spoken in the world today. Krauss estimates that this figure could come down to 600. That is, most the world's languages are endangered. Therefore, an endangered language is a language that loses her speakers within a few generations. According to Dorian (1981), there is what is called ―tip‖ in language endangerment. He argues that a language's decline can start slowly but suddenly goes through a rapid decline towards the extinction. Thus, languages must be protected at much earlier stage. Arabic dialects such as Zahrani Spoken Arabic (ZSA), and Faifi Spoken Arabic (henceforth, FSA), which are spoken in the southern region of Saudi Arabia, have not been studied, yet. Few people speak these dialects, among many other dialects in the same region. However, the problem is that most these dialects' native speakers are moving to other regions in Saudi Arabia where they use other different dialects. Therefore, are these dialects endangered? What other factors may cause its endangerment? Have they been documented before? What shall we do? This paper discusses three main different points regarding this issue: language and endangerment, languages documentation and description and Arabic language and its family, giving a brief history of Saudi dialects comparing their situation with the whole existing dialects. -
A Note on the Genitive Particle Ħaqq in Yemeni Arabic Free Genitives Mohammed Ali Qarabesh, University of Albayda Mohammed Q
A note on ħaqq in Yemeni Arabic … Qarabesh & Shormani A Note on the Genitive particle ħaqq in Yemeni Arabic Free Genitives Mohammed Ali Qarabesh, University of Albayda Mohammed Q. Shormani, University of Ibb الملخص: تتىاوه هذي اىىرقح ميمح "حق" فٍ اىيهجح اىُمىُح ورتثتها اىىحىَح فٍ تزمُة إضافح اىمينُح اىتحيُيُح، وتقذً ىها تحيُو وحىٌ وصفٍ، حُث َفتزض اىثاحثان أن هىاك وىػُه مه هذي اىنيمح فٍ اىيهجح اىُمىُح: ا( تيل اىتٍ ﻻ تظهز ػيُها ػﻻماخ اىتطاتق، مثو "اىسُاراخ حق ػيٍ"، حُث وزي أن ميمح "اىسُاراخ" ىها اىسماخ )جمغ، مؤوث، غائة( وىنه ميمح "حق" ﻻ تتطاتق مؼها فٍ أٌ مه هذي اىصفاخ، و ب( تيل اىتٍ تظهز ػيُها ػﻻماخ اىتطاتق مثو "اىسُاراخ حقاخ ػيٍ" حُث تتطاتق اىنيمتان "اىسُاراخ" و"حقاخ" فٍ مو اىسماخ. وؼَزض اىثاحثان أن اىىىع اﻷوه َ ستخذً فٍ مىاطق مثو صىؼاء، ػذن، إب... اىخ، واىثاوٍ فٍ شثىج وحضزمىخ ... اىخ. وَخيص اىثاحثان إىً أن هىاك دىُو ػميٍ ىُس فقظ ػيً وجىد اىىحى اىنيٍ فٍ "اىمينح اىيغىَح" تو أَضا ػيً "تَ ْى َس َطح" هذا اىىحى، ىُس فقظ تُه اىيغاخ تو وتُه ىهجاخ اىيغح اىىاحذج. الكلمات المفتاحية: اىيغاخ اىسامُح، اىؼزتُح اىُمىُح، اىؼثزَح، اى م ْينُح، "حق" Abstract This paper provides a descriptive syntactic analysis of ħaqq in Yemeni Arabic (YA). ħaqq is a Semitic Free Genitive (FG) particle, much like the English of. A FG minimally consists of a head N, genitive particle and genitive DP complement. It (in a FG) expresses or conveys the meaning of possessiveness, something like of in English. There are two types of ħaqq in Yemeni Arabic: one not exhibiting agreement with the head N, and another exhibiting it. -
Arabic Sociolinguistics: Topics in Diglossia, Gender, Identity, And
Arabic Sociolinguistics Arabic Sociolinguistics Reem Bassiouney Edinburgh University Press © Reem Bassiouney, 2009 Edinburgh University Press Ltd 22 George Square, Edinburgh Typeset in ll/13pt Ehrhardt by Servis Filmsetting Ltd, Stockport, Cheshire, and printed and bound in Great Britain by CPI Antony Rowe, Chippenham and East bourne A CIP record for this book is available from the British Library ISBN 978 0 7486 2373 0 (hardback) ISBN 978 0 7486 2374 7 (paperback) The right ofReem Bassiouney to be identified as author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988. Contents Acknowledgements viii List of charts, maps and tables x List of abbreviations xii Conventions used in this book xiv Introduction 1 1. Diglossia and dialect groups in the Arab world 9 1.1 Diglossia 10 1.1.1 Anoverviewofthestudyofdiglossia 10 1.1.2 Theories that explain diglossia in terms oflevels 14 1.1.3 The idea ofEducated Spoken Arabic 16 1.2 Dialects/varieties in the Arab world 18 1.2. 1 The concept ofprestige as different from that ofstandard 18 1.2.2 Groups ofdialects in the Arab world 19 1.3 Conclusion 26 2. Code-switching 28 2.1 Introduction 29 2.2 Problem of terminology: code-switching and code-mixing 30 2.3 Code-switching and diglossia 31 2.4 The study of constraints on code-switching in relation to the Arab world 31 2.4. 1 Structural constraints on classic code-switching 31 2.4.2 Structural constraints on diglossic switching 42 2.5 Motivations for code-switching 59 2.