A Phoneme Clustering Algorithm Based on the Obligatory Contour Principle


Mans Hulden
Department of Linguistics
University of Colorado

Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 290–300, Vancouver, Canada, August 3 - August 4, 2017. © 2017 Association for Computational Linguistics

Abstract

This paper explores a divisive hierarchical clustering algorithm based on the well-known Obligatory Contour Principle in phonology. The purpose is twofold: to see if such an algorithm could be used for unsupervised classification of phonemes or graphemes in corpora, and to investigate whether this purported universal constraint really holds for several classes of phonological distinctive features. The algorithm achieves very high accuracies in an unsupervised setting of inferring a consonant-vowel distinction, and also has a strong tendency to detect coronal phonemes in an unsupervised fashion. Remaining classes, however, do not correspond as neatly to phonological distinctive feature splits. While the results offer only mixed support for a universal Obligatory Contour Principle, the algorithm can be very useful for many NLP tasks due to the high accuracy in revealing consonant/vowel/coronal distinctions.

1 Introduction¹

It has long been noted in phonology that there seems to be a universal cross-linguistic tendency to avoid redundancy or repetition of similar speech features within a word or morpheme, especially if the phonemes are adjacent to one another. Many different names are given to variants of this general phenomenon in the linguistic literature: “identity avoidance” (Yip, 1998), “similar place avoidance” (Pozdniakov and Segerer, 2007), “obligatory contour principle” (OCP) (Leben, 1973), and “dissimilation” (Hempl, 1893). Some special cases such as haplology (avoidance of adjacent identical syllables) also fall in this general category of avoiding repetition along some dimension.

¹ All code and data sets used are available at https://github.com/cvocp/cvocp

The general phenomenon itself is supported by robust, although inconsistent, evidence across a number of languages. An early example is the observation of Spitta-Bey (1880)² that the Arabic language tends to favor combination of consonant segments (phonemes) in morphemes that have different places of articulation; this was also later pointed out by Greenberg (1950), and those Semitic root outliers that deviate from this pattern were analyzed in depth in Frajzyngier (1979). In Proto-Indo-European (PIE) roots, which are mostly structured CVC, stop-V-stop combinations have been found to be statistically underrepresented (Iverson and Salmons, 1992). That is, PIE seems to obey a cross-linguistic constraint that disfavors two similar consonants in a root. Another specific example comes from Japanese, where the phenomenon called Lyman’s law, which effectively says that a morpheme may consist of maximally one voiced obstruent, can also be interpreted as avoidance (Itô and Mester, 1986).

² Nun hat, wie schon längst bemerkt ist, die arabische Sprache die Neigung, solche Buchstaben in einem Worte zu vereinigen, deren Organe weit von einander entfernt liegen, wie Kehllaute und Dentale. Translation: Now, the Arabic language, as has long been noted, has the tendency to combine such letters in a word where the place of articulation is distant, such as gutturals and dentals (Spitta-Bey, 1880, p. 15).

In light of such evidence, proposals have been put forth to define the concept of phoneme by distributional properties alone as opposed to the prevalent distinctive feature systems which are largely based on articulatory features (Fischer-Jørgensen, 1952). Elsewhere, after finding a statistical tendency to avoid similar place of articulation in word-initial and word-medial consonants, Pozdniakov and Segerer (2007) offer the argument that this phenomenon of “Similar Place Avoidance” is a statistical universal.

This phenomenon is often filed under the generic heading “obligatory contour principle” (Leben, 1973; McCarthy, 1986; Yip, 1988; Odden, 1988; Meyers, 1997; Pierrehumbert, 1993; Rose, 2000; Frisch, 2004). Originally, the OCP was applied as a theoretical constraint only to tone languages, with the argument that adjacent identical tones in underlying forms were rare, and this reflected an obligatory contour principle. The usage has since spread, and is assumed to account for segmental features other than tone.

It is unclear why the phenomenon is so widespread and why it manifests itself in the diverse ways it does. Accounts range from information compression to a diachronically visible hypercorrection by listeners who misperceive the signal and make the assumption that repetition is unlikely (Ohala, 1981).

This paper explores the simplest incarnation of the idea of similarity avoidance; namely, that two adjacent segments are preferably different in some way and that this difference reveals itself globally. That is, it is not assumed that the constraint is absolute; rather, an algorithm is developed that induces grouping of unknown phoneme symbols so as to maximize potential alternation of clusters in a sequence of symbols, i.e. a corpus. If the OCP holds for phonological or phonetic features (primarily places of articulation), such a clustering algorithm could group phonemes along the lines of distinctive features. While, as we shall see, the observations do not support the presence of a strong universal OCP effect, the top-level clusters discovered by the algorithm correspond nearly 100% to the distinction of consonants and vowels, or syllabic and non-syllabic elements if expressed in terms of features. Furthermore, a tier-based variant of the algorithm additionally groups consonants somewhat reliably into coronal/non-coronal places of articulation, and also often distinguishes front vowels from back vowels. This is true even if the algorithm is run on alphabetic representations. An evaluation of the ability to detect C/V distinction against a data set of 503 Bible translations (Kim and Snyder, 2013) is included, improving upon earlier work that attempts to distinguish between consonants and vowels in an unsupervised fashion (Kim and Snyder, 2013; Goldsmith and Xanthos, 2009; Moler and Morrison, 1983; Sukhotin, 1962). The algorithm is also more robust than earlier algorithms that perform consonant-vowel separation and works with less data, something that is also briefly evaluated.

This paper is structured as follows: an overview of previous work is given in section 2, mostly related to the simpler task of grouping consonants and vowels without labeled data, rather than identifying distinctive features. Following that, the general algorithm is developed in section 3, after which the experiments on both phonemic and graphemic representations in section 4 are reported. Four experiments are evaluated. The first uses phonemic data from 9 languages for clustering and evaluates clustering along distinctive feature lines. The second is a graphemic experiment that uses a data set of Bible translations in 503 languages where the task is to distinguish the vowels from the consonants; here, results are compared to Kim and Snyder (2013) on the same data set. That data is slightly noisy, motivating the third experiment, which is also graphemic and evaluates consonant-vowel distinctions on vetted word lists from data taken from the ACL SIGMORPHON shared task on morphological reinflection (Cotterell et al., 2016). The ability of a tier-based variant of the algorithm to separate coronals from non-coronals is evaluated in a fourth experiment where Universal Dependencies corpora (Nivre et al., 2017) are used.

The main results are presented in section 5. Given the high accuracy of the algorithm in C/V distinction with very little data and its consequent potential applicability to decipherment tasks, a small practical example application is evaluated which analyzes a fragment of text, a manuscript of only 54 characters.

2 Related Work

The statistical experiments of Andrey Markov (1913) on Alexander Pushkin’s poem Eugene Onegin constitute what is probably one of the earliest discoveries of the fact that significant latent structure can be found by examining immediate co-occurrence of graphemes in text. Examining a 20,000-letter sample of the poem, Markov found a strong statistical bias that favored alternation of consonants and vowels. A number of computational approaches have since been investigated that attempt to reveal phonological structure in corpora. Often, orthography is used as a proxy for phonology since textual data is easier to come by. A spectral method was introduced by Moler and Morrison (1983) with the explicit purpose of distinguishing consonants from vowels by a dimensionality reduction on a segment co-occurrence matrix through singular value decomposition (SVD). An almost identical SVD-based approach was later applied to phonological data by Goldsmith and Xanthos (2009). Hidden Markov Models coupled with the EM algorithm have also been used to learn consonant-vowel distinctions (Knight et al., 2006) as well as other latent structure, such as vowel harmony (Goldsmith and Xanthos, 2009). Kim and Snyder (2013) use Bayesian inference [...]

[...] this alternation, one can assume that there is a natural grouping of all segments into two initial sets, called 0 and 1, in such a way that the total number of 0-1 or 1-0 alternations between adjacent segments in a corpus is maximized. For example, consider a corpus of a single string abc. This can be split into two nonempty subsets in six different ways: 0 = {a, b} and 1 = {c}; 0 = {a} and 1 = {b, c}; 0 = {a, c} and 1 = {b}, and their symmetric variants which are produced by swapping 0 and 1. Out of these, the best assignment is 0 = {a, c} and 1 = {b}, since it reflects an alternation of sets where abc ↦ 010. The ‘score’ of this assignment is based on the number of adjacent alternations, in this case 2 (01 and 10).