Pashto Spoken Digits Database for the Automatic Speech Recognition Research

Proceedings of the 18th International Conference on Automation & Computing, Loughborough University, Leicestershire, UK, 8 September 2012

Arbab Waseem Abbas (1), Nasir Ahmad (1), Hazrat Ali (2)
(1) Department of Computer Systems Engineering
(2) Department of Electrical & Electronics Engineering
University of Engineering and Technology, Peshawar, Pakistan

Abstract--This paper presents the development of a Pashto Spoken Digits database for automatic speech recognition research. This is, to the best of the authors' knowledge, the first Pashto isolated digits database. The database consists of Pashto digits from zero (sefer) to hundred (sul) uttered by sixty speakers, 30 male and 30 female. The speakers range in age from 18 to 60 years. The recordings are performed in a noise-free environment using a Sony PCM-M 10 Linear Recorder. The audio is stored in .wav format and transferred to a laptop via a USB cable. After editing, the audio is split into individual digits using Adobe Audition ver. 1.0. Isolated digit recognition experiments are then performed on a subset of the database containing the first 11 digits and 18 speakers. Mel Frequency Cepstral Coefficients (MFCC) were used as the feature vector, while a Linear Discriminant Analysis (LDA) based classifier was used for the classification.

Keywords-Pashto speech recognition; Pashto acoustic database; Pashto isolated word database.

I. INTRODUCTION

The world is moving towards computerization and humans interact increasingly with machines in their daily lives, so there is a strong need to make this human-computer interaction more natural and pervasive [1]. To achieve this goal, the development of listening [2], speaking [3] and viewing [4], [5] machines has become a very active area of research. The area of research concerned with listening machines is popularly known as Automatic Speech Recognition (ASR). Obtaining such capabilities in local languages is the next stage of this technological advancement, so that everyone can communicate with computers in a natural way, and a number of research works on the ASR of local languages have been reported in recent years.

Hejazi [2] has presented an isolated digits recognition system for the Persian language. He used a Hidden Markov Model (HMM) to decompose a word into small parts, in order to solve the problem arising from the pronunciation similarity of some Persian digits that are composed of very similar phonetic and spectral components. For recognition, a Support Vector Machine (SVM) is used to segment the input word and to find an entry with the maximum number of similar segments.

In Arabic, automatic recognition of spoken digits was described by Yousef Ajami [6]. In that paper, the system was designed to recognize isolated whole-word speech of the ten digits, zero through nine, in Arabic. The Hidden Markov Model Toolkit (HTK) was used to implement the isolated word recognizer with phoneme based HMM models. In the training and testing phases of this system, the isolated digits data sets were taken from the telephony Arabic speech database, SAAVB. This standard database was developed by KACST and is classified as a noisy speech database. A hidden Markov model based speech recognition system was designed and tested on automatic Arabic digits recognition.

In the Urdu language, corpus development was discussed by Huda Sarfraz [7], while the development of an isolated word database is presented in [8]. In [7], the development of an Urdu speech database for speaker independent spontaneous speech recognition has been presented, containing data from 82 speakers (42 male and 40 female) recorded over a microphone and a telephone line. The speech was collected from speakers ranging from 20 to 55 years of age. Recording sessions were conducted in office and home environments.

In Indian languages, speech databases in Tamil, Telugu and Marathi were described by Anumanchipalli [9]. In that work, speech data was collected from about 560 speakers in these three languages. Preliminary speech recognition results using acoustic models created with the Sphinx 2 speech recognition toolkit have also been presented.

A speech translation system for American English and Pashto was developed by Andreas Kathol [10]. In that paper, issues encountered for Pashto in the areas of written representation, corpus creation, speech recognition, speech synthesis, and grammar development for translation have been discussed.

General rules for corpus development are discussed by Sue Atkins [11]. That paper explains the principal aspects of corpus creation and the major decisions to be made when creating an electronic text corpus, as well as basic standards and features for corpus design in the initial stages of corpus building.

For research on natural language processing of any language, the first step is the development of a standard language resource, both for training and as a basic platform for performance evaluation. In ASR research, isolated word and digit recognition tasks are generally the first stages, posing relatively simple and well defined tasks and providing a baseline for onward research [2], [6], [8]. This paper presents the development of an isolated digits database in the Pashto language. Initial ASR experiments have been carried out on the database using Mel-Frequency Cepstral Coefficients (MFCC) based features and a Linear Discriminant Analysis (LDA) classifier. The contents of the database, the recording setup and the results of the initial experiments on the database are discussed in detail in the following sections.

II. PASHTO DIGITS DATABASE

Pashto, the national language of Afghanistan and one of the major languages of Pakistan, has an estimated 50-60 million speakers all over the world [12]. Pashto is divided into three dialects: Northern Pashto, Central Pashto and Southern Pashto [13]. Northern Pashto is spoken in most of Khyber Pakhtoonkhwa and its adjoining areas in Afghanistan, the provinces of Kunar and Nangarhar. It is also the most common dialect among the Pashtun diaspora, mostly in Canada, India, UAE, UK and the USA. Its alternate names are Pakhto, Pukhto and Yusufzai Pukhto. Central Pashto is spoken in Waziristan and in the Bannu and Karak region of the Khyber Pakhtoonkhwa province of Pakistan. Its alternate names are Waciri (Waziri) and Bannuchi (Bannochi, Bannu). Southern Pashto is spoken in the Quetta area of Balochistan province and also in Afghanistan, Iran, Tajikistan, United Arab Emirates, and United Kingdom. In this paper Northern Pashto (Yousafzai dialect) has been used.

A. Database content

The first step in database development is the selection of content. A database for general purpose research should contain all the sounds of the language, called phonemes. A phoneme is the smallest meaningful unit of a spoken utterance [14]. Although no standard set of Pashto phonemes has yet been identified, 41 phonemes in Pashto have been reported in [15], including 35 consonant and 6 vowel phonemes. For an ASR database, all phonemes of the language need to be included, while maintaining a statistical balance between their occurrences. However, for the digit recognition application the set of words is fixed, i.e. the digits, which may or may not contain all the phonemes of the language.

TABLE I. PASHTO DIGITS PATTERN

Digits   Pronunciation
0        Sefer
1        Yaw
2        Dwa
3        Dray
4        Celour
5        Penza
6        Shpeg
7        Owa
8        Ata
9        Naha
10       Las
11       Yawo-Las
12       Do-Las
17       Owa-Las
18       Ata-Las
19       Noo-Las
20       Shul
21       Yaw-Vesht
22       Dwa-Vesht
23       Dray-Vesht
24       Celour-Vesht
25       Penza-Vesht
26       Shpag-Vesht
27       Owa-Vesht
28       Ata-Vesht
29       Naha-Vesht
30       Dairsh
40       Celwaikht
50       Panzoos
60       Shpeta
70       Away
80       Atya
90       Nawi
100      Sul

The pattern of Pashto digits from zero to hundred is shown in Table I.

In the number system of Pashto, as in most other languages such as Persian [2], Arabic [6] and others [16], each digit from zero (sefer) to ten (las) has a distinct name. The numbers following las, from yawo-las (11) to noo-las (19), share the common suffix las, attached to the digits yaw (one) to naha (nine) with slight modification, while from yaw-vesht (21) to naha-vesht (29) the common suffix is vesht, with the digits yaw (1) to naha (9) as a prefix. The same pattern is followed for all numbers up to sul (100). There are some variations of this pattern among different regions and dialects. The numbers included here are those of the Yousafzai dialect.

III. DATABASE DEVELOPMENT

For the digits database the content is naturally fixed. As Pashto speakers also use other number systems, such as those of Urdu and English, in their daily lives, the numbers from sefer (zero) to sul (100) were written on a sheet using Liwal Pashto/Dari support software [17] to make the recording easier.

A. Speaker selection

For the purpose of recording, speakers who could fluently and accurately pronounce the written digits in Pashto were selected. A page with the Pashto digits written on it was given to each speaker, who was asked to read it aloud. Each speaker's pronunciation was checked to confirm that the reading was correct and that the pronunciation was in the Yousafzai dialect.

B. Recording environment

The recording environment, for example an office or the inside of a car, is an important aspect of a database. For this database, the recordings were made in an office environment, either in a faculty room or in the library of the University. Further, the signal-to-noise ratio was controlled by adjusting the recorder sensitivity.