(And Potential) Language and Linguistic Resources on South Asian Languages
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Named Entity Recognition System for Kashmiri Language Iamir Bashir Malik, Iikhushboo Bansal Istudent, M.Tech, Iiassistant Professor I,Iidept
ISSN : 2347 - 8446 (Online) International Journal of Advanced Research in ISSN : 2347 - 9817 (Print) Vol. 3, Issue 2 (Apr. - Jun. 2015) Computer Science & Technology (IJARCST 2015) Named Entity Recognition System for Kashmiri Language IAmir Bashir Malik, IIKhushboo Bansal IStudent, M.Tech, IIAssistant Professor I,IIDept. of CSE, Desh Bhagat University, Mandi Gobindgarh, Punjab, India Abstract Named Entity Recognition (NER) is a task which helps in finding out Persons name, Location names, Organization names, Place, Date, Time etc. and classifies them into predefined different categories. Named Entity Recognition plays a major role in various Natural Language Processing (NLP) fields like Information Extraction, Machine Translations and Question Answering. Unfortunately Kashmiri language which is a scarce resourced language has not been taken into account. This paper describes the problems of NER in the context of Kashmiri Language and provides relevant solutions. Keywords Named Entity, Named Entity Recognition, Natural language process, Kashmiri language text. I. Introduction is as follows. The term Named Entity (NE) was evolved during the sixth (1) “Micromax”represent anorganization and “ Dec19, 2014” Message Understanding Conference (MUC -6, 1995).Named represent dateand “smartphone” represent entity and “had Entity Recognition (NER) is also knows as entity identification is a launched its on” represent others. subtask of information extraction (IE). NER extracts and classifies The named entities may be of any type such as given below -
BALOCH WOMEN in LITERATURE Muhammad Panah Baloch1
BALOCH WOMEN IN LITERATURE Muhammad Panah Baloch1 Abstract Women play a very important role in human advancement and have a momentous place in the society. They are not at all poorer to men. They are capable of sharing all the everyday jobs of life. Man and woman have been rightly compared to the wheels of the same carriage. Women in Baloch society has been greatly overseen in the Baloch history but now is coming to a more standpoint to people. Milieu of Baloch realm Origin and history of Baloch is still not cleared by the historians till today and needs removal of dust from the narrations of history. Many of historian, travelers and frontier officers of late eighteen century have different opinion and perception about their origin and history. Potinger and Khanikoff advocates them Turkmen origin, Sir. B. Burton, Lassen, Spiegal and others favoured them as Iranian origin, Dr. Bellew put forward them Rajput origin and Sir. T. Holdich and Colonel E. Meckler traces them Arab origin. The Excavation of Mehrgarh, Killi Gul Muhammad, Pir Syed Balo, Kechi Baig, Sampur, Meeri Kalat, Nighar Damb, Naushehra, Pirak, Sia Damb, Sped Bullandi, Damb Behman and many other archaeological sites of Balochistan and Seistan-o-Balochistan explored many types of objects giving many details. The Social, political, fiscal, religious, cultural and anthropological information of these mounds and ruins explain the pre-historic Balochistan and provide evidence that, the area of Balochistan was the homeland of early settlement of humankind. Latest research work showing that, the Baloch have 1Assistant Director, Arid Zone Research Centre, Quetta thousands years presence of in the different regions of Balochistan (Pakistan, Iran and Afghanistan and other adjoining areas). -
Grammatical Gender in Hindukush Languages
Grammatical gender in Hindukush languages An areal-typological study Julia Lautin Department of Linguistics Independent Project for the Degree of Bachelor 15 HEC General linguistics Bachelor's programme in Linguistics Spring term 2016 Supervisor: Henrik Liljegren Examinator: Bernhard Wälchli Expert reviewer: Emil Perder Project affiliation: “Language contact and relatedness in the Hindukush Region,” a research project supported by the Swedish Research Council (421-2014-631) Grammatical gender in Hindukush languages An areal-typological study Julia Lautin Abstract In the mountainous area of the Greater Hindukush in northern Pakistan, north-western Afghanistan and Kashmir, some fifty languages from six different genera are spoken. The languages are at the same time innovative and archaic, and are of great interest for areal-typological research. This study investigates grammatical gender in a 12-language sample in the area from an areal-typological perspective. The results show some intriguing features, including unexpected loss of gender, languages that have developed a gender system based on the semantic category of animacy, and languages where this animacy distinction is present parallel to the inherited gender system based on a masculine/feminine distinction found in many Indo-Aryan languages. Keywords Grammatical gender, areal-typology, Hindukush, animacy, nominal categories Grammatiskt genus i Hindukush-språk En areal-typologisk studie Julia Lautin Sammanfattning I den här studien undersöks grammatiskt genus i ett antal språk som talas i ett bergsområde beläget i norra Pakistan, nordvästra Afghanistan och Kashmir. I området, här kallat Greater Hindukush, talas omkring 50 olika språk från sex olika språkfamiljer. Det stora antalet språk tillsammans med den otillgängliga terrängen har gjort att språken är arkaiska i vissa hänseenden och innovativa i andra, vilket gör det till ett intressant område för arealtypologisk forskning. -
Punjabi Language Characteristics and Role of Thesaurus in Natural
Dharam Veer Sharma et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (4) , 2011, 1434-1437 Punjabi Language Characteristics and Role of Thesaurus in Natural Language processing Dharam Veer Sharma1 Aarti2 Department of Computer Science, Punjabi University, Patiala, INDIA Abstract---This paper describes an attempt to explain various 2.2 Characteristics of the Punjabi Language characteristics of Punjabi language. The origin and symbols of Modern Punjabi is a very tonal language, making use of Punjabi language are presents in this paper. Various relations various tones to differentiate words that would otherwise be exist in thesaurus and role of thesaurus in natural language identical. Three primary tones can be identified: high-rising- processing also has been elaborated in this paper. falling, mid-rising-falling, and low rising. Following are characteristics of Punjabi language [3] [4]. Keywords---Thesaurus, Punjabi, characteristics, relations 2.2.1 Morphological characteristics Morphologically, Punjabi is an agglutinative language. That 1. INTRODUCTION is to say, grammatical information is encoded by way of A thesaurus links semantically related words and helps in the affixation (largely suffixation), rather than via independent selection of most appropriate words for given contexts [1]. A freestanding morphemes. Punjabi nouns inflect for number thesaurus contains synonyms (words which have basically the (singular, plural), gender (masculine, feminine), and same meaning) and as such is an important tool for many declension class (absolute, oblique). The absolute form of a applications in NLP too. The purpose is twofold: For writers, noun is its default or uninflected form. This form is used as it is a tool - one with words grouped and classified to help the object of the verb, typically when inanimate, as well as in select the best word to convey a specific nuance of meaning, measure or temporal (point of time) constructions. -
The Meeting Place. Radio Features in the Shina Language of Gilgit
Georg Buddruss and Almuth Degener The Meeting Place Radio Features in the Shina Language of Gilgit by Mohammad Amin Zia Text, interlinear Analysis and English Translation with a Glossary 2012 Harrassowitz Verlag · Wiesbaden ISSN 1432-6949 ISBN 978-3-447-06673-0 Contents Preface....................................................................................................................... VII bayáak 1: Giving Presents to Our Friends................................................................... 1 bayáak 2: Who Will Do the Job? ............................................................................... 47 bayáak 3: Wasting Time............................................................................................. 85 bayáak 4: Cleanliness................................................................................................. 117 bayáak 5: Sweet Water............................................................................................... 151 bayáak 6: Being Truly Human.................................................................................... 187 bayáak 7: International Year of Youth........................................................................ 225 References.................................................................................................................. 263 Glossary..................................................................................................................... 265 Preface Shina is an Indo-Aryan language of the Dardic group which is spoken in several -
Pashto, Waneci, Ormuri. Sociolinguistic Survey of Northern
SOCIOLINGUISTIC SURVEY OF NORTHERN PAKISTAN VOLUME 4 PASHTO, WANECI, ORMURI Sociolinguistic Survey of Northern Pakistan Volume 1 Languages of Kohistan Volume 2 Languages of Northern Areas Volume 3 Hindko and Gujari Volume 4 Pashto, Waneci, Ormuri Volume 5 Languages of Chitral Series Editor Clare F. O’Leary, Ph.D. Sociolinguistic Survey of Northern Pakistan Volume 4 Pashto Waneci Ormuri Daniel G. Hallberg National Institute of Summer Institute Pakistani Studies of Quaid-i-Azam University Linguistics Copyright © 1992 NIPS and SIL Published by National Institute of Pakistan Studies, Quaid-i-Azam University, Islamabad, Pakistan and Summer Institute of Linguistics, West Eurasia Office Horsleys Green, High Wycombe, BUCKS HP14 3XL United Kingdom First published 1992 Reprinted 2004 ISBN 969-8023-14-3 Price, this volume: Rs.300/- Price, 5-volume set: Rs.1500/- To obtain copies of these volumes within Pakistan, contact: National Institute of Pakistan Studies Quaid-i-Azam University, Islamabad, Pakistan Phone: 92-51-2230791 Fax: 92-51-2230960 To obtain copies of these volumes outside of Pakistan, contact: International Academic Bookstore 7500 West Camp Wisdom Road Dallas, TX 75236, USA Phone: 1-972-708-7404 Fax: 1-972-708-7433 Internet: http://www.sil.org Email: [email protected] REFORMATTING FOR REPRINT BY R. CANDLIN. CONTENTS Preface.............................................................................................................vii Maps................................................................................................................ -
Feasibility Study of Introducing Pashto Language As a Medium of Instruction in the Government Primary Schools of Khyber
FEASIBILITY STUDY OF INTRODUCING PASHTO LANGUAGE AS A MEDIUM OF INSTRUCTION IN GOVERNMENT PRIMARY SCHOOLS OF KHYBER PAKHTUNKHWA BY ABDUL BASIT SIDDIQUI Registration No. 091- NUN - 0056 Submitted in partial fulfillment of the requirements of the degree of Doctor of Philosophy In Education DEPARTMENT OF EDUCATION FACULTY OF ARTS AND SOCIAL SCIENCES NORTHERN UNIVERSITY, NOWSHERA (PAKISTAN) 2014 i ii DEDICATION To my dear parents, whose continuous support, encouragement and persistent prayers have been the real source of my all achievements. iii TABLE OF CONTENTS ACKNOWLEDGEMENT xv ABSTRACT xvii Chapter 1: INTRODUCTION 1 1.1 STATEMENT OF THE PROBLEM 2 1.2 OBJECTIVES OF THE STUDY 3 1.3 HYPOTHESIS OF THE STUDY 3 1.4 SIGNIFICANCE OF THE STUDY 3 1.5 DELIMITATION OF THE STUDY 4 1.6 METHOD AND PROCEDURE 4 1.6.1 Population 4 1.6.2 Sample 4 1.6.3 Research Instruments 5 1.6.4 Data Collection 5 1.6.5 Analysis of Data 5 Chapter 2: REVIEW OF RELATED LITERATURE 6 2.1 ALL CREATURES OF THE UNIVERSE COMMUNICATE 7 2.2 LANGUAGE ESTABLISHES THE SUPERIORITY OF HUMAN BEINGS OVER OTHER SPECIES OF THE WORLD 8 2.3 DEFINITIONS: 9 2.3.1 Mother Tongue / First Language 9 2.3.2 Second Language (L2) 9 2.3.3 Foreign Language 10 2.3.4 Medium of Instruction 10 iv 2.3.5 Mother Tongue as a Medium of Instruction 10 2.4 HOW CHILDREN LEARN THEIR MOTHER TONGUE 10 2.5 IMPORTANT CHARACTERISTICS FOR A LANGUAGE ADOPTED AS MEDIUM OF INSTRUCTION 11 2.6 CONDITIONS FOR THE SELECTION OF DESIRABLE TEXT FOR LANGUAGE 11 2.7 THEORIES ABOUT LEARNING (MOTHER) LANGUAGE 12 2.8 ORIGIN OF PAKHTUN -
New Language Resources for the Pashto Language
New language resources for the Pashto language Djamel Mostefa 1 , Khalid Choukri 1 , Sylvie Brunessaux 2 , Karim Boudahmane 3 1 Evaluation and Language resources Distribution Agency, France 2 CASSIDIAN, France 3 Direction Générale de l'Armement, France E-mail: [email protected], [email protected], [email protected], [email protected] Abstract This paper reports on the development of new language resources for the Pashto language, a very low-resource language spoken in Afghanistan and Pakistan. In the scope of a multilingual data collection project, three large corpora are collected for Pashto. Firstly a monolingual text corpus of 100 million words is produced. Secondly a 100 hours speech database is recorded and manually transcribed. Finally a bilingual Pashto-French parallel corpus of around 2 million is produced by translating Pashto texts into French. These resources will be used to develop Human Language Technology systems for Pashto with a special focus on Machine Translation. Keywords: Pashto, low-resource language, speech corpus, monolingual and multilingual text corpora, web crawling. other one being Dari) and one regional language in 1. Introduction Pakistan. There are very few corpora and Human Language The code assigned to the language by the ISO 639-3 Technology (HLT) services available for Pashto. No standard is [pus]. language resources for Pashto can be found in the According to the Ethnologue.com website, it is spoken by catalogues of LDC1 and ELRA2. around 20 million people and three main dialects are to be Pashto is a very low-resource language. Google doesn't considered: support Pashto in its search engine or translation services. -
Languages of Kohistan. Sociolinguistic Survey of Northern
SOCIOLINGUISTIC SURVEY OF NORTHERN PAKISTAN VOLUME 1 LANGUAGES OF KOHISTAN Sociolinguistic Survey of Northern Pakistan Volume 1 Languages of Kohistan Volume 2 Languages of Northern Areas Volume 3 Hindko and Gujari Volume 4 Pashto, Waneci, Ormuri Volume 5 Languages of Chitral Series Editor Clare F. O’Leary, Ph.D. Sociolinguistic Survey of Northern Pakistan Volume 1 Languages of Kohistan Calvin R. Rensch Sandra J. Decker Daniel G. Hallberg National Institute of Summer Institute Pakistani Studies of Quaid-i-Azam University Linguistics Copyright © 1992 NIPS and SIL Published by National Institute of Pakistan Studies, Quaid-i-Azam University, Islamabad, Pakistan and Summer Institute of Linguistics, West Eurasia Office Horsleys Green, High Wycombe, BUCKS HP14 3XL United Kingdom First published 1992 Reprinted 2002 ISBN 969-8023-11-9 Price, this volume: Rs.300/- Price, 5-volume set: Rs.1500/- To obtain copies of these volumes within Pakistan, contact: National Institute of Pakistan Studies Quaid-i-Azam University, Islamabad, Pakistan Phone: 92-51-2230791 Fax: 92-51-2230960 To obtain copies of these volumes outside of Pakistan, contact: International Academic Bookstore 7500 West Camp Wisdom Road Dallas, TX 75236, USA Phone: 1-972-708-7404 Fax: 1-972-708-7433 Internet: http://www.sil.org Email: [email protected] REFORMATTING FOR REPRINT BY R. CANDLIN. CONTENTS Preface............................................................................................................viii Maps................................................................................................................. -
Finite State Morphology and Sindhi Noun Inflections
PACLIC 24 Proceedings 669 Finite State Morphology and Sindhi Noun Inflections Mutee U Rahman, Mohammad Iqbal Bhatti Department of Computer Science, Isra University, Hala Road, Hyderabad Sindh 71000, Pakistan [email protected], [email protected] Abstract. Sindhi is a morphologically rich language. Morphological construction include inflections and derivations. Sindhi morphology becomes more complex due to primary and secondary word types which are further divided into simple, complex and compound words. Sindhi nouns are marked by number gender and case. Finite state transducers (FSTs) quite reasonably represent the inflectional morphology of Sindhi nouns. The paper investigates Sindhi noun inflection rules and defines equivalent computational rules to be used by FSTs; corresponding FSTs are also given. Keywords. Sindhi, morphology, noun inflections, two-level morphology, finite state morphology. 1 Introduction Morphology deals with word formation rules in a language. Word structures of a language are defined by its morphological constructions. Morphology defines that how smaller meaning bearing units called morphemes are combined to make larger meaning bearing units of a language called words. Morphology also deals with word formation by variations in already existing words. The morphological changes are mostly done by suffix addition, subtraction and replacement phenomenon. In few words morphology can be defined as syntax of word formation. Models for computational analysis of morphology always remained challenge for computational linguists until early 1980’s when 4Ks* discovered the two level morphology (Kaplan, R. M. and M. Kay. 1981) the first general model for morphologically complex languages. This two level morphology represents a word at lexical level and surface level. Morphotactics or morpheme ordering model is used in between these two levels to incorporate morphological changes. -
Language Documentation and Description
Language Documentation and Description ISSN 1740-6234 ___________________________________________ This article appears in: Language Documentation and Description, vol 17. Editor: Peter K. Austin Countering the challenges of globalization faced by endangered languages of North Pakistan ZUBAIR TORWALI Cite this article: Torwali, Zubair. 2020. Countering the challenges of globalization faced by endangered languages of North Pakistan. In Peter K. Austin (ed.) Language Documentation and Description 17, 44- 65. London: EL Publishing. Link to this article: http://www.elpublishing.org/PID/181 This electronic version first published: July 2020 __________________________________________________ This article is published under a Creative Commons License CC-BY-NC (Attribution-NonCommercial). The licence permits users to use, reproduce, disseminate or display the article provided that the author is attributed as the original creator and that the reuse is restricted to non-commercial purposes i.e. research or educational use. See http://creativecommons.org/licenses/by-nc/4.0/ ______________________________________________________ EL Publishing For more EL Publishing articles and services: Website: http://www.elpublishing.org Submissions: http://www.elpublishing.org/submissions Countering the challenges of globalization faced by endangered languages of North Pakistan Zubair Torwali Independent Researcher Summary Indigenous communities living in the mountainous terrain and valleys of the region of Gilgit-Baltistan and upper Khyber Pakhtunkhwa, northern -
UC Santa Barbara Himalayan Linguistics
UC Santa Barbara Himalayan Linguistics Title Traces of mirativity in Shina Permalink https://escholarship.org/uc/item/3bk2d2s6 Journal Himalayan Linguistics, 9(2) Author Bashir, Elena Publication Date 2010 DOI 10.5070/H99223478 License https://creativecommons.org/licenses/by-nc-nd/4.0/ 4.0 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Himalayan Linguistics Traces of mirativity in Shina Elena Bashir The University of Chicago ABSTRACT This paper explores the question of whether mirative meaning, in the sense of Aksu-Koç and Slobin’s (1986) “unprepared mind” and DeLancey (1997), is grammatically encoded in the Dardic language Shina. Mirativity marking in Shina’s linguistic neighbors is examined and compared with the situation in Shina. I find a clustering of what appears to be morphologically indicated mirativity in the eastern dialects of Shina, and identify two morphological strategies which can be employed to encode mirativity. KEYWORDS mirativity, Shina, Dardic languages, Nuristani languages, evidentiality, Kalasha, Khowar, Balti, Tajik Persian This is a contribution from Himalayan Linguistics, Vol. 9(2): 1-55. ISSN 1544-7502 © 2010. All rights reserved. This Portable Document Format (PDF) file may not be altered in any way. Tables of contents, abstracts, and submission guidelines are available at www.linguistics.ucsb.edu/HimalayanLinguistics Himalayan Linguistics, Vol. 9(2). © Himalayan Linguistics 2010 ISSN 1544-7502 Traces of mirativity in Shina∗ Elena Bashir The University of Chicago 0 Introduction In this paper, I explore the question of whether mirative meaning is grammatically expressed in Shina, a Northwest Indo-Aryan (“Dardic”1) language spoken in northeastern Pakistan and some adjacent areas of northwestern India.