Automatic Identification of Closely-Related Indian Languages : Resources and Experiments Ritesh Kumar1, Bornini Lahiri2, Deepak Alok3, Atul Kr
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Honorificity, Indexicality and Their Interaction in Magahi
SPEAKER AND ADDRESSEE IN NATURAL LANGUAGE: HONORIFICITY, INDEXICALITY AND THEIR INTERACTION IN MAGAHI BY DEEPAK ALOK A dissertation submitted to the School of Graduate Studies Rutgers, The State University of New Jersey In partial fulfillment of the requirements For the degree of Doctor of Philosophy Graduate Program in Linguistics Written under the direction of Mark Baker and Veneeta Dayal and approved by New Brunswick, New Jersey October, 2020 ABSTRACT OF THE DISSERTATION Speaker and Addressee in Natural Language: Honorificity, Indexicality and their Interaction in Magahi By Deepak Alok Dissertation Director: Mark Baker and Veneeta Dayal Natural language uses first and second person pronouns to refer to the speaker and addressee. This dissertation takes as its starting point the view that speaker and addressee are also implicated in sentences that do not have such pronouns (Speas and Tenny 2003). It investigates two linguistic phenomena: honorification and indexical shift, and the interactions between them, andshow that these discourse participants have an important role to play. The investigation is based on Magahi, an Eastern Indo-Aryan language spoken mainly in the state of Bihar (India), where these phenomena manifest themselves in ways not previously attested in the literature. The phenomena are analyzed based on the native speaker judgements of the author along with judgements of one more native speaker, and sometimes with others as the occasion has presented itself. Magahi shows a rich honorification system (the encoding of “social status” in grammar) along several interrelated dimensions. Not only 2nd person pronouns but 3rd person pronouns also morphologically mark the honorificity of the referent with respect to the speaker. -
Minority Languages in India
Thomas Benedikter Minority Languages in India An appraisal of the linguistic rights of minorities in India ---------------------------- EURASIA-Net Europe-South Asia Exchange on Supranational (Regional) Policies and Instruments for the Promotion of Human Rights and the Management of Minority Issues 2 Linguistic minorities in India An appraisal of the linguistic rights of minorities in India Bozen/Bolzano, March 2013 This study was originally written for the European Academy of Bolzano/Bozen (EURAC), Institute for Minority Rights, in the frame of the project Europe-South Asia Exchange on Supranational (Regional) Policies and Instruments for the Promotion of Human Rights and the Management of Minority Issues (EURASIA-Net). The publication is based on extensive research in eight Indian States, with the support of the European Academy of Bozen/Bolzano and the Mahanirban Calcutta Research Group, Kolkata. EURASIA-Net Partners Accademia Europea Bolzano/Europäische Akademie Bozen (EURAC) – Bolzano/Bozen (Italy) Brunel University – West London (UK) Johann Wolfgang Goethe-Universität – Frankfurt am Main (Germany) Mahanirban Calcutta Research Group (India) South Asian Forum for Human Rights (Nepal) Democratic Commission of Human Development (Pakistan), and University of Dhaka (Bangladesh) Edited by © Thomas Benedikter 2013 Rights and permissions Copying and/or transmitting parts of this work without prior permission, may be a violation of applicable law. The publishers encourage dissemination of this publication and would be happy to grant permission. -
The British Learning of Hindustani
The British Learning of Hindustani Tariq Rahman The British considered Hindustani, an urban language of north India, the lingua franca of the whole country. They associated it with (easy) Urdu and not modern, or Sanskritized, Hindi. They learned it to exercise power and, because of that, were not careful of mastering the polite usages of the language or its grammar. The British perceptions of the language spread it widely throughout India, especially the urban areas, making it much more widespread than it was when they had arrived. Moreover, their tilt towards Urdu associated it with the Muslims and the language was officially discarded in favour of Hindi in India after independence. The literature of Anglo-India (used here in the earlier sense for the British in India and not for those born of marriages between Europeans and Indians as it came to be understood later) is full of words from Hindustani (also called Urdu by some authors). Rudyard Kipling (1865-1936) can hardly be enjoyed unless one is provided with a glossary of these words and even then the authenticity of the experience of the raj is lost in the translation and the interpretation. Many of those who had been in India used words of Hindustani as an identity marker. According to Ivor Lewis, author of a dictionary of Anglo-Indian words: They [the Hindustani words] could not have been much used except (with fading relevance) among a declining number of 20 Pakistan Vision Vol. 8 No. 2 retired Anglo-Indians in the evening of their lives spent in their salubrious English compounds and cantonments. -
Mapping India's Language and Mother Tongue Diversity and Its
Mapping India’s Language and Mother Tongue Diversity and its Exclusion in the Indian Census Dr. Shivakumar Jolad1 and Aayush Agarwal2 1FLAME University, Lavale, Pune, India 2Centre for Social and Behavioural Change, Ashoka University, New Delhi, India Abstract In this article, we critique the process of linguistic data enumeration and classification by the Census of India. We map out inclusion and exclusion under Scheduled and non-Scheduled languages and their mother tongues and their representation in state bureaucracies, the judiciary, and education. We highlight that Census classification leads to delegitimization of ‘mother tongues’ that deserve the status of language and official recognition by the state. We argue that the blanket exclusion of languages and mother tongues based on numerical thresholds disregards the languages of about 18.7 million speakers in India. We compute and map the Linguistic Diversity Index of India at the national and state levels and show that the exclusion of mother tongues undermines the linguistic diversity of states. We show that the Hindi belt shows the maximum divergence in Language and Mother Tongue Diversity. We stress the need for India to officially acknowledge the linguistic diversity of states and make the Census classification and enumeration to reflect the true Linguistic diversity. Introduction India and the Indian subcontinent have long been known for their rich diversity in languages and cultures which had baffled travelers, invaders, and colonizers. Amir Khusru, Sufi poet and scholar of the 13th century, wrote about the diversity of languages in Northern India from Sindhi, Punjabi, and Gujarati to Telugu and Bengali (Grierson, 1903-27, vol. -
Map by Steve Huffman; Data from World Language Mapping System
Svalbard Greenland Jan Mayen Norwegian Norwegian Icelandic Iceland Finland Norway Swedish Sweden Swedish Faroese FaroeseFaroese Faroese Faroese Norwegian Russia Swedish Swedish Swedish Estonia Scottish Gaelic Russian Scottish Gaelic Scottish Gaelic Latvia Latvian Scots Denmark Scottish Gaelic Danish Scottish Gaelic Scottish Gaelic Danish Danish Lithuania Lithuanian Standard German Swedish Irish Gaelic Northern Frisian English Danish Isle of Man Northern FrisianNorthern Frisian Irish Gaelic English United Kingdom Kashubian Irish Gaelic English Belarusan Irish Gaelic Belarus Welsh English Western FrisianGronings Ireland DrentsEastern Frisian Dutch Sallands Irish Gaelic VeluwsTwents Poland Polish Irish Gaelic Welsh Achterhoeks Irish Gaelic Zeeuws Dutch Upper Sorbian Russian Zeeuws Netherlands Vlaams Upper Sorbian Vlaams Dutch Germany Standard German Vlaams Limburgish Limburgish PicardBelgium Standard German Standard German WalloonFrench Standard German Picard Picard Polish FrenchLuxembourgeois Russian French Czech Republic Czech Ukrainian Polish French Luxembourgeois Polish Polish Luxembourgeois Polish Ukrainian French Rusyn Ukraine Swiss German Czech Slovakia Slovak Ukrainian Slovak Rusyn Breton Croatian Romanian Carpathian Romani Kazakhstan Balkan Romani Ukrainian Croatian Moldova Standard German Hungary Switzerland Standard German Romanian Austria Greek Swiss GermanWalser CroatianStandard German Mongolia RomanschWalser Standard German Bulgarian Russian France French Slovene Bulgarian Russian French LombardRomansch Ladin Slovene Standard -
Neo-Vernacularization of South Asian Languages
LLanguageanguage EEndangermentndangerment andand PPreservationreservation inin SSouthouth AAsiasia ed. by Hugo C. Cardoso Language Documentation & Conservation Special Publication No. 7 Language Endangerment and Preservation in South Asia ed. by Hugo C. Cardoso Language Documentation & Conservation Special Publication No. 7 PUBLISHED AS A SPECIAL PUBLICATION OF LANGUAGE DOCUMENTATION & CONSERVATION LANGUAGE ENDANGERMENT AND PRESERVATION IN SOUTH ASIA Special Publication No. 7 (January 2014) ed. by Hugo C. Cardoso LANGUAGE DOCUMENTATION & CONSERVATION Department of Linguistics, UHM Moore Hall 569 1890 East-West Road Honolulu, Hawai’i 96822 USA http:/nflrc.hawaii.edu/ldc UNIVERSITY OF HAWAI’I PRESS 2840 Kolowalu Street Honolulu, Hawai’i 96822-1888 USA © All text and images are copyright to the authors, 2014 Licensed under Creative Commons Attribution Non-Commercial No Derivatives License ISBN 978-0-9856211-4-8 http://hdl.handle.net/10125/4607 Contents Contributors iii Foreword 1 Hugo C. Cardoso 1 Death by other means: Neo-vernacularization of South Asian 3 languages E. Annamalai 2 Majority language death 19 Liudmila V. Khokhlova 3 Ahom and Tangsa: Case studies of language maintenance and 46 loss in North East India Stephen Morey 4 Script as a potential demarcator and stabilizer of languages in 78 South Asia Carmen Brandt 5 The lifecycle of Sri Lanka Malay 100 Umberto Ansaldo & Lisa Lim LANGUAGE ENDANGERMENT AND PRESERVATION IN SOUTH ASIA iii CONTRIBUTORS E. ANNAMALAI ([email protected]) is director emeritus of the Central Institute of Indian Languages, Mysore (India). He was chair of Terralingua, a non-profit organization to promote bi-cultural diversity and a panel member of the Endangered Languages Documentation Project, London. -
Languages of New York State Is Designed As a Resource for All Education Professionals, but with Particular Consideration to Those Who Work with Bilingual1 Students
TTHE LLANGUAGES OF NNEW YYORK SSTATE:: A CUNY-NYSIEB GUIDE FOR EDUCATORS LUISANGELYN MOLINA, GRADE 9 ALEXANDER FFUNK This guide was developed by CUNY-NYSIEB, a collaborative project of the Research Institute for the Study of Language in Urban Society (RISLUS) and the Ph.D. Program in Urban Education at the Graduate Center, The City University of New York, and funded by the New York State Education Department. The guide was written under the direction of CUNY-NYSIEB's Project Director, Nelson Flores, and the Principal Investigators of the project: Ricardo Otheguy, Ofelia García and Kate Menken. For more information about CUNY-NYSIEB, visit www.cuny-nysieb.org. Published in 2012 by CUNY-NYSIEB, The Graduate Center, The City University of New York, 365 Fifth Avenue, NY, NY 10016. [email protected]. ABOUT THE AUTHOR Alexander Funk has a Bachelor of Arts in music and English from Yale University, and is a doctoral student in linguistics at the CUNY Graduate Center, where his theoretical research focuses on the semantics and syntax of a phenomenon known as ‘non-intersective modification.’ He has taught for several years in the Department of English at Hunter College and the Department of Linguistics and Communications Disorders at Queens College, and has served on the research staff for the Long-Term English Language Learner Project headed by Kate Menken, as well as on the development team for CUNY’s nascent Institute for Language Education in Transcultural Context. Prior to his graduate studies, Mr. Funk worked for nearly a decade in education: as an ESL instructor and teacher trainer in New York City, and as a gym, math and English teacher in Barcelona. -
UNIT 2 INDIAN LANGUAGES Notes STRUCTURE 2.0 Introduction
Indian Languages UNIT 2 INDIAN LANGUAGES Notes STRUCTURE 2.0 Introduction 2.1 Learning Objectives 2.2 Linguistic Diversity in India 2.2.1 A picture of India’s linguistic diversity 2.2.2 Language Families of India and India as a Linguistic Area 2.3 What does the Indian Constitution say about Languages? 2.4 Categories of Languages in India 2.4.1 Scheduled Languages 2.4.2 Regional Languages and Mother Tongues 2.4.3 Classical Languages 2.4.4 Is there a Difference between Language and Dialect? 2.5 Status of Hindi in India 2.6 Status of English in India 2.7 The Language Education Policy in India Provisions of Various Committees and Commissions Three Language Formula National Curriculum Framework-2005 2.8 Let Us Sum Up 2.9 Suggested Readings and References 2.10 Unit-End Exercises 2.0 INTRODUCTION You must have heard this song: agrezi mein kehte hein- I love you gujrati mein bole- tane prem karu chhuun 18 Diploma in Elementary Education (D.El.Ed) Indian Languages bangali mein kehte he- amii tumaake bhaalo baastiu aur punjabi me kehte he- tere bin mar jaavaan, me tenuu pyar karna, tere jaiyo naiyo Notes labnaa Songs of this kind is only one manifestation of the diversity and fluidity of languages in India. We are sure you can think of many more instances where you notice a multiplicity of languages being used at the same place at the same time. Imagine a wedding in Delhi in a Telugu family where Hindi, Urdu, Dakkhini, Telugu, English and Sanskrit may all be used in the same event. -
Braj Bhasha Mayank Jain1, Yogesh Dawer2, Nandini Chauhan2, Anjali Gupta2 1Jawaharlal Nehru University, 2Dr
Developing Resources for a Less-Resourced Language: Braj Bhasha Mayank Jain1, Yogesh Dawer2, Nandini Chauhan2, Anjali Gupta2 1Jawaharlal Nehru University, 2Dr. Bhim Rao Ambedkar University Delhi, Agra {jnu.mayank, yogeshdawer, nandinipinki850, anjalisoniyagupta89}@gmail.com Abstract This paper gives the overview of the language resources developed for a less-resourced western Indo-Aryan language of India - Braj Bhasha. There is no language resource available for Braj Bhasha. The paper gives the detail of first-ever language resources developed for Braj Bhasha which are text corpus, BIS based POS tagset and annotation, Universal Dependency (UD) based morphological and dependency annotation. UD is a framework for cross-linguistically consistent grammatical annotation and an open community effort with contributors working on over 60 languages. The methodology used to develop corpus, tagset, and annotation can help in creating resources for other less-resourced languages. These resources would provide the opportunity for Braj Bhasha to develop NLP applications and to do research on various areas of linguistics - cognitive linguistics, comparative linguistics, typological and theoretical linguistics. Keywords: Language Resources, Corpus, Tagset, Universal Dependency, Annotation, Braj Bhasha 1. Introduction Hindi texts written in Devanagari. Since the script of Braj Braj or Braj Bhasha1 is a Western Indo-Aryan language Bhasha is Devanagari, the OCR tool gave quite a spoken mainly in the adjoining region spread over Uttar satisfactory result for Braj data. The process of Pradesh and Rajasthan. In present times, when major digitization is a two-step procedure. First, the individual developments in the field of computational linguistics and pages of books and magazines were scanned using a high- natural language processing are playing an integral role in resolution scanner. -
Map by Steve Huffman Data from World Language Mapping System 16
Tajiki Tajiki Tajiki Shughni Southern Pashto Shughni Tajiki Wakhi Wakhi Wakhi Mandarin Chinese Sanglechi-Ishkashimi Sanglechi-Ishkashimi Wakhi Domaaki Sanglechi-Ishkashimi Khowar Khowar Khowar Kati Yidgha Eastern Farsi Munji Kalasha Kati KatiKati Phalura Kalami Indus Kohistani Shina Kati Prasuni Kamviri Dameli Kalami Languages of the Gawar-Bati To rw al i Chilisso Waigali Gawar-Bati Ushojo Kohistani Shina Balti Parachi Ashkun Tregami Gowro Northwest Pashayi Southwest Pashayi Grangali Bateri Ladakhi Northeast Pashayi Southeast Pashayi Shina Purik Shina Brokskat Aimaq Parya Northern Hindko Kashmiri Northern Pashto Purik Hazaragi Ladakhi Indian Subcontinent Changthang Ormuri Gujari Kashmiri Pahari-Potwari Gujari Bhadrawahi Zangskari Southern Hindko Kashmiri Ladakhi Pangwali Churahi Dogri Pattani Gahri Ormuri Chambeali Tinani Bhattiyali Gaddi Kanashi Tinani Southern Pashto Ladakhi Central Pashto Khams Tibetan Kullu Pahari KinnauriBhoti Kinnauri Sunam Majhi Western Panjabi Mandeali Jangshung Tukpa Bilaspuri Chitkuli Kinnauri Mahasu Pahari Eastern Panjabi Panang Jaunsari Western Balochi Southern Pashto Garhwali Khetrani Hazaragi Humla Rawat Central Tibetan Waneci Rawat Brahui Seraiki DarmiyaByangsi ChaudangsiDarmiya Western Balochi Kumaoni Chaudangsi Mugom Dehwari Bagri Nepali Dolpo Haryanvi Jumli Urdu Buksa Lowa Raute Eastern Balochi Tichurong Seke Sholaga Kaike Raji Rana Tharu Sonha Nar Phu ChantyalThakali Seraiki Raji Western Parbate Kham Manangba Tibetan Kathoriya Tharu Tibetan Eastern Parbate Kham Nubri Marwari Ts um Gamale Kham Eastern -
Prabháta Sam'giita
Prabháta Sam'giita Songs of the New Dawn A brief introduction Renaissance Artists & Writers Association A New Dawn • Prabhata Samgiita – music of a new dawn • 5018 songs • From Indian classical to folk music • Written and composed by: – Shri Prabhata Rainjana Sarkar • Between 1982 and 1990 Variety of Themes • Devotional • Mystical love • Seasons • Ecology • Social consciousness • Marching songs • Stages, feelings & experiences in spiritual meditation • Krs'n'a •Shiva Languages Used • Most songs are in Bengali • Over 40 songs are composed in other languages, including: –English –Sanskrit (language for literary & spiritual uses) –Hindi (vocabulary borrows from Sanskrit, non-Persian & non-Arabic) –Urdu (similar to elementary Hindi, vocabulary borrows from Persian & Arabic) –Angika (spoken in in Bihar, Jharkhand & West Bengal) –Maithili (spoken in North-East Bihar & Nepal) Northern-Central India Some Genres of Prabhata Samgiita • Kiirtan songs • Tappa songs • Thumri songs • Gazal songs • Kawali songs • Baul songs • Jhumur songs Kiirtan Songs • Kiirtan is a special type of devotional song style • Centres around singing about the Supreme Entity (God) • Developed towards the end of the 12th century through Jayadeva’s composition of the Gita Govinda (around 1178 AD) • These are Sanskrit love poems saturated with madhura bhava (high state of devotional sentiment and Divine Love) Kiirtan Songs • During the 15th century Dwija Chandidas (1390 -1450) advocated Krishna kiirtans • These are songs in praise of Lord Krishna as the incarnation of the Supreme -
Language and Thought (An Insight Through Hindi)
Language and Thought (An Insight through Hindi) Sumit Verma (Y9605) | Mentor: Dr. Amitabha Mukerjee, Dr. Achla M Raina IIT Kanpur November 15, 2011 Abstract Does the Language that we speak, affect the way we perceive the world? This question has been under debate for a long time now and has seen the work of several cognitive linguists ranging from Benjamin Whorf to Lera Boroditsky. The present work looks at two aspects of Language i.e. word order and grammatical gender and as to how they play/do not play a role in influencing the way we think about the world. The first part of the work focuses on word order (i.e. SVO/SOV distinction) in Language. A non verbal motion experiment was performed on native Hindi speakers (Mess workers form Hall V IITK) which is primarily a SOV language and on native English speakers (using Amazon Mechanical Turk) which is primarily a SVO language. Results showed that irrespective of whether your native language is SVO or SOV in nature, SOV is the preferred order while performing pictorial motion tasks. In the second part of the work, grammatical gender differences between Bihari Hindi and Standard Hindi were looked at. Hindi in general does not have any separate classification for the neutral gender. In Bihari Hindi, most of the neutral words are considered to be masculine whereas in Standard Hindi, there is a specific set of rules to define the gender of a neutral object. A visual experiment was done on native Bihari Hindi Speakers and native Standard Hindi speakers, using words which were masculine in Bihari Hindi but feminine in Standard Hindi.