A Neural Approach to Indo-Aryan Historical Phonology and Subgrouping


Disentangling dialects: a neural approach to Indo-Aryan historical phonology and subgrouping

Chundra A. Cathcart¹,² and Taraka Rama³
¹ Department of Comparative Language Science, University of Zurich
² Center for the Interdisciplinary Study of Language Evolution, University of Zurich
³ Department of Linguistics, University of North Texas
[email protected], [email protected]

Proceedings of the 24th Conference on Computational Natural Language Learning, pages 620–630, Online, November 19-20, 2020. © 2020 Association for Computational Linguistics. https://doi.org/10.18653/v1/P17

Abstract

This paper seeks to uncover patterns of sound change across Indo-Aryan languages using an LSTM encoder-decoder architecture. We augment our models with embeddings representing language ID, part of speech, and other features such as word embeddings. We find that a highly augmented model shows highest accuracy in predicting held-out forms, and investigate other properties of interest learned by our models' representations. We outline extensions to this architecture that can better capture variation in Indo-Aryan sound change.

1 Introduction

The Indo-Aryan languages, comprising Sanskrit (otherwise known as Old Indo-Aryan, or OIA) and its descendant languages, including medieval languages like Pāḷi and modern languages such as Hindi/Urdu, Panjabi, and Bangla, form a well-studied subgroup of the Indo-European language family. At the same time, many aspects of the Indo-Aryan languages' history remain poorly understood. One reason is that there are large historical gaps in the attestation of Indo-Aryan languages, making it challenging to document when certain shared innovations took place. Additionally, while the operation of sound changes is a diagnostic for subgrouping that historical linguists often employ, Indo-Aryan languages have remained in close contact for millennia, borrowing words from each other and making it difficult to establish subgroup-defining sound laws using the traditional comparative method of historical linguistics.

While a number of large digitized multilingual resources pertaining to the Indo-Aryan languages exist, these data sets have not been widely used in studies, and our understanding of Indo-Aryan dialectology stands to benefit greatly from the application of deep learning techniques. This paper seeks to move further towards closing this gap. We use an LSTM-based encoder-decoder architecture to analyze a large data set of OIA etyma (ancestral forms) and medieval/modern Indo-Aryan reflexes (descendant forms) extracted from a digitized etymological dictionary, with the goal of inferring patterns of sound change from input/output string pairs. We use language embeddings with the goal of capturing individual languages' historical phonological behavior. We augment this basic model with additional embeddings that may help in capturing irregular patterns of sound change not captured by language embeddings; additionally, we compare the performance of these models against a baseline model that is embedding-free.

We evaluate the performance of models with different embeddings by assessing the accuracy with which held-out forms in medieval/modern Indo-Aryan languages are predicted on the basis of the OIA etyma from which they descend, and carry out a linguistically informed error analysis. We provide a quantitative evaluation of the degree of agreement between the genetic signal of each model's embeddings and a reference taxonomy of the Indo-Aryan languages. We find that a model with embeddings representing data points' language ID, part of speech, semantic profile and etymon ID predicts held-out forms that are closest to the ground truth forms, but that a tree constructed from language embeddings learned by this model shows lower agreement with a reference taxonomy of Indo-Aryan than a tree constructed on the basis of a model with only language embeddings, and that in general, the ability of our models to recapitulate uncontroversial genetic signal is mixed. Finally, we carry out experiments designed to investigate the information captured by specific embeddings used in our models; we find that our models learn meaningful information from augmented representations, and outline directions for future research.

2 Background: Indo-Aryan dialectology

Despite a long history of scholarship, there is no general consensus regarding the subgrouping of Indo-Aryan languages comparable to that regarding other branches of Indo-European, such as Slavic or Germanic. Scholars argue for a core-periphery (Hoernle, 1880; Grierson, 1967 [1903-28]; Southworth, 2005; Zoller, 2016) or East-West split between the languages (Montaut, 2009, 2017), or are agnostic to the higher-order subgrouping of Indo-Aryan, given the many challenges involved in establishing such groups (for discussion, see Southworth 1964; Jeffers 1976; Masica 1991; Toulmin 2009). Disagreement between these groups stems largely from the fact that the different hypotheses are based on different linguistic features, and there is no agreed-upon way in which to establish that individual features shared across languages are inherited from a common ancestor rather than due to parallel innovation. The traditional comparative method of historical linguistics (Hoenigswald, 1960; Weiss, 2015) tends to establish linguistic subgroups on the basis of innovations in morphology as well as shared sound changes, some of which are thought to be unlikely to operate independently. Indeed, many scholars have agreed that Indo-Aryan subgrouping should be established according to sound change; however, the establishment of regular sound changes has proved challenging given the high degree of irregularity in the data (Masica, 1991). Our method has the potential to detect regularities and bear on the questions described above.

3 Related work

Traditional computational dialectology (Kessler, 1995; Nerbonne and Heeringa, 2001) identifies dialect clusters using edit distance; more recent work uses neural architectures for dialect classification based on social media data for languages such as English (Rahimi et al., 2017b,a) and German (Hovy and Purschke, 2018). Computational methods have been applied to the related field of historical linguistics to identify cognates (words that go back to a common ancestor) and infer relationships between languages (Rama et al., 2018) as well as the reconstruction of ancestral words through Bayesian methods (Bouchard-Côté et al., 2013), gated neural networks (Meloni et al., 2019) and non-neural sequence labeling methods (Ciobanu and Dinu, 2020).

Other recent work infers language embeddings from large parallel corpora using different neural architectures (Östling and Tiedemann, 2017; Johnson et al., 2017; Tiedemann, 2018; Rabinovich et al., 2017). These embeddings tend to produce hierarchical clustering configurations that are close to the language classification trees inferred from historical linguistic research. These claims have been tested by Bjerva et al. (2019), who find that the distances between learned language representations may not be reflective of genetic relationship but of structural similarity. It is not always straightforward to interpret the sources of differentiation among these embeddings; typically, embeddings based on synchronic patterns of language use in corpora may be due to word order patterns, phonotactic patterns, or a number of other interrelated language-specific distributions. Cathcart and Wandl (2020) investigate the patterns of sound change captured by a neural encoder-decoder architecture trained on Proto-Slavic and contemporary Slavic word forms, and find that embeddings display at least partial genetic signal, but also note a negative relationship between overall model accuracy and the degree to which embeddings reflect the communis opinio subgrouping of Slavic.

4 Data and rationale for model design

We use data from an etymological dictionary of the Indo-Aryan languages (Turner, 1962–1966).¹ We extract OIA etyma and their corresponding reflexes in medieval and modern Indo-Aryan languages (e.g., OIA vākya 'speech, words' develops to Pāḷi vākya, Kashmiri wākh, etc.). As the traditional Indological orthography used to transcribe forms in the dictionary is phonemic, we retain this representation and convert characters with diacritics to a Normalization Form Canonical Decomposition (NFD) Unicode representation in order to reduce the number of input and output character types. Additionally, we extract glosses provided for OIA etyma (at the time of writing, the extraction of reflex glosses cannot be straightforwardly automated due to the unstructured nature of the markup language, plus the absence of glosses for certain reflexes). We match languages in the dictionary with the closest matching glottocode from the Glottolog database (Hammarström et al., 2017), and omit languages with fewer than 100 entries. This results in a data set of 82431 forms in 61 languages; the number of forms in each language can be seen in Table 1.

¹ Online at https://dsalsrv04.uchicago.edu/dictionaries/soas/
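The two preprocessing steps described in the data section (NFD decomposition of characters with diacritics, and dropping languages with fewer than 100 entries) can be illustrated with a minimal sketch. It assumes records are stored as (language, etymon, reflex) tuples; that layout and the function names `to_nfd`, `char_inventory` and `filter_languages` are hypothetical and are not taken from the authors' code.

```python
import unicodedata
from collections import Counter

MIN_ENTRIES = 100  # threshold stated in the text


def to_nfd(form: str) -> str:
    """Split precomposed characters into base character + combining marks."""
    return unicodedata.normalize("NFD", form)


def char_inventory(forms):
    """Set of distinct character types used across a collection of forms."""
    return {ch for form in forms for ch in form}


def filter_languages(records, min_entries=MIN_ENTRIES):
    """Keep records whose language has at least `min_entries` forms.

    `records` is assumed to be an iterable of (language, etymon, reflex)
    tuples; this layout is illustrative only.
    """
    records = list(records)
    counts = Counter(lang for lang, _, _ in records)
    return [r for r in records if counts[r[0]] >= min_entries]


# Precomposed long vowels (a, i, u with macron) plus their short counterparts.
precomposed = ["\u0101", "\u012b", "\u016b", "a", "i", "u"]
decomposed = [to_nfd(f) for f in precomposed]

print(len(char_inventory(precomposed)))  # 6 character types
print(len(char_inventory(decomposed)))   # 4: a, i, u and the combining macron
```

The inventory shrinks because each long vowel is now written as a short vowel followed by a single shared combining macron (U+0304), which is the effect the text attributes to NFD conversion.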
The most frequent language is the medieval language Prakrit, followed by Hindi, the medieval language Pāḷi, Marathi, and Panjabi.

As mentioned above, a goal of this study is to [...]

[...] accounting for the factors described above can improve model accuracy and allow us to tease apart legitimate patterns of sound change from orthogonal factors. In order to achieve this goal, [...]