Wh-Expletives in Hindi-Urdu: the Vp Phase
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
(And Potential) Language and Linguistic Resources on South Asian Languages
CoRSAL Symposium, University of North Texas, November 17, 2017 Existing (and Potential) Language and Linguistic Resources on South Asian Languages Elena Bashir, The University of Chicago Resources or published lists outside of South Asia Digital Dictionaries of South Asia in Digital South Asia Library (dsal), at the University of Chicago. http://dsal.uchicago.edu/dictionaries/ . Some, mostly older, not under copyright dictionaries. No corpora. Digital Media Archive at University of Chicago https://dma.uchicago.edu/about/about-digital-media-archive Hock & Bashir (eds.) 2016 appendix. Lists 9 electronic corpora, 6 of which are on Sanskrit. The 3 non-Sanskrit entries are: (1) the EMILLE corpus, (2) the Nepali national corpus, and (3) the LDC-IL — Linguistic Data Consortium for Indian Languages Focus on Pakistan Urdu Most work has been done on Urdu, prioritized at government institutions like the Center for Language Engineering at the University of Engineering and Technology in Lahore (CLE). Text corpora: http://cle.org.pk/clestore/index.htm (largest is a 1 million word Urdu corpus from the Urdu Digest. Work on Essential Urdu Linguistic Resources: http://www.cle.org.pk/eulr/ Tagset for Urdu corpus: http://cle.org.pk/Publication/papers/2014/The%20CLE%20Urdu%20POS%20Tagset.pdf Urdu OCR: http://cle.org.pk/clestore/urduocr.htm Sindhi Sindhi is the medium of education in some schools in Sindh Has more institutional backing and consequent research than other languages, especially Panjabi. Sindhi-English dictionary developed jointly by Jennifer Cole at the University of Illinois Urbana- Champaign and Sarmad Hussain at CLE (http://182.180.102.251:8081/sed1/homepage.aspx). -
Linguistics Development Team
Development Team Principal Investigator: Prof. Pramod Pandey Centre for Linguistics / SLL&CS Jawaharlal Nehru University, New Delhi Email: [email protected] Paper Coordinator: Prof. K. S. Nagaraja Department of Linguistics, Deccan College Post-Graduate Research Institute, Pune- 411006, [email protected] Content Writer: Prof. K. S. Nagaraja Prof H. S. Ananthanarayana Content Reviewer: Retd Prof, Department of Linguistics Osmania University, Hyderabad 500007 Paper : Historical and Comparative Linguistics Linguistics Module : Indo-Aryan Language Family Description of Module Subject Name Linguistics Paper Name Historical and Comparative Linguistics Module Title Indo-Aryan Language Family Module ID Lings_P7_M1 Quadrant 1 E-Text Paper : Historical and Comparative Linguistics Linguistics Module : Indo-Aryan Language Family INDO-ARYAN LANGUAGE FAMILY The Indo-Aryan migration theory proposes that the Indo-Aryans migrated from the Central Asian steppes into South Asia during the early part of the 2nd millennium BCE, bringing with them the Indo-Aryan languages. Migration by an Indo-European people was first hypothesized in the late 18th century, following the discovery of the Indo-European language family, when similarities between Western and Indian languages had been noted. Given these similarities, a single source or origin was proposed, which was diffused by migrations from some original homeland. This linguistic argument is supported by archaeological and anthropological research. Genetic research reveals that those migrations form part of a complex genetical puzzle on the origin and spread of the various components of the Indian population. Literary research reveals similarities between various, geographically distinct, Indo-Aryan historical cultures. The Indo-Aryan migrations started in approximately 1800 BCE, after the invention of the war chariot, and also brought Indo-Aryan languages into the Levant and possibly Inner Asia. -
Linguistic Survey of India Bihar
LINGUISTIC SURVEY OF INDIA BIHAR 2020 LANGUAGE DIVISION OFFICE OF THE REGISTRAR GENERAL, INDIA i CONTENTS Pages Foreword iii-iv Preface v-vii Acknowledgements viii List of Abbreviations ix-xi List of Phonetic Symbols xii-xiii List of Maps xiv Introduction R. Nakkeerar 1-61 Languages Hindi S.P. Ahirwal 62-143 Maithili S. Boopathy & 144-222 Sibasis Mukherjee Urdu S.S. Bhattacharya 223-292 Mother Tongues Bhojpuri J. Rajathi & 293-407 P. Perumalsamy Kurmali Thar Tapati Ghosh 408-476 Magadhi/ Magahi Balaram Prasad & 477-575 Sibasis Mukherjee Surjapuri S.P. Srivastava & 576-649 P. Perumalsamy Comparative Lexicon of 3 Languages & 650-674 4 Mother Tongues ii FOREWORD Since Linguistic Survey of India was published in 1930, a lot of changes have taken place with respect to the language situation in India. Though individual language wise surveys have been done in large number, however state wise survey of languages of India has not taken place. The main reason is that such a survey project requires large manpower and financial support. Linguistic Survey of India opens up new avenues for language studies and adds successfully to the linguistic profile of the state. In view of its relevance in academic life, the Office of the Registrar General, India, Language Division, has taken up the Linguistic Survey of India as an ongoing project of Government of India. It gives me immense pleasure in presenting LSI- Bihar volume. The present volume devoted to the state of Bihar has the description of three languages namely Hindi, Maithili, Urdu along with four Mother Tongues namely Bhojpuri, Kurmali Thar, Magadhi/ Magahi, Surjapuri. -
Sanskrit Alphabet
Sounds Sanskrit Alphabet with sounds with other letters: eg's: Vowels: a* aa kaa short and long ◌ к I ii ◌ ◌ к kii u uu ◌ ◌ к kuu r also shows as a small backwards hook ri* rri* on top when it preceeds a letter (rpa) and a ◌ ◌ down/left bar when comes after (kra) lri lree ◌ ◌ к klri e ai ◌ ◌ к ke o au* ◌ ◌ к kau am: ah ◌ं ◌ः कः kah Consonants: к ka х kha ga gha na Ê ca cha ja jha* na ta tha Ú da dha na* ta tha Ú da dha na pa pha º ba bha ma Semivowels: ya ra la* va Sibilants: sa ш sa sa ha ksa** (**Compound Consonant. See next page) *Modern/ Hindi Versions a Other ऋ r ॠ rr La, Laa (retro) औ au aum (stylized) ◌ silences the vowel, eg: к kam झ jha Numero: ण na (retro) १ ५ ॰ la 1 2 3 4 5 6 7 8 9 0 @ Davidya.ca Page 1 Sounds Numero: 0 1 2 3 4 5 6 7 8 910 १॰ ॰ १ २ ३ ४ ६ ७ varient: ५ ८ (shoonya eka- dva- tri- catúr- pancha- sás- saptán- astá- návan- dásan- = empty) works like our Arabic numbers @ Davidya.ca Compound Consanants: When 2 or more consonants are together, they blend into a compound letter. The 12 most common: jna/ tra ttagya dya ddhya ksa kta kra hma hna hva examples: for a whole chart, see: http://www.omniglot.com/writing/devanagari_conjuncts.php that page includes a download link but note the site uses the modern form Page 2 Alphabet Devanagari Alphabet : к х Ê Ú Ú º ш @ Davidya.ca Page 3 Pronounce Vowels T pronounce Consonants pronounce Semivowels pronounce 1 a g Another 17 к ka v Kit 42 ya p Yoga 2 aa g fAther 18 х kha v blocKHead -
Part 1: Introduction to The
PREVIEW OF THE IPA HANDBOOK Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet PARTI Introduction to the IPA 1. What is the International Phonetic Alphabet? The aim of the International Phonetic Association is to promote the scientific study of phonetics and the various practical applications of that science. For both these it is necessary to have a consistent way of representing the sounds of language in written form. From its foundation in 1886 the Association has been concerned to develop a system of notation which would be convenient to use, but comprehensive enough to cope with the wide variety of sounds found in the languages of the world; and to encourage the use of thjs notation as widely as possible among those concerned with language. The system is generally known as the International Phonetic Alphabet. Both the Association and its Alphabet are widely referred to by the abbreviation IPA, but here 'IPA' will be used only for the Alphabet. The IPA is based on the Roman alphabet, which has the advantage of being widely familiar, but also includes letters and additional symbols from a variety of other sources. These additions are necessary because the variety of sounds in languages is much greater than the number of letters in the Roman alphabet. The use of sequences of phonetic symbols to represent speech is known as transcription. The IPA can be used for many different purposes. For instance, it can be used as a way to show pronunciation in a dictionary, to record a language in linguistic fieldwork, to form the basis of a writing system for a language, or to annotate acoustic and other displays in the analysis of speech. -
15178-Devanagari-Spacing-Anusvara
Proposal to encode A8FE DEVANAGARI SIGN SPACING ANUSVARA Shriramana Sharma, jamadagni-at-gmail-dot-com, India 2015-Jun-06 This is a proposal to encode one character in the Devanagari Extended block for Samavedic: ◌० A8FE DEVANAGARI SIGN SPACING ANUSVARA This is in contrast to the regular anusvara for this script 0902 ◌ं DEVANAGARI SIGN ANUSVARA as also to the various Vedic anusvara-s seen in Samavedic. On the other hand, this is parallel to the spacing anusvara-s in other Indic scripts which are attested for Vedic (0982 Bengali, 0B02 Oriya, 0C02 Telugu, 0C82 Kannada, 0D02 Malayalam, 11302 Grantha) and glyphically identical to all of them except Bengali. However it should be positioned vertically centered with the Devanagari digits identical to 0966 ० DEVANAGARI DIGIT ZERO. §1. Discussion The regular Devanagari anusvara 0902 ◌ं is non-spacing. This poses a problem when composing texts of the Sama Veda since these use digits on the mainline to denote svara-s: (Below, we refer to the written representation of the linguistic pattern [C*]V as “syllable”.) 1) In the Ṛc-s (verses) a kampa or “aggravated” svarita svara is marked by a 2 above the syllable (or inferred as continued from a previous syllable), a KA or avagraha above the syllable, and a digit 3 following the syllable (see L2/09-372 pp 13 and 14 and L2/15-162 p 4). The digit 3 here denotes the anudātta svara in the latter part of the kampa. 2) In the Sāman-s (melodies), secondary svara-s in which a syllable’s vowel should be continued to be sung are marked by digits following the syllable. -
Hindi and Urdu (HIND URD) 1
Hindi and Urdu (HIND_URD) 1 HINDI AND URDU (HIND_URD) HIND_URD 111-1 Hindi-Urdu I (1 Unit) Beginning college-level sequence to develop basic literacy and oral proficiency in Hindi-Urdu. Devanagari script only. Prerequisite - none. HIND_URD 111-2 Hindi-Urdu I (1 Unit) Beginning college-level sequence to develop basic literacy and oral proficiency in Hindi-Urdu. Devanagari script only. Prerequisite: grade of at least C- in HIND_URD 111-1 or equivalent. HIND_URD 111-3 Hindi-Urdu I (1 Unit) Beginning college-level sequence to develop basic literacy and oral proficiency in Hindi-Urdu. Devanagari script only. Prerequisite: grade of at least C- in HIND_URD 111-2 or equivalent. HIND_URD 116-0 Accelerated Hindi-Urdu Literacy (1 Unit) One-quarter course for speakers of Hindi-Urdu with no literacy skills. Devanagari and Nastaliq scripts; broad overview of Hindi-Urdu grammar. Prerequisite: consent of instructor. HIND_URD 121-1 Hindi-Urdu II (1 Unit) Intermediate-level sequence developing literacy and oral proficiency in Hindi-Urdu. Devanagari and Nastaliq scripts. Prerequisite: grade of at least C- in HIND_URD 111-3 or equivalent. HIND_URD 121-2 Hindi-Urdu II (1 Unit) Intermediate-level sequence developing literacy and oral proficiency in Hindi-Urdu. Devanagari and Nastaliq scripts. Prerequisite: grade of at least C- in HIND_URD 121-1 or equivalent. HIND_URD 121-3 Hindi-Urdu II (1 Unit) Intermediate-level sequence developing literacy and oral proficiency in Hindi-Urdu. Devanagari and Nastaliq scripts. Prerequisite: grade of at least C- in HIND_URD 121-2 or equivalent. HIND_URD 210-0 Hindi-Urdu III: Topics in Intermediate Hindi-Urdu (1 Unit) A series of independent intermediate Hindi-Urdu courses, developing proficiency through readings and discussions. -
English Language in Pakistan: Expansion and Resulting Implications
Journal of Education & Social Sciences Vol. 5(1): 52-67, 2017 DOI: 10.20547/jess0421705104 English Language in Pakistan: Expansion and Resulting Implications Syeda Bushra Zaidi ∗ Sajida Zaki y Abstract: From sociolinguistic or anthropological perspective, Pakistan is classified as a multilingual context with most people speaking one native or regional language alongside Urdu which is the national language. With respect to English, Pakistan is a second language context which implies that the language is institutionalized and enjoying the privileged status of being the official language. This thematic paper is an attempt to review the arrival and augmentation of English language in Pakistan both before and after its creation. The chronological description is carried out in order to identify the consequences of this spread and to link them with possible future developments and implications. Some key themes covered in the paper include a brief historical overview of English language from how it was introduced to the manner in which it has developed over the years especially in relation to various language and educational policies. Based on the critical review of the literature presented in the paper there seems to be no threat to English in Pakistan a prediction true for the language globally as well. The status of English, as it globally elevates, will continue spreading and prospering in Pakistan as well enriching the Pakistani English variety that has already born and is being studied and codified. However, this process may take long due to the instability of governing policies and law-makers. A serious concern for researchers and people in general, however, will continue being the endangered and indigenous varieties that may continue being under considerable pressure from the prestigious varieties. -
On Documenting Low Resourced Indian Languages Insights from Kanauji Speech Corpus
Dialectologia 19 (2017), 67-91. ISSN: 2013-2247 Received 7 December 2015. Accepted 27 April 2016. ON DOCUMENTING LOW RESOURCED INDIAN LANGUAGES INSIGHTS FROM KANAUJI SPEECH CORPUS Pankaj DWIVEDI & Somdev KAR Indian Institute of Technology Ropar*∗ [email protected] / [email protected] Abstract Well-designed and well-developed corpora can considerably be helpful in bridging the gap between theory and practice in language documentation and revitalization process, in building language technology applications, in testing language hypothesis and in numerous other important areas. Developing a corpus for an under-resourced or endangered language encounters several problems and issues. The present study starts with an overview of the role that corpora (speech corpora in particular) can play in language documentation and revitalization process. It then provides a brief account of the situation of endangered languages and corpora development efforts in India. Thereafter, it discusses the various issues involved in the construction of a speech corpus for low resourced languages. Insights are followed from speech database of Kanauji of Kanpur, an endangered variety of Western Hindi, spoken in Uttar Pradesh. Kanauji speech database is being developed at Indian Institute of Technology Ropar, Punjab. Keywords speech corpus, Kanauji, language documentation, endangered language, Western Hindi DOCUMENTACIÓN DE OBSERVACIONES SOBRE LENGUAS HINDIS DE POCOS RECURSOS A PARTIR DE UN CORPUS ORAL DE KANAUJI Resumen Los corpus bien diseñados y bien desarrollados pueden ser considerablemente útiles para salvar la brecha entre la teoría y la práctica en la documentación de la lengua y los procesos de revitalización, en la ∗* Indian Institute Of Technology Ropar (IIT Ropar), Rupnagar 140001, Punjab, India. -
Nepali Letters Ka Kha Pair
Nepali Letters Ka Kha Noam addles tropically. Which Roscoe biases so bilaterally that Malcolm sub her Cheshire? Ventilated Hussein shotguns lachrymosely. Materials on nepali, ka is considered to nepali tool is the more consonants Kannada alphabets and pronounce the initial consonant letters, a unique in syllabic writing the schwa is not a question. Tone letter and nepali ka is an otherwise standard ligature may need to collect and common conjunct consonant letters, or on our prior permission. Write in devanagari is written in preetin font widely used in your can find nepali. Bilingual for the letters ka kha syllables are ready to fit in internet about nepali font, where you can be very popular nepali writing system a script. Stop solution for making this page, malayalam joins letters also used to copy the leading to nepali. Passwords can now here once you should be able to the support materials on daraz nepal you are looking for? Leading to read below or as being common conjunct forms, some writers and easily from left to nepali. It suitable unicode and kha can now here once ruled the previously transliterated text area will be a written. Syllabic writing system; in english to the real sound in the sidebar. Fit in indic scripts, with a search videos in the adjacent characters or tirhuta, you to nepali. Copyright the universal history of what you know how those words into preeti nepali is for? Fuzzy phonetic alphabet, it is through pure ligatures are other? Appended to convert the letters kha is not an historic period or tirhuta, then please check the better you will allow you can find the sounds. -
Urdu to Punjabi Machine Translation: an Incremental Training Approach
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 7, No. 4, 2016 Urdu to Punjabi Machine Translation: An Incremental Training Approach Umrinderpal Singh Vishal Goyal Gurpreet Singh Lehal Department of Computer Science, Department of Computer Science, Department of Computer Science, Punjabi University Punjabi University Punjabi University Patiala, Punjab, India Patiala, Punjab, India Patiala, Punjab, India Abstract—The statistical machine translation approach is sentence corpus. Collecting parallel phrases were more highly popular in automatic translation research area and convenient as compared to the parallel sentences. promising approach to yield good accuracy. Efforts have been made to develop Urdu to Punjabi statistical machine translation II. URDU AND PUNJABI: A CLOSELY RELATED LANGUAGE system. The system is based on an incremental training approach PAIR to train the statistical model. In place of the parallel sentences Urdu2 is the national language of Pakistan and has official corpus has manually mapped phrases which were used to train the model. In preprocessing phase, various rules were used for language status in few states of India like New Delhi, Uttar tokenization and segmentation processes. Along with these rules, Pradesh, Bihar, Telangana, Jammu and Kashmir where it is text classification system was implemented to classify input text widely spoken and well understood throughout in the other states of India like Punjab, Rajasthan, Maharashtra, to predefined classes and decoder translates given text according 1 to selected domain by the text classifier. The system used Hidden Jharkhand, Madhya Pradesh and many other . The majority Markov Model(HMM) for the learning process and Viterbi of Urdu speakers belong to India and Pakistan, 70 million algorithm has been used for decoding. -
Map by Steve Huffman; Data from World Language Mapping System
Svalbard Greenland Jan Mayen Norwegian Norwegian Icelandic Iceland Finland Norway Swedish Sweden Swedish Faroese FaroeseFaroese Faroese Faroese Norwegian Russia Swedish Swedish Swedish Estonia Scottish Gaelic Russian Scottish Gaelic Scottish Gaelic Latvia Latvian Scots Denmark Scottish Gaelic Danish Scottish Gaelic Scottish Gaelic Danish Danish Lithuania Lithuanian Standard German Swedish Irish Gaelic Northern Frisian English Danish Isle of Man Northern FrisianNorthern Frisian Irish Gaelic English United Kingdom Kashubian Irish Gaelic English Belarusan Irish Gaelic Belarus Welsh English Western FrisianGronings Ireland DrentsEastern Frisian Dutch Sallands Irish Gaelic VeluwsTwents Poland Polish Irish Gaelic Welsh Achterhoeks Irish Gaelic Zeeuws Dutch Upper Sorbian Russian Zeeuws Netherlands Vlaams Upper Sorbian Vlaams Dutch Germany Standard German Vlaams Limburgish Limburgish PicardBelgium Standard German Standard German WalloonFrench Standard German Picard Picard Polish FrenchLuxembourgeois Russian French Czech Republic Czech Ukrainian Polish French Luxembourgeois Polish Polish Luxembourgeois Polish Ukrainian French Rusyn Ukraine Swiss German Czech Slovakia Slovak Ukrainian Slovak Rusyn Breton Croatian Romanian Carpathian Romani Kazakhstan Balkan Romani Ukrainian Croatian Moldova Standard German Hungary Switzerland Standard German Romanian Austria Greek Swiss GermanWalser CroatianStandard German Mongolia RomanschWalser Standard German Bulgarian Russian France French Slovene Bulgarian Russian French LombardRomansch Ladin Slovene Standard