Rule Based Parts of Speech Tagger for Chhattisgarhi Language

Total Page:16

File Type:pdf, Size:1020Kb

Rule Based Parts of Speech Tagger for Chhattisgarhi Language International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7 Issue-4, November 2018 Rule Based Parts of Speech Tagger for Chhattisgarhi Language Vikas Pandey, M.V Padmavati, Ramesh Kumar Example 1: हमन दनु ो बैऱगाड़ी म रायपुर जाबⴂ Abstract: There is an increasing demand for machine translation systems for various regional languages of India. WORDS हमन दनु ो बैऱगाड़ी म रायपुर जाबⴂ Chhattisgarhi being the language of the young Chhattisgarh state requires automatic languages translating system. Various types of TAGS PRP N N PP N VM natural language processing (NLP) tools are required for Table (a): Chhattisgarhi words and its tags taken from developing Chhattisgarhi to Hindi machine translation (MT) Chhattisgarhi tag set. system. In this paper, we are presenting rule based parts of speech tagger for Chhattisgarhi language. Parts of Speech tagging is a There are various approaches for POS tagging: Rule based procedure in which each word of sentence is assigned a tag from approach, Statistical approach and Hybrid approach [2, 3]. tag set. The Parts of Speech tagger is based on rule base which is Accuracy factor is the most important factor in deciding the formed by taken into consideration the grammatical structure of performance of POS tagger [2]. Chhattisgarhi language. The system is constructed over corpus The Rule Based POS tagging approach is based on size of 40,000 words with tag set consists of 30 different parts of speech tags. The corpus is taken from various Chhattisgarhi grammar rules that are framed by observing the grammatical stories. The system achieves an accuracy of 78%. structure of any language. These rules can be written in form Index Terms: Chhattisgarhi, Machine Translation, Natural of production grammar rules. Example: Language Processing, Parts of Speech tagger, Rule Based System. “A proper noun is always followed by a noun” as in the Table (a) हमन (Pronoun) is followed by दनु ो (Noun) I. INTRODUCTION There are some limitations of rule based approach; the Most of the regional languages are low resources language. main limitation is the formation of rule base. In this a rule is Some Indian languages are called low resource language as formulated for each condition [2, 3]. grammatical rules and literary work related to these languages The Statistical Based POS tagging approach is based on is not present in public domain. Pre-processing task like POS two important factors. These are: Frequency and probability tagging is a challenging task for these languages. In POS of occurrence of any word .In this approach most frequently tagging process a specific grammar class which is called as used tag for a specific word in the annotated training dataset is tag is assigned to a word in the sentence from tag set. Tag set used to tag that word in the un annotated dataset .The is a collection of grammar class which consist of English limitation of this system is that some sequences of tags can abbreviations like N(Noun),VM(Verb), PP(Preposition) come up for sentences that are not correct according to the etc.[1]. Parts of Speech (POS) tagging is a process of grammar rules of a certain language [3]. identifying the suitable class of tag for a word from a given tag In Hybrid Approach the probability theory of statistical set. It is very important task of pre-processing activity in method is used to train the corpus and then the set of machine translation. Machine translation systems take a production rules are applied on the testing corpus for tagging source language and convert it into target language. Various of testing corpus [2, 3]. POS tagging process is broadly tools are required in machine translation systems like classified into two models: Supervised Model and tokenize, POS tagger, morphological analyzer and parser. Unsupervised Model [4]. Classification of POS Tagging is POS tagger comes under pre-processing phase of machine shown in Figure 1. translation system. Most of the regional languages are low resources language. Some Indian languages are called low resource language as grammatical rules and literary work related to these languages is not present in public domain. Pre-processing task like POS tagging is a challenging task for these languages. In POS tagging process a specific grammar class which is called as tag to a word in the sentence from tag set. Tag set is a collection of grammar class which consist of English abbreviations like NN (Noun), VM (Verb), PP (Preposition) etc.[2]. Revised Version Manuscript Received on 30 November, 2018. Vikas Pandey, Dept. of Information Technology, Bhilai Institute of Figure 1: Classification of POS Tagging Technology, Durg, India Dr. M.V Padmavati, Dept. of Computer Science and Engg., Bhilai Institute of Technology, Durg, India Dr. Ramesh Kumar, Dept. of Computer Science and Engg., Bhilai Institute of Technology, Durg, India Published By: Blue Eyes Intelligence Engineering 192 & Sciences Publication Rule Based Parts of Speech Tagger for Chhattisgarhi Language II. LITERATURE SURVEY Rule 2: If word is relative pronoun then there is high probability that next word will be noun. Research has already been done in morphologically rich For Example:- Indian languages like Hindi, Bengali, Telugu, Marathi, Tamil, Urdu, Gujarati, Kannada, Malayalam, Odia and Punjabi. इही घर हरे जे ऱा तोखन बनाये हे । There are some low resource languages in India like Awadhi, In above example and is relative pronoun and Magahi, Nimadi, Bhojpuri, and Chhattisgarhi for which इही जेऱा घर machine translation tools have not been developed yet. and तोखन is noun. A POS tagger was developed using conditional random Rule 3: If word is reflexive pronoun then there is high field for Bengali language In this system contextual probability that next word will be noun. information of the words has been used to search different For Example:- POS tags for various tokenized words. The system was evaluated over a corpus of 72,341 words with 26 different ओ अऩन धनी सन चऱ ददस। POS tags and system achieved the accuracy of 90.3% [5]. In above example अऩन is reflexive pronoun and धनी is A POS tagger was developed using Hidden Markov Model noun. for Hindi, which uses a Naïve stemmer as a pre-processor Rule 4: If word is personal pronoun then there is high based on longest suffix matching algorithm to achieve probability that next word will be noun. accuracy of 93.12% [6]. For Example:- A POS tagger was developed using Hidden Markov Model for Assamese. Unknown words were tagged using simple ऐ मोर ऩुतक हे। morphological analysis .The system was evaluated over a In above example मोर is personal pronoun and ऩुतक is corpus of 10,000 words with 172 different POS tags and noun system achieved the accuracy of 87% [7]. Rule 5: If current word is post position then there is high A POS tagger was developed using Hidden Markov Model probability that previous word will be noun. for Hindi .They uses Indian language POS tag set to develop For Example:- this tagger and achieved the accuracy of 92% [8]. धनीराम रायऩुर म रथे। III. METHODOLOGY In above example रायऩुरis noun and मᴂ is post position. This system is developed using rule based approach and Rule 6: If current word is verb then there is probability that tagging will be done by the help of POS tag made in previous word will be noun. consultation with Chhattisgarhi linguistics expert. The system For Example:- mainly works in two steps-firstly the sentences are spitted into words by the help of line splitter program and input words are तोरण नहाये बर गे हे । found in the database; if it is present then rules are applied to In above example तोरण is noun and नहाय े is verb. tag to assign if a proper tag and if it is not found then UNK tag Rule 7: If word is noun then there is probability that next or will be assigned to it. previous word will be noun. 3.1 Algorithm For Example:- The algorithm used for rule based part of speech tagger for सुररवात बबऱासऩुर म ऩढ़ते । Chhattisgarhi language is as follows: In above example सररवात and बबऱासऩर both are noun. 1. Chhattisgarhi sentence is tokenized by the help of line ु ु splitter program. 3.2.2 Demonstrative Identification Rules 2. The Tokenized Chhattisgarhi words are normalized. Rule 1: If current word is pronoun in database and next 3. Search the numbers and tag them by using regular word is also pronoun, then first word will be demonstrative. expression. For Example:- 4. Abbreviations are searched using regular expression. 5. In the database all the input words are searched and ते कोन हरस । tag the word according to appropriate tag. In above example current word is कोन and next word is 6. UNK tag will be given to the unknown words. हरस and both are pronoun so त े is demonstrative. 7. The tagged words are then displayed. Rule 2: If current word is noun in database and next word 3.2 Rules that are Applied to Identify Different Tags is verb, then previous word will be demonstrative. 3.2.1 Noun Identification Rules For Example: - Rule 1: If word is adjective then there is high probability ओ घर चऱ ददस । that next word will be noun. For Example:- असऱी बात ये हरे। In above example असऱी is adjective बात is noun. Published By: Blue Eyes Intelligence Engineering 193 & Sciences Publication International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7 Issue-4, November 2018 In above example current word is घर which is noun and next word is चऱ which is a verb, so ओ is demonstrative.
Recommended publications
  • Some Principles of the Use of Macro-Areas Language Dynamics &A
    Online Appendix for Harald Hammarstr¨om& Mark Donohue (2014) Some Principles of the Use of Macro-Areas Language Dynamics & Change Harald Hammarstr¨om& Mark Donohue The following document lists the languages of the world and their as- signment to the macro-areas described in the main body of the paper as well as the WALS macro-area for languages featured in the WALS 2005 edi- tion. 7160 languages are included, which represent all languages for which we had coordinates available1. Every language is given with its ISO-639-3 code (if it has one) for proper identification. The mapping between WALS languages and ISO-codes was done by using the mapping downloadable from the 2011 online WALS edition2 (because a number of errors in the mapping were corrected for the 2011 edition). 38 WALS languages are not given an ISO-code in the 2011 mapping, 36 of these have been assigned their appropri- ate iso-code based on the sources the WALS lists for the respective language. This was not possible for Tasmanian (WALS-code: tsm) because the WALS mixes data from very different Tasmanian languages and for Kualan (WALS- code: kua) because no source is given. 17 WALS-languages were assigned ISO-codes which have subsequently been retired { these have been assigned their appropriate updated ISO-code. In many cases, a WALS-language is mapped to several ISO-codes. As this has no bearing for the assignment to macro-areas, multiple mappings have been retained. 1There are another couple of hundred languages which are attested but for which our database currently lacks coordinates.
    [Show full text]
  • India Country Name India
    TOPONYMIC FACT FILE India Country name India State title in English Republic of India State title in official languages (Bhārat Gaṇarājya) (romanized in brackets) भारत गणरा煍य Name of citizen Indian Official languages Hindi, written in Devanagari script, and English1 Country name in official languages (Bhārat) (romanized in brackets) भारत Script Devanagari ISO-3166 code (alpha-2/alpha-3) IN/IND Capital New Delhi Population 1,210 million2 Introduction India occupies the greater part of South Asia. It was part of the British Empire from 1858 until 1947 when India was split along religious lines into two nations at independence: the Hindu-majority India and the Muslim-majority Pakistan. Its highly diverse population consists of thousands of ethnic groups and hundreds of languages. Northeast India comprises the states of Arunāchal Pradesh, Assam, Manipur, Meghālaya, Mizoram, Nāgāland, Sikkim and Tripura. It is connected to the rest of India through a narrow corridor of the state of West Bengal. It shares borders with the countries of Nepal, China, Bhutan, Myanmar (Burma) and Bangladesh. The mostly hilly and mountainous region is home to many hill tribes, with their own distinct languages and culture. Geographical names policy PCGN policy for India is to use the Roman-script geographical names found on official India-produced sources. Official maps are produced by the Survey of India primarily in Hindi and English (versions are also made in Odiya for Odisha state, Tamil for Tamil Nādu state and there is a Sanskrit version of the political map of the whole of India). The Survey of India is also responsible for the standardization of geographical names in India.
    [Show full text]
  • Minority Languages in India
    Thomas Benedikter Minority Languages in India An appraisal of the linguistic rights of minorities in India ---------------------------- EURASIA-Net Europe-South Asia Exchange on Supranational (Regional) Policies and Instruments for the Promotion of Human Rights and the Management of Minority Issues 2 Linguistic minorities in India An appraisal of the linguistic rights of minorities in India Bozen/Bolzano, March 2013 This study was originally written for the European Academy of Bolzano/Bozen (EURAC), Institute for Minority Rights, in the frame of the project Europe-South Asia Exchange on Supranational (Regional) Policies and Instruments for the Promotion of Human Rights and the Management of Minority Issues (EURASIA-Net). The publication is based on extensive research in eight Indian States, with the support of the European Academy of Bozen/Bolzano and the Mahanirban Calcutta Research Group, Kolkata. EURASIA-Net Partners Accademia Europea Bolzano/Europäische Akademie Bozen (EURAC) – Bolzano/Bozen (Italy) Brunel University – West London (UK) Johann Wolfgang Goethe-Universität – Frankfurt am Main (Germany) Mahanirban Calcutta Research Group (India) South Asian Forum for Human Rights (Nepal) Democratic Commission of Human Development (Pakistan), and University of Dhaka (Bangladesh) Edited by © Thomas Benedikter 2013 Rights and permissions Copying and/or transmitting parts of this work without prior permission, may be a violation of applicable law. The publishers encourage dissemination of this publication and would be happy to grant permission.
    [Show full text]
  • Ambiguity in Chhattisgarhi Language – Introduction and Prevention
    © 2021 JETIR July 2021, Volume 8, Issue 7 www.jetir.org (ISSN-2349-5162) Ambiguity in Chhattisgarhi language – Introduction and Prevention 1Pragya Bhagat, 2Dr.Chandrakant S. Ragit 1Department of information and language Engineering, Mahatma Gandhi Antarrashtriya Hindi Vishwavidyalaya, Wardha , Maharashtra, India 2 Pro-vice Chancellor and Director, Department of information and language Engineering, Mahatma Gandhi Antarrashtriya Hindi Vishwavidyalaya, Wardha , Maharashtra, India Abstract: We use sentences to express our sentiments. Sentences need to be grammatical in order to communicate. These grammatical sentences address the language. The use of words present in sentences to represent different situations is called ambiguity of the word. This research paper highlights the ambiguity and redressal of the words of natural language (Chhattisgarhi). In this research paper, we have compiled and analyzed data from various fields of Chhattisgarh language to understand the ambiguity present in Chhattisgarhi language and then highlighted their redressal. The main objective of this paper is to make aware of the ambiguity present in the Chhattisgarhi language and the rule based method used for its solution. This system is effective only for the prevention of word level ambiguity. Index Terms – ambiguity, Chhattisgarhi language, contextual rules, knowledge base I. INTRODUCTION All organisms express their dialogue through language. But the language used in dialog observation itself reveals many features. One of the different characteristics of the language is ambiguity. It is because of this characteristic of language that sometimes dialogue is also responsible for wrong transmission due to the multi meaning of the word present in the language. This feature of language is found in every natural language.
    [Show full text]
  • Chapter One: Social, Cultural and Linguistic Landscape of India
    Chapter one: Social, Cultural and Linguistic Landscape of India 1.1 Introduction: India also known as Bharat is the seventh largest country covering a land area of 32, 87,263 sq.km. It stretches 3,214 km. from North to South between the extreme latitudes and 2,933 km from East to West between the extreme longitudes. On this 2.4 % of earth‟s surface, lives 16% of world‟s population. With a population of 1,028,737,436 variations is there at every step of life. India is a land of bewildering diversity. India is bounded by the Indian Ocean on the Figure 1.1: India in World Population south, the Arabian Sea on the west and the Bay of Bengal on the east. Many outsiders explored India via these routes. The whole of India is divided into twenty eight states and seven union territories. Each state has its own cultural and linguistic peculiarities and diversities. This diversity can be seen in every aspect of Indian life. Whether it is culture, language, script, religion, food, clothing etc. makes ones identity multi-dimensional. Ones identity lies in his language, his culture, caste, state, village etc. So one can say India is a multi-centered nation. Indian multilingualism is unique in itself. It has been rightly said, “Each part of India is a kind of replica of the bigger cultural space called India.” (Singh U. N, 2009). Also multilingualism in India is not considered a barrier but a boon. 17 Chapter One: Social, Cultural and Linguistic Landscape of India Languages act as bridges because it enables us to know about others.
    [Show full text]
  • Hindi to Chhattisgarhi Translator
    www.ijcrt.org © 2018 IJCRT | Volume 6, Issue 1 March 2018 | ISSN: 2320-2882 Hindi to Chhattisgarhi Translator 1Shubham Kumar sahu, 2Anupriya dutta , 3Manish Kumar sinha, 4Sachin ther Department of Information Technology, Bhilai Institute of technology ,Durg ,Chhattisgarh , India Abstract: - A natural language translator translates one language into another with the help of certain logic, rules and algorithms. Our work will translate Hindi language to Chhattisgarhi and vice-versa. We will also need a POS tagger for this which we have already created earlier and for making this work correctly we have taken help from linguistic expert from a university in Chhattisgarh who teaches Chhattisgarhi there. We have used an online tool Lingojam to make possible our translator is completely online and tool based. Keywords:-Translator, Chhattisgarhi, Hindi, lingojam I. INTRODUCTION To make a translator first we need to understand the basic structure of sentence of both the languages, here the best part is the script of the both the languages is same that is Devanagari , and the structure of the sentence is also 70-80% is the same so we don’t need a language parser for making a translator here, and only few parts are there where structure of the sentence is not same , there the lingojam tools provide re-ordering and arranging of sentences but for that the words should be tagged and for that we can use a POS(parts of speech) tagger for Chhattisgarhi and Hindi language both. Our work is completely new and nothing has been done for Chhattisgarhi language 1.1 ABOUT LINGOJAM It’s an Open Source Toolkit for Statistical Machine Translation.it is developed by Australian company.
    [Show full text]
  • Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo Iouo
    Asia No. Language [ISO 639-3 Code] Country (Region) 1 A’ou [aou] Iouo China 2 Abai Sungai [abf] Iouo Malaysia 3 Abaza [abq] Iouo Russia, Turkey 4 Abinomn [bsa] Iouo Indonesia 5 Abkhaz [abk] Iouo Georgia, Turkey 6 Abui [abz] Iouo Indonesia 7 Abun [kgr] Iouo Indonesia 8 Aceh [ace] Iouo Indonesia 9 Achang [acn] Iouo China, Myanmar 10 Ache [yif] Iouo China 11 Adabe [adb] Iouo East Timor 12 Adang [adn] Iouo Indonesia 13 Adasen [tiu] Iouo Philippines 14 Adi [adi] Iouo India 15 Adi, Galo [adl] Iouo India 16 Adonara [adr] Iouo Indonesia Iraq, Israel, Jordan, Russia, Syria, 17 Adyghe [ady] Iouo Turkey 18 Aer [aeq] Iouo Pakistan 19 Agariya [agi] Iouo India 20 Aghu [ahh] Iouo Indonesia 21 Aghul [agx] Iouo Russia 22 Agta, Alabat Island [dul] Iouo Philippines 23 Agta, Casiguran Dumagat [dgc] Iouo Philippines 24 Agta, Central Cagayan [agt] Iouo Philippines 25 Agta, Dupaninan [duo] Iouo Philippines 26 Agta, Isarog [agk] Iouo Philippines 27 Agta, Mt. Iraya [atl] Iouo Philippines 28 Agta, Mt. Iriga [agz] Iouo Philippines 29 Agta, Pahanan [apf] Iouo Philippines 30 Agta, Umiray Dumaget [due] Iouo Philippines 31 Agutaynen [agn] Iouo Philippines 32 Aheu [thm] Iouo Laos, Thailand 33 Ahirani [ahr] Iouo India 34 Ahom [aho] Iouo India 35 Ai-Cham [aih] Iouo China 36 Aimaq [aiq] Iouo Afghanistan, Iran 37 Aimol [aim] Iouo India 38 Ainu [aib] Iouo China 39 Ainu [ain] Iouo Japan 40 Airoran [air] Iouo Indonesia 1 Asia No. Language [ISO 639-3 Code] Country (Region) 41 Aiton [aio] Iouo India 42 Akeu [aeu] Iouo China, Laos, Myanmar, Thailand China, Laos, Myanmar, Thailand,
    [Show full text]
  • Text Pre-Processing and Parts of Speech Tagging for Kannada Language
    Journal of Xi'an University of Architecture & Technology Issn No : 1006-7930 Text pre-processing and parts of speech tagging for Kannada language Saritha Shetty Department of Master of Computer Applications NMAM Institute of Technology, Nitte, Karnataka, India Email- [email protected] Savitha Shetty Department of Computer Science and Engineering, NMAM Institute of Technology, Nitte, Karnataka, India Email- [email protected] Abstract - The technique of assigning various parts of speech for each word in a text file is referred as Parts of Speech Tagging. This paper propounds assigning of POS for Kannada Language words by Hidden Markov Model. This paper also focuses on Kannada Language detection and Text pre-processing. POS tagger has been developed using Python programming panguage. Tkinter is used as an interface. Data accumulation for training and testing of the system is done from wikipedia, Kannada e-papers. 18000 words are trained and they are tested with 1000 words. The contrast amidst project generated output and physically tagged data results in correctness of accuracy for POS tagging. The correctness of 95% is achieved from the experimental outcome of the proposed System. Keywords – Hidden Markov Model, Text pre-processing, Lemmatization, Stemming, POS tagging I. INTRODUCTION Language is a medium through which human beings communicate with each other. There are around 7000 languages. Kannada is a language that is preponderantly used by people of Karnataka for communication. It is one among the 1750 languages spoken in India. Over 43.7 million people are the native speakers of this language. There is a special place for Kannada literature in Indian literature.
    [Show full text]
  • World Events : How Shall We Then Live?
    APRIL 2016 Vol. 14, No. 7 IBT Founder (Late) Rev. Prof. S. Panneer Selvam WORD OF TRUTH - Annual Subscription : Rs. 100/- PUBLICATION OF INDIAN BIBLE TRANSLATORS Single Copy : Rs. 10/-. WORLD EVENTS : HOW SHALL WE THEN LIVE? INTRODUCTION not succeed. Many wars have since been fought and we have lived through those years Prophesy is part of God's law. Jesus says and we realize the invisible hand of God in Mt 5:17,18 "Do not think that I came to protecting Israel from her enemies. We read in destroy the Law and the Prophets. I did not Amos 9:13-15 how God plans to bless Israel. come to destroy but to fulfil. For assuredly I In fact, as late as Oct 1, 2013, the PM of say to you, till heaven and earth pass away, Israel addressing the UN ended with this one jot or one title will by no means pass quote : "In our time the Biblical prophecies are from the law till all is fulfilled." This simply being realized," Netanyahu noted. "As the means that any prophesy given by the Lord prophet Amos said, they shall rebuild ruined will be fulfilled and nothing can prevent it from cities and inhabit them. They shall plant being fulfilled. Over 360 OT prophecies have vineyards and drink their wine. They shall till been fulfilled in Jesus. God promised to bring gardens and eat their fruit. And I will plant the Jewish people back to the land of Israel them upon their soil never to be uprooted from all over the world.
    [Show full text]
  • The Indo-Aryan Languages: a Tour of the Hindi Belt: Bhojpuri, Magahi, Maithili
    1.2 East of the Hindi Belt The following languages are quite closely related: 24.956 ¯ Assamese (Assam) Topics in the Syntax of the Modern Indo-Aryan Languages February 7, 2003 ¯ Bengali (West Bengal, Tripura, Bangladesh) ¯ Or.iya (Orissa) ¯ Bishnupriya Manipuri This group of languages is also quite closely related to the ‘Bihari’ languages that are part 1 The Indo-Aryan Languages: a tour of the Hindi belt: Bhojpuri, Magahi, Maithili. ¯ sub-branch of the Indo-European family, spoken mainly in India, Pakistan, Bangladesh, Nepal, Sri Lanka, and the Maldive Islands by at least 640 million people (according to the 1.3 Central Indo-Aryan 1981 census). (Masica (1991)). ¯ Eastern Punjabi ¯ Together with the Iranian languages to the west (Persian, Kurdish, Dari, Pashto, Baluchi, Ormuri etc.) , the Indo-Aryan languages form the Indo-Iranian subgroup of the Indo- ¯ ‘Rajasthani’: Marwar.i, Mewar.i, Har.auti, Malvi etc. European family. ¯ ¯ Most of the subcontinent can be looked at as a dialect continuum. There seem to be no Bhil Languages: Bhili, Garasia, Rathawi, Wagdi etc. major geographical barriers to the movement of people in the subcontinent. ¯ Gujarati, Saurashtra 1.1 The Hindi Belt The Bhil languages occupy an area that abuts ‘Rajasthani’, Gujarati, and Marathi. They have several properties in common with the surrounding languages. According to the Ethnologue, in 1999, there were 491 million people who reported Hindi Central Indo-Aryan is also where Modern Standard Hindi fits in. as their first language, and 58 million people who reported Urdu as their first language. Some central Indo-Aryan languages are spoken far from the subcontinent.
    [Show full text]
  • Abstract of Speakers' Strength of Languages and Mother Tongues - 2011
    STATEMENT-1 ABSTRACT OF SPEAKERS' STRENGTH OF LANGUAGES AND MOTHER TONGUES - 2011 Presented below is an alphabetical abstract of languages and the mother tongues with speakers' strength of 10,000 and above at the all India level, grouped under each language. There are a total of 121 languages and 270 mother tongues. The 22 languages specified in the Eighth Schedule to the Constitution of India are given in Part A and languages other than those specified in the Eighth Schedule (numbering 99) are given in Part B. PART-A LANGUAGES SPECIFIED IN THE EIGHTH SCHEDULE (SCHEDULED LANGUAGES) Name of Language & mother tongue(s) Number of persons who Name of Language & mother tongue(s) Number of persons who grouped under each language returned the language (and grouped under each language returned the language (and the mother tongues the mother tongues grouped grouped under each) as under each) as their mother their mother tongue) tongue) 1 2 1 2 1 ASSAMESE 1,53,11,351 Gawari 19,062 Assamese 1,48,16,414 Gojri/Gujjari/Gujar 12,27,901 Others 4,94,937 Handuri 47,803 Hara/Harauti 29,44,356 2 BENGALI 9,72,37,669 Haryanvi 98,06,519 Bengali 9,61,77,835 Hindi 32,22,30,097 Chakma 2,28,281 Jaunpuri/Jaunsari 1,36,779 Haijong/Hajong 71,792 Kangri 11,17,342 Rajbangsi 4,75,861 Khari Boli 50,195 Others 2,83,900 Khortha/Khotta 80,38,735 Kulvi 1,96,295 3 BODO 14,82,929 Kumauni 20,81,057 Bodo 14,54,547 Kurmali Thar 3,11,175 Kachari 15,984 Lamani/Lambadi/Labani 32,76,548 Mech/Mechhia 11,546 Laria 89,876 Others 852 Lodhi 1,39,180 Magadhi/Magahi 1,27,06,825 4 DOGRI 25,96,767
    [Show full text]
  • Language Distinctiveness*
    RAI – data on language distinctiveness RAI data Language distinctiveness* Country profiles *This document provides data production information for the RAI-Rokkan dataset. Last edited on October 7, 2020 Compiled by Gary Marks with research assistance by Noah Dasanaike Citation: Liesbet Hooghe and Gary Marks (2016). Community, Scale and Regional Governance: A Postfunctionalist Theory of Governance, Vol. II. Oxford: OUP. Sarah Shair-Rosenfield, Arjan H. Schakel, Sara Niedzwiecki, Gary Marks, Liesbet Hooghe, Sandra Chapman-Osterkatz (2021). “Language difference and Regional Authority.” Regional and Federal Studies, Vol. 31. DOI: 10.1080/13597566.2020.1831476 Introduction ....................................................................................................................6 Albania ............................................................................................................................7 Argentina ...................................................................................................................... 10 Australia ....................................................................................................................... 12 Austria .......................................................................................................................... 14 Bahamas ....................................................................................................................... 16 Bangladesh ..................................................................................................................
    [Show full text]