Assamese Dialect Translation System- a Preliminary Proposal

Total Page:16

File Type:pdf, Size:1020Kb

Assamese Dialect Translation System- a Preliminary Proposal Assamese Dialect Translation System- A preliminary proposal Sanghamitra Nath, Himangshu Sarma and Utpal Sharma Department of Computer Science and Engineering, Tezpur University, Assam, India Email: [email protected] , [email protected] , [email protected] Abstract — Assamese, spoken by most natives of the state of neighboring states of West Bengal, Meghalaya, Arunachal Assam in the North East of India, is a language which has Pradesh and other North East Indian states 1. The Assamese originated from the Indo-European family. However the language grew out of Sanskrit, however, the original language when spoken in the different regions of Assam is seen inhabitants of Assam like the bodos and kacharis had a great to vary, mostly in terms of pronunciation, intonation and influence on its vocabulary, phonology and grammar. vocabulary. This has given birth to several regional dialects. Assamese and the cognate languages, Maithili, Bengali and Central Assamese is spoken in the Nagaon district of Assam Oriya, is believed to have developed from Magadhi Prakrit and in neighboring areas. Eastern Assamese is spoken in and which is the eastern branch of the Apabhramsa that followed around the district of Sibsagar. Kamrupia or Kamrupi is Prakrit. Mayang is the form of Assamese spoken by the spoken in the districts of Kamrup, Nalbari, Barpeta, Darrang, somewhat marginalized Mayang tribe in the northern regions Kokrajhar, and Bongaigoan. In addition we have Assamese of Manipur while Jharwa Assamese is a pidgin language variants like Nalbaria, Barpetia, Kamrupia, Goalparia, incorporating elements of Assamese, Hindi and English. Jorhatia, etc spoken by people of the respective regions. It Although in the middle of the 19th century, the official would be useful therefore if a system is designed to help two language of Assam was considered to be Bengali, British people speaking different varieties of Assamese to colonizers decreed Eastern Assamese to be the standard form communicate with each other. In other words, a seamless of the Assamese language. Presently, however, Central integration of a speech recognition module, a machine Assamese is accepted as the principal or standard dialect. translation module and a speech synthesis module would help Several regional dialects are typically recognized. These facilitate the exchange of information between two people dialects are seen to vary primarily with respect to phonology speaking two different dialects. The proposed system for and morphology. However a high degree of mutual Assamese Dialect translation and synthesis should therefore, intelligibility is shared by the dialects. Dr. Banikanta Kakati, an work in such a manner that given a sentence in standard eminent linguist of Assam, divided the Assamese dialects into Assamese (written form of the language ), it should translate two major groups, Eastern Assamese and Western Assamese. and synthesize the utterances in the dialect requested, or vice However, recent studies have shown that there are four major versa. Or given a sentence in a particular dialect, it should dialect groups listed below from east to west 2: translate and synthesize the utterances in the dialect requested. The outcome of this work is two folds. Firstly, it will help I. Eastern group spoken in and other districts around people understand and appreciate the interesting differences in Sibsagar district. the Assamese dialects, which is important in preserving the culture preserved in the dialects. Secondly, the proposed II. Central group spoken in present Nagaon district and system will act as an aid to people speaking different dialects adjoining areas. while communicating with each other. Kamrupi group spoken in undivided Kamrup, Nalbari, III. Barpeta, Darrang, Kokrajhar and Bongaigaon. Keywords— Assamese, Dialect, Speech Corpus, Prosody, Speech Synthesis IV. Goalparia group spoken in Goalpara, Dhubri, Kokrajhar and Bongaigaon districts I. INTRODUCTION Assamese is the principal language of the state of Assam in The script used by the Assamese language is a variant North East India and is regarded as the lingua-franca of the of the Eastern Nagari Script which traces its descent from whole of North East India. The Assamese language is the the Gupta Script. The script is similar to that of Bengali easternmost member of the Indo- European family and is except for the symbols for /r/ and /w/ and highly spoken by most natives of the state of Assam. As reported by RCILTS, IITG, over 15.3 million people speak Assamese as the first language and including those who speak it as a second 1 http://www.iitg.ernet.in/rcilts/pdf/assamese.pdf language, a total of 20 million speak Assamese primarily in the 2 http://en.wikipedia.org/wiki/Assamese_language northeastern state of Assam and in some parts of the resembles the Devanagiri script of Hindi, Sanskrit and other prosodic information play a vital role in the development of related Indic languages. It is a syllabary script and is written Speech corpus [2]. from left to right. The Assamese alphabet consists of 12 vowel graphemes and 52 consonant graphemes 3. Both phonemes and Phones or Syllables as the basic unit allophones are represented in this set of alphabets. Assamese Phones as basic speech units reduce the size of the database spelling is not always phonetically based. Current Assamese because the number of phones for most Indian languages is spelling practices are based on Sanskrit spelling, as introduced well below 50. But phones provide very less co-articulation in Hemkosh, the second Assamese dictionary written in the information across adjacent units thus failing to model the middle of the 19th century by Hemchandra Baruah which in dynamics of speech sound. They therefore, are not considered fact is considered as the standard reference of the Assamese to be efficient in speech synthesis. Most Indian languages language. however are syllable centered, with the pronunciations mainly based on syllables. Intelligible speech synthesis is possible for Indian languages with syllable as the basic unit [2]. Syllable II. BACKGROUND units are larger than phones or diphones and can capture co- Though much work has been carried out in the field of articulation information much better than phones or diphones. Speech translation and synthesis, work related to dialect Also, the number of concatenation points decreases when translation and synthesis specially in the Assamese dialects is syllable is used as the basic unit. Syllable boundaries are near to none. As an initial measure, a survey has been made in characterized by regions of low energy, providing more the areas of pronunciation modeling, speech recognition, prosodic information. A grapheme (letter or a number of letters translation and other current speech processing approaches that represent a phoneme) in Indian languages is close to a which will be required for the proposed system/work. syllable. The general format of an Indian language syllable is C*VC*, where C is a consonant, V is a vowel and C* indicates A. Building of Speech Corpus the presence of 0 or more consonants. A total of about 35 A spoken language system, whether it is a speech consonants and 18 vowels exist in Indian languages. There are recognition system or a speech synthesis system, always starts defined set of syllabification rules formed by researchers, to with the building of speech corpora [2]. Therefore to begin produce computationally reasonable syllables. Such rules are with, speech samples need to be recorded and recognized to bound to vary with respect to a language or there may be generate a text file. Then a corresponding speech file with the specific rules particular to a language. A rule based grapheme phonetic representation of the recorded speech is generated and to syllable converter can also be used for syllabification [2]. stored. The speech file may be processed in different levels Polysyllables as the basic unit viz., paragraphs, sentences, phrases, words, syllables, phones and diphones and these units of processing are referred to as Polysyllable units are formed using the monosyllable units the speech units of the file. Concatenative speech synthesis already present in the database, therefore the quality of involves the concatenation of these basic units to synthesize synthesis can be improved without the requirement of a new natural sounding speech. The speech units are added with some set of units. Such a system uses a large database consisting of more relevant information about each unit using acoustic and syllables, bisyllables and trisyllables. In the synthesizing prosodic parameters either manually or automatically. Acoustic process, the first matching trisyllable is selected followed by parameters may be linear prediction coefficients, formants, the bisyllable and monosyllable units, as needed. Selecting the pitch and gain while prosodic parameters may be duration, largest possible unit in the database reduces the number of co- intensity or pitch. These parameters are modified by stored articulation points thereby improving the quality of knowledge sources corresponding to coarticulation, duration and intonation [10]. The coarticulation rules specify the pattern synthesized speech [2]. of joining the basic units while the duration rules modify the Clustering the speech units inherent duration of the basic units based on the linguistic context in which the units occur. The intonation rules specify Speech unit clustering leads to selection of the best candidates the overall pitch contour for the utterance, the fall-rise patterns, for synthesis. An acoustic distance measure is defined to resetting phenomena and inherent fundamental frequency of measure the distance between two units of the same phone vowels. To make the speech more intelligible and natural, type. Cluster units of the same type are formed by evaluating appropriate pauses need to be specified between the syntactic various factors concerning prosodic and phonetic context. A units. The resulting database consisting of the speech units decision tree is then built based on questions concerning the along with their associated information is called the speech phonetic and prosodic aspects of the grapheme.
Recommended publications
  • The Written Examination Will Consist of the Following Papers :— Qualifying Papers
    MAIN EXAMINATION: The written examination will consist of the following papers :— Qualifying Papers : Paper-A (One of the Indian Language to be selected by the candidate from the Languages included in the Eighth Schedule to the Constitution). 300 Marks Paper-B English 300 Marks Papers to be counted for merit Paper-I Essay 250 Marks Paper-II General Studies-I 250 Marks (Indian Heritage and Culture, History and Geography of the World and Society) Paper-III General Studies -II 250 Marks (Governance, Constitution, Polity, Social Justice and International relations) Paper-IV General Studies -III 250 Marks (Technology, Economic Development, Bio-diversity, Environment, Security and Disaster Management) Paper-V General Studies -IV 250 Marks (Ethics, Integrity and Aptitude) Paper-VI Optional Subject - Paper 1 250 Marks Paper-VII Optional Subject - Paper 2 250 Marks Sub Total (Written test) 1750 Marks Personality Test 275 Marks Grand Total 2025 Marks Candidates may choose any one of the optional subjects from amongst the list of subjects Government strives to have a workforce which reflects gender balance and women candidates are encouraged to apply. given in para 2 below:— NOTE : The papers on Indian languages and English (Paper A and paper B) will be of Matriculation or equivalent standard and will be of qualifying nature. The marks obtained in these papers will not be counted for ranking. (i) Evaluation of the papers, namely, 'Essay', 'General Studies' and Optional Subject of all the candidates would be done simultaneously along with evaluation of their qualifying papers on ‘Indian Languages’ and ‘English’ but the papers on Essay, General Studies and Optional Subject of only such candidates will be taken cognizance who attain 25% marks in ‘Indian Language’ and 25% in English as minimum qualifying standards in these qualifying papers.
    [Show full text]
  • Languages of New York State Is Designed As a Resource for All Education Professionals, but with Particular Consideration to Those Who Work with Bilingual1 Students
    TTHE LLANGUAGES OF NNEW YYORK SSTATE:: A CUNY-NYSIEB GUIDE FOR EDUCATORS LUISANGELYN MOLINA, GRADE 9 ALEXANDER FFUNK This guide was developed by CUNY-NYSIEB, a collaborative project of the Research Institute for the Study of Language in Urban Society (RISLUS) and the Ph.D. Program in Urban Education at the Graduate Center, The City University of New York, and funded by the New York State Education Department. The guide was written under the direction of CUNY-NYSIEB's Project Director, Nelson Flores, and the Principal Investigators of the project: Ricardo Otheguy, Ofelia García and Kate Menken. For more information about CUNY-NYSIEB, visit www.cuny-nysieb.org. Published in 2012 by CUNY-NYSIEB, The Graduate Center, The City University of New York, 365 Fifth Avenue, NY, NY 10016. [email protected]. ABOUT THE AUTHOR Alexander Funk has a Bachelor of Arts in music and English from Yale University, and is a doctoral student in linguistics at the CUNY Graduate Center, where his theoretical research focuses on the semantics and syntax of a phenomenon known as ‘non-intersective modification.’ He has taught for several years in the Department of English at Hunter College and the Department of Linguistics and Communications Disorders at Queens College, and has served on the research staff for the Long-Term English Language Learner Project headed by Kate Menken, as well as on the development team for CUNY’s nascent Institute for Language Education in Transcultural Context. Prior to his graduate studies, Mr. Funk worked for nearly a decade in education: as an ESL instructor and teacher trainer in New York City, and as a gym, math and English teacher in Barcelona.
    [Show full text]
  • Socio-Political Movements in North Bengal (A Sub-Himalayan Tract) Edited by Publish by Global Vision Publishing House Sukhbilas Barma
    Socio-Political Movements in North Bengal (A Sub-Himalayan Tract) Edited by Publish by Global Vision Publishing House Sukhbilas Barma Kamata Language— A Brilliant Past and Tragic End Dharma Narayan Barma Long before the epic ages, North-east India was infested with different tribes namely Austric, Dravidians and Mongoloid people. The Austric people entered this region from Australia through south-eastern direction of India, Dravidians from west and Mongolians from China through North-eastern passes. And lately came the Aryans from Mid- India. In the Ramayana, we find that Naraka the foster son of king Janaka of Mithila entered Pragjotishpur and dethroned Ghataka, the Kirat king. According to Kalikapurana, Naraka on his coronation brought in many Aryans from Mithila and made them settled there permanently. Naraka was a unique warrior who demolished the neighbouring Kirat kingdoms and established an empire in Pragjyotishpur. Because of his heroic nature and might, he could easily sustain the wrath of the neighbouring tribal kings for which he was known as Asura. The appellation, ‘Asura’ does not mean demon; Rig Veda has clearly said that the term ‘Asura’ means warrior, and great hero. Dharma Narayan Barma: A retired teacher of Tufanganj High School. 212 Socio-Political Movements in North Bengal Kamrupa, an Ancient Settlement of the Aryans After Naraka, his son the famous king Vagadatta of Mahabharata, sided with the Kauravas in the Kurukshetra war and gave his daughter Bhanumati to marriage with Duryodhana, the Kaurava king. In these ways, Aryans spread towards the east (Pragjyotispura) in those days. In the 4th century B.C.
    [Show full text]
  • Empire's Garden: Assam and the Making of India
    A book in the series Radical Perspectives a radical history review book series Series editors: Daniel J. Walkowitz, New York University Barbara Weinstein, New York University History, as radical historians have long observed, cannot be severed from authorial subjectivity, indeed from politics. Political concerns animate the questions we ask, the subjects on which we write. For over thirty years the Radical History Review has led in nurturing and advancing politically engaged historical research. Radical Perspec- tives seeks to further the journal’s mission: any author wishing to be in the series makes a self-conscious decision to associate her or his work with a radical perspective. To be sure, many of us are currently struggling with the issue of what it means to be a radical historian in the early twenty-first century, and this series is intended to provide some signposts for what we would judge to be radical history. It will o√er innovative ways of telling stories from multiple perspectives; comparative, transnational, and global histories that transcend con- ventional boundaries of region and nation; works that elaborate on the implications of the postcolonial move to ‘‘provincialize Eu- rope’’; studies of the public in and of the past, including those that consider the commodification of the past; histories that explore the intersection of identities such as gender, race, class and sexuality with an eye to their political implications and complications. Above all, this book series seeks to create an important intellectual space and discursive community to explore the very issue of what con- stitutes radical history. Within this context, some of the books pub- lished in the series may privilege alternative and oppositional politi- cal cultures, but all will be concerned with the way power is con- stituted, contested, used, and abused.
    [Show full text]
  • Phonological Analysis of Assamese Vowel
    © 2019 JETIR June 2019, Volume 6, Issue 6 www.jetir.org (ISSN-2349-5162) PHONOLOGICAL ANALYSIS OF ASSAMESE VOWEL Jitu Borah Assistant Professor, Department of Assamese, Assam Women’s University, Rowriah, Jorhat. Abstract : The North-East India has often been described as a precious store-house of data for linguistic research. A large number of languages of this region belonging to three major language-families of the world, viz., Indo-Aryan, Sino-Tibetan and Austric. Assamese is a state language of Assam spoken by the Assamese people. Assamese language has three distinct dialects which are Eastern Assamese dialect, Kamrupi dialect and Goalpariya dialect. Assamese language has own Script Known as Assamese Script was developed from Brahmi script. This research paper studies the Phonological Analysis of the vowels of Assamese Language, the frequency of the vowels, length of the vowels, formant, pitch, intensity of the Assamese vowels. Index Terms - Assamese, Vowel, Acoustic & Phonology. I. INTRODUCTION The North-East India has often been described as a precious store-house of data for linguistic research. A large number of languages of this region belonging to three major language-families of the world, viz., Indo-Aryan, Sino-Tibetan and Austric. Assamese is an Eastern Indo-Aryan language spoken mainly in the Indian state of Assam, where it is an official language. It is the easternmost Indo-European language, spoken by over 15 million speakers and serves as a lingua franca in the region. It has its own identity with special characteristics for thousand of years. Its formative period begins from the tenth century AD and written records in verse date back to the late Fourteen century AD, ‘Prahlad Charita’ by Hema Saraswati being the earliest one.
    [Show full text]
  • Ethnographically Oriented Repository of Assamese Telephonic Speech
    MATEC Web of Conferences 210, 05019 (2018) https://doi.org/10.1051/matecconf/201821005019 CSCC 2018 Ethnographically Oriented Repository of Assamese Telephonic Speech Mridusmita Sharma1,*, Kandarpa Kumar Sarma2, and Nikos E. Mastorakis1 1Department of Electronics and Communication Engineering, Gauhati University, Guwahati-781014, Assam, India 2Department of Electronics and Communication Engineering, Gauhati University, Guwahati-781014, Assam, India 3Technical University of Sofia, Sofia, Kliment Ohridski 8, Bulgaria Abstract. Recording of the speech samples is the first step in speech recognition and related tasks. For English, there are a bunch of readily available data sets. But standard data sets with regional dialect and mood variations are not available and the need to create our own data set for our experimental works has been faced. We have considered Assamese language for our case study and since it is less computationally aware, there is a need to develop the speech corpus having dialect and mood variations. Also, the development of corpus is an ongoing process and the initial task is reported in this paper. 1 INTRODUCTION designing system for real time telephone speech recognition with better accuracy with regional linguistic The primary objective of designing an efficient orientation. One such challenge is the design of a good Automatic Speech Recognition (ASR) system lies in its speech corpus which has regional dialectal and mood superior performance over the already existing models variations. A balanced corpus is the basic requirement [1]. The performance evaluation of an ASR system for designing a mood or dialect or speaker recognition greatly depends upon the speech database which is the system.
    [Show full text]
  • Modelling South Kamrupi Dialect of Assamese Language Using
    ISSN (Online) : 2454 -7190 ISSN (Print) 0973-8975 Copyright reserved © J.Mech.Cont.& Math. Sci., Vol.-14, No.2, March-April (2019) pp 521-540 Modelling South Kamrupi Dialect of Assamese Language using HTK 1Ranjan Das, 2Uzzal Sharma 1Research Scholar, Department of Computer Science & Engineering, School of Engineering, Assam Don Bosco University, Azara, Guwahati, Assam, India 781017 2Assistant Professor, Department of Computer Science & Engineering, School of Engineering, Assam Don Bosco University, Azara, Guwahati, Assam, India 781017 [email protected], [email protected] Corresponding Author: Ranjan Das Email: [email protected] https://doi.org/10.26782/jmcms.2019.04.00038 Abstract This paper addresses the fundamental issues of developing a speaker independent, dialect modelling system for recognizing the widely spoken, colloquial South Kamrupi dialect of Assamese language. The proposed dialect model is basically designed on Hidden Markov Model (HMM). Hidden Markov Model Toolkit (HTK) is used here as the building block for feature extraction, training, recognition and verification for the model building process. A primary corpus is built as a prerequisite for the empirical study. Altogether, 16 people (9 male, 7 female) are volunteering in the primary corpora building process. The corpora are comprised of one training and two testing sets of recorded speech files. The whole corpora are made up of around 2.5 hours of recordings. The proposed dialect model is trained on South Kamrupi dialect training corpora. A comparative test recognition is carefully designed and carried out which exhibit a recognition correctness of 87.13% for South Kamrupi dialect and 68.52% correctness for the Central Kamrupi dialect.
    [Show full text]
  • Chapter-II MEDIAVAL ASSAM
    Chapter-II MEDIAVAL ASSAM: A BRIEF STUDY 2.1: Location Assam, the beautiful land of the nature, the frontier province of India on the North-East, the boundaries of which lie between latitudes 28°18’ and 240 North and longriudes 89°46’ and 97°4’ East. It contains at present an area of 54,000 square miles, of which a littleover 24,000 square miles constitute the plains districts, 19,500 the southern hill tracts and the rest, the trial hill tracts to the north. On three sides the province is shut in by great mountain ranges, inhabited by people mostly of Mongolian stock. To the north lie the Himalayan regions of Bhutan and Tibet. Below the high mountains is a range of sub-Himalayan hills, inhabited to the west by small races of Bhutia origin, and east-ward by Tibeto-Burman tribes, Akas, Daflas, Miris, Abros and Mishmis. To the north-east lie the Mishmi Rhills, curving round the head of the Brahmaputra Valley. ‘Withreference to these northern frontier tracts, it is noteworthy that the international boundary between Assam and Tibet has not been clearly defined. However, in 1914 a tentative agreement was reached. Embodying a line on the map called the McMahon line. Continuing to the east is the Patkai Range, which defines the western boundary of Ava, the intervening ranges being inhabited chiefly by various tribes of Nagas and the native state of Manipur. ____________________________ 1. Robert Reid: The excluded area ofAssam, G.J.CIII, PP—18, Mills: The Assam Burma Frontier, Idid, LX VIII, pp-289.
    [Show full text]
  • Identification of Dialects: Survey
    International Journal of Advanced Science and Technology Vol. 29, No. 3, (2020), pp. 10733 - 10739 ` Identification of Dialects: Survey S. Shivaprasad 1 Dr. M Sadanandam2 1 Research Scholar , Department of CSE, Kakatiya University ,Warangal & Assistant Professor Department of CSE ,VFSTR University, GUNTUR. 2Assistant Professor,Department of CSE , Kakatiya University ,Warangal 1 [email protected] ABSTRACT Automatic Dialect Identification plays a crucial role for constructing an Automatic speech recognition system in an appreciable manner in signal processing. We can mention dialect as property of a language that varies from standard version of that language depending upon the region. Dialect can be identified from speaker’s vocabulary, articulation, grammar and some other aspects like loudness, tonality and nasality. Identifying dialect exactly and properly will help in making some applications and services to work in an efficient manner such as e-learning and many such fields that is more helpful for homebound, aged ones. Dealing with dialect identification is very difficult due to factors like insufficient databases, subtle to regional boundaries, variations in languages. It is a tedious analysis procedure. Due to this factor dialect identification became crucial among research topics. Through this paper, we explain the process that is performed in identification various dialects and also the work done up to now, thereby providing information on what might be expected in coming years. Keywords: Dialects, Automatic Dialect Identification (ADI), Speech recognition Systems (SRS), automatic speech recognition ( ASR) 1. INTRODUCTION Language acts a medium of communication for humans. In speech recognition systems we use same language for various purposes. Dialect of language is considered as one of the problem for automatic speech recognition systems.
    [Show full text]
  • Usage of the Assamese Language in Assam: Dialectal Varieties Vis-A-Vis the Standard Language
    Quest Journals Journal of Research in Humanities and Social Science Volume 8 ~ Issue 1 (2020)pp.: 18-21 ISSN(Online):2321-9467 www.questjournals.org Research Paper Usage of the Assamese Language in Assam: Dialectal Varieties Vis-A-Vis the Standard Language Meghali Kalita, Dr.Baishalee Rajkhowa Department of English Royal Global University, Assam Corresponding author: Meghali Kalita ABSTRACT: Assamese or Asamiya or Axomiya is an Eastern Indo-Aryan language used mainly in the state of Assam, where it is an official language. Assamese of Old Indo- Aryan dialects, though the exact nature of its origin and growth in not clear yet. The Standard dialect which is considered as the Standard Assamese Language is an Indo-Aryan language designated as the official language of Assam, is a state located in the North- eastern part of India. In Assam, Standard Assamese is the formal written language used for education, media and other official purpose, while Kamrupi and Goalparia are the informal spoken dialects spoken in the Kamrup and Goalpara districts of Assam, respectively. The importance of Kamrupi dialect can be understood from early literature, wherein Kamrupi was documented as the first ancient Aryan literary language spoken both in the Brahmaputra valley of Assam and North Bengal. This stud yis a comparative study between both the varieties. It also explores the usage of dialectal variety found in Kamrupi and Sibsagar dialects of Assam. Another area of interest is to find out whether the Kamrupi dialect is used in speaking or both for speaking and writing purposes in Assam. KEY WORDS: Standard Language, Dialectal varieties, usage, speaking, writing Received 25 Feb.2020; Accepted 05 Mar., 2020 © The author(s) 2020.
    [Show full text]
  • Assamese and the Unicode
    FOR VIEWING THIS DOCUMENT PROPERLY STAY ONLINE ASSAMESE AND THE UNICODE INTRODUCTION This piece of writing seeks to clear up the whole issue of the position of the Assamese people, their language and its system of writing in the world, more particularly in the Unicode Standard, keeping in mind all the parties involved in the issue including the Unicode Consortium. All the facts, historical, socialPHUKAN'S, political and technical which are necessary and essential to be discussed are being discussed openly, even if it may seem unpleasant to some and unnecessary or irrelevant and boring to some other, nothing in the dark, nothing to hide. The Unicode Consortium, a non-Governmental body with headquarters in the U.S.A, with Governments of countries as members have standardised a Universal Character Set (UCS), i.e. a standard that defines, in one place, all the characters needed for writing the majority of living languages in use on computers. It aims to be, and to a large extent already is, a superset of all other character sets that have been encoded. Unicode (as the UCS is commonly referred to) can access over a million characters of which about 100,000WEBPAGES have already been defined. These include characters for all the world's main languages along with a selection of symbols for various purposes. Assamese SATYAKAMis one of the language recognized and listed in the 8th Schedule of the Constitution of India. In the list the language is on the top. In all currency notes the denomination of the currency valued is written first in Assamese, the name being DRalphabetically on the top of the sorting order.
    [Show full text]
  • Human Language Computing in Indian Languages - a Holistic Perspective
    Human Language Computing in Indian Languages - A Holistic Perspective Swaran Lata Country Manager , W3C India Director & Head , TDIL Programme , Dept of Informaon Technology , Govt.of India E-mail : [email protected] 1 Organization of presentation: • Languages of India and its distribution • Technology Development for Indian Languages Programme • Phases of TDIL Programme • Paradigm Shift –Consortium mode projects • Linguistic Resources developed • Standardization Efforts - Core - Linguistic Resources • Testing and Evaluation Initiatives • Possible Collaborations with EU Programme • Future Directions 2 Languages of India ……INDIA: A Primer • Total Population: INDIA 1,028,737,436 (Source: STATES: 28 Census of India 2001) 10 States 03 UTs UT: 07 01 State 01 States • Language’s (Percentage to total population) 01 State 01 State HINDI (41.03) 01 State GUJARATI 02 States 02 UTs (4.48) BENGALI MARATHI (8.11) (6.99) 01 States MANIPURI 01 State (0.14) (3.21) (3.21) MALAYALAM MALAYALAM TELUGU 02 States (7.19) 01 State 01 State 01 States 01 UTs 01 UTs 01 State Linguistic Scenario in India Source – Census 2001, India Language Speakers Percentage to State(s) total population Assamese 13,168,484 1.28 Assam Bengali 83,369,769 8.11 Andaman & Nicobar Islands, Assam, Tripura, West Bengal Bodo 1,350,478 0.13 Assam Dogri 2,282,589 0.22 Jammu and Kashmir Gujarati 46,091,617 4.48 Dadra and Nagar Haveli, Daman and Diu, Gujarat Hindi 422,048,642 41.03 Andaman and Nicobar Islands, Arunachal Pradesh, Bihar, Chandigarh, Chhattisgarh, Delhi, Haryana, Himachal Pradesh, Jharkhand, Madhya Pradesh, Rajasthan, Uttar Pradesh and Uttarakhand Kannada 37,924,011 3.69 Karnataka.
    [Show full text]