Assamese Dialect Translation System - A Preliminary Proposal

Sanghamitra Nath, Himangshu Sarma and Utpal Sharma
Department of Computer Science and Engineering, Tezpur University, Assam, India
Email: [email protected], [email protected], [email protected]

Abstract — Assamese, spoken by most natives of the state of Assam in the North East of India, is a language that originated from the Indo-European family. However, the language varies across the regions of Assam in which it is spoken, mostly in terms of pronunciation, intonation and vocabulary. This has given birth to several regional dialects. Central Assamese is spoken in the Nagaon district of Assam and in neighboring areas. Eastern Assamese is spoken in and around the district of Sibsagar. Kamrupia or Kamrupi is spoken in the districts of Kamrup, Nalbari, Barpeta, Darrang, Kokrajhar and Bongaigaon. In addition, there are Assamese variants such as Nalbaria, Barpetia, Kamrupia, Goalparia, Jorhatia, etc., spoken by people of the respective regions. It would therefore be useful to design a system that helps two people speaking different varieties of Assamese communicate with each other. In other words, a seamless integration of a speech recognition module, a machine translation module and a speech synthesis module would facilitate the exchange of information between two people speaking two different dialects. The proposed system for Assamese dialect translation and synthesis should therefore work in such a manner that, given a sentence in standard Assamese (the written form of the language), it translates and synthesizes the utterance in the dialect requested, or vice versa; or, given a sentence in a particular dialect, it translates and synthesizes the utterance in the dialect requested. The outcome of this work is twofold. Firstly, it will help people understand and appreciate the interesting differences among the Assamese dialects, which is important for preserving the culture embedded in the dialects. Secondly, the proposed system will act as an aid to people speaking different dialects while communicating with each other.

Keywords — Assamese, Dialect, Speech Corpus, Prosody, Speech Synthesis

I. INTRODUCTION

Assamese is the principal language of the state of Assam in North East India and is regarded as the lingua franca of the whole of North East India. The Assamese language is the easternmost member of the Indo-European family and is spoken by most natives of the state of Assam. As reported by RCILTS, IITG, over 15.3 million people speak Assamese as their first language; including those who speak it as a second language, a total of 20 million people speak Assamese, primarily in the northeastern state of Assam and in some parts of the neighboring states of West Bengal, Meghalaya, Arunachal Pradesh and other North East Indian states 1. The Assamese language grew out of Sanskrit; however, the original inhabitants of Assam, such as the Bodos and Kacharis, had a great influence on its vocabulary, phonology and grammar. Assamese and the cognate languages Maithili, Bengali and Oriya are believed to have developed from Magadhi Prakrit, which is the eastern branch of the Apabhramsa that followed Prakrit. Mayang is the form of Assamese spoken by the somewhat marginalized Mayang tribe in the northern regions of Manipur, while Jharwa Assamese is a pidgin incorporating elements of Assamese, Hindi and English.

Although in the middle of the 19th century the official language of Assam was considered to be Bengali, British colonizers decreed Eastern Assamese to be the standard form of the Assamese language. Presently, however, Central Assamese is accepted as the principal or standard dialect. Several regional dialects are typically recognized. These dialects vary primarily with respect to phonology and morphology; however, a high degree of mutual intelligibility is shared by the dialects. Dr. Banikanta Kakati, an eminent linguist of Assam, divided the Assamese dialects into two major groups, Eastern Assamese and Western Assamese. However, recent studies have shown that there are four major dialect groups, listed below from east to west 2:

I. Eastern group, spoken in Sibsagar district and the districts around it.
II. Central group, spoken in the present Nagaon district and adjoining areas.
III. Kamrupi group, spoken in undivided Kamrup, Nalbari, Barpeta, Darrang, Kokrajhar and Bongaigaon.
IV. Goalparia group, spoken in Goalpara, Dhubri, Kokrajhar and Bongaigaon districts.

The script used by the Assamese language is a variant of the Eastern Nagari script, which traces its descent from the Gupta script. The script is similar to that of Bengali except for the symbols for /r/ and /w/, and it highly resembles the Devanagari script of Hindi, Sanskrit and other related Indic languages. It is a syllabary script and is written from left to right. The Assamese alphabet consists of 12 vowel graphemes and 52 consonant graphemes 3. Both phonemes and allophones are represented in this alphabet. Assamese spelling is not always phonetically based. Current Assamese spelling practices are based on Sanskrit spelling, as introduced in Hemkosh, the second Assamese dictionary, written in the middle of the 19th century by Hemchandra Baruah, which is in fact considered the standard reference for the Assamese language.

1 http://www.iitg.ernet.in/rcilts/pdf/assamese.pdf
2 http://en.wikipedia.org/wiki/Assamese_language

II. BACKGROUND

Though much work has been carried out in the field of speech translation and synthesis, work related to dialect translation and synthesis, especially for the Assamese dialects, is nearly nonexistent. As an initial measure, a survey has been made of the areas of pronunciation modeling, speech recognition, translation and other current speech processing approaches that will be required for the proposed system.

A. Building of Speech Corpus

A spoken language system, whether a speech recognition system or a speech synthesis system, always starts with the building of speech corpora [2]. Therefore, to begin with, speech samples need to be recorded and recognized to generate a text file. Then a corresponding speech file with the phonetic representation of the recorded speech is generated and stored. The speech file may be processed at different levels, viz. paragraphs, sentences, phrases, words, syllables, phones and diphones, and these units of processing are referred to as the speech units of the file. Concatenative speech synthesis involves the concatenation of these basic units to synthesize natural-sounding speech. The speech units are annotated with relevant information about each unit using acoustic and prosodic parameters, either manually or automatically. Acoustic parameters may be linear prediction coefficients, formants, pitch and gain, while prosodic parameters may be duration, intensity or pitch. These parameters are modified by stored knowledge sources corresponding to coarticulation, duration and intonation [10]. The coarticulation rules specify the pattern of joining the basic units, while the duration rules modify the inherent duration of the basic units based on the linguistic context in which the units occur. The intonation rules specify the overall pitch contour for the utterance, the fall-rise patterns, resetting phenomena and the inherent fundamental frequency of vowels. To make the speech more intelligible and natural, appropriate pauses need to be specified between the syntactic units. The resulting database, consisting of the speech units along with their associated information, is called the speech corpus. The choice of speech units and the associated acoustic and prosodic information play a vital role in the development of a speech corpus [2].

Phones or syllables as the basic unit

Phones as basic speech units reduce the size of the database, because the number of phones for most Indian languages is well below 50. However, phones provide very little co-articulation information across adjacent units, thus failing to model the dynamics of speech sounds, and they are therefore not considered efficient for speech synthesis. Most Indian languages, however, are syllable centered, with pronunciations based mainly on syllables. Intelligible speech synthesis is possible for Indian languages with the syllable as the basic unit [2]. Syllable units are larger than phones or diphones and can capture co-articulation information much better. Also, the number of concatenation points decreases when the syllable is used as the basic unit. Syllable boundaries are characterized by regions of low energy, providing more prosodic information. A grapheme (a letter or a group of letters that represents a phoneme) in Indian languages is close to a syllable. The general format of an Indian language syllable is C*VC*, where C is a consonant, V is a vowel and C* indicates the presence of zero or more consonants. A total of about 35 consonants and 18 vowels exist in Indian languages. Researchers have formulated syllabification rules to produce computationally reasonable syllables. Such rules are bound to vary from language to language, or there may be rules specific to a particular language. A rule-based grapheme-to-syllable converter can also be used for syllabification [2].

Polysyllables as the basic unit

Polysyllable units are formed using the monosyllable units already present in the database; therefore, the quality of synthesis can be improved without requiring a new set of units. Such a system uses a large database consisting of syllables, bisyllables and trisyllables. In the synthesis process, the first matching trisyllable is selected, followed by bisyllable and monosyllable units as needed. Selecting the largest possible unit in the database reduces the number of co-articulation points, thereby improving the quality of the synthesized speech [2].

Clustering the speech units

Speech unit clustering leads to the selection of the best candidates for synthesis. An acoustic distance measure is defined to measure the distance between two units of the same phone type. Cluster units of the same type are formed by evaluating various factors concerning prosodic and phonetic context. A decision tree is then built based on questions concerning the phonetic and prosodic aspects of the grapheme.
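To make the C*VC* syllable format and rule-based grapheme-to-syllable conversion concrete, here is a minimal sketch over romanized text. The vowel inventory and the onset rule are illustrative toys, not the actual Assamese syllabification rules referred to above:

```python
VOWELS = set("aeiou")  # toy romanized vowel set; not the real Assamese inventory


def syllabify(word):
    """Split a romanized word into C*VC* syllables.

    Toy rule: intervocalic consonant clusters keep their final consonant
    as the onset of the next syllable; the rest close the previous one.
    Word-final consonants attach to the last syllable.
    """
    syllables, current = [], ""
    i, n = 0, len(word)
    while i < n:
        ch = word[i]
        current += ch
        if ch in VOWELS:
            # collect the consonants that follow, up to the next vowel
            j = i + 1
            cluster = ""
            while j < n and word[j] not in VOWELS:
                cluster += word[j]
                j += 1
            if j >= n:                  # word-final coda
                current += cluster
                i = j
            elif cluster:               # leave one consonant as next onset
                current += cluster[:-1]
                i = j - 1
            else:
                i = j
            syllables.append(current)
            current = ""
        else:
            i += 1
    if current:                         # stray consonants with no nucleus
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables
```

For example, `syllabify("barpeta")` yields `["bar", "pe", "ta"]`, each piece matching C*VC*. A practical converter for Assamese script would work over Eastern Nagari graphemes and language-specific rules rather than this romanized toy.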
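The greedy trisyllable-first selection described for polysyllable units can be sketched as follows. The database here is a plain set of hyphen-joined unit names, a stand-in for a real indexed corpus of recorded units:

```python
def select_units(syllables, database):
    """Greedy longest-match unit selection: try a trisyllable first,
    then a bisyllable, then a monosyllable, so that the largest stored
    unit is always preferred and concatenation points are minimized.

    `database` is a set of available unit keys, e.g. {"na-ga-on"}.
    """
    units, i = [], 0
    while i < len(syllables):
        for size in (3, 2, 1):
            window = syllables[i:i + size]
            candidate = "-".join(window)
            if len(window) == size and candidate in database:
                units.append(candidate)
                i += size
                break
        else:
            # a full corpus would always contain the monosyllable;
            # emit it unmatched here as a fallback
            units.append(syllables[i])
            i += 1
    return units
```

With `database = {"na-ga-on", "bar-pe", "ta"}`, the word na-ga-on is covered by a single trisyllable unit (no joins), while bar-pe-ta falls back to a bisyllable plus a monosyllable (one join), mirroring the selection order described above.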
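The clustering step relies on an acoustic distance between units of the same type and on decision-tree questions about context. A minimal sketch, assuming per-frame feature vectors of equal length and hypothetical yes/no context predicates (real systems align variable-length units and use many phonetic and prosodic questions):

```python
import math


def acoustic_distance(u, v):
    """Mean Euclidean distance between two equal-length sequences of
    acoustic feature vectors (e.g. per-frame spectral features).
    A stand-in for the unspecified distance measure in the text."""
    assert len(u) == len(v), "sketch assumes equal-length units"
    total = 0.0
    for a, b in zip(u, v):
        total += math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return total / len(u)


def best_split(units, questions):
    """One decision-tree split: choose the yes/no context question whose
    partition of same-type units minimizes summed within-cluster distance.

    `units` is a list of (context, features) pairs; `questions` maps a
    question name to a predicate over contexts. Both are hypothetical
    shapes chosen for this sketch."""
    def impurity(group):
        feats = [f for _, f in group]
        return sum(acoustic_distance(a, b)
                   for i, a in enumerate(feats) for b in feats[i + 1:])

    best = None
    for name, q in questions.items():
        yes = [u for u in units if q(u[0])]
        no = [u for u in units if not q(u[0])]
        if not yes or not no:       # question does not split the data
            continue
        score = impurity(yes) + impurity(no)
        if best is None or score < best[1]:
            best = (name, score)
    return best[0] if best else None
```

Applying `best_split` recursively to each resulting partition grows the decision tree; at synthesis time, a target unit's context is dropped down the tree to reach the cluster of acoustically similar candidates.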