Dialect Identification of Assamese Language Using Spectral Features

Total Page:16

File Type:pdf, Size:1020Kb

Dialect Identification of Assamese Language Using Spectral Features ISSN (Print) : 0974-6846 Indian Journal of Science and Technology, Vol 10(20), DOI: 10.17485/ijst/2017/v10i20/115033, May 2017 ISSN (Online) : 0974-5645 Dialect Identification of Assamese Language using Spectral Features Tanvira Ismail* and L. Joyprakash Singh Department of Electronics and Communication, School of Technology, North-Eastern Hill University, NEHU Campus, Shillong – 793022, Meghalaya, India; [email protected] Abstract Objective: telemedicine Accurate which isdialect especially identification important technique for older helps and homebound in improving people. the speech Methods: recognition In this systemspaper we that have exist developed in most of the present day electronic devices and is also expected to help in providing new services in the field of e-health and Indian language) and two of its dialects, viz., Kamrupi and Goalparia. Findings: the speech corpora needed for the dialect identification purpose and described a method to identify Assamese language (an Research work done on dialect identification tois relativelyextract the much spectral less thanfeatures that ofon thelanguage collected identification speech data. for Two which modeling one of the techniques, reasons being namely, dearth Gaussian of sufficient Mixture database Model andon dialects. Gaussian As Mixture mentioned, Model we with have Universal developed Background the database Model and havethen Mel-Frequencybeen used as the Cepstral modeling Coefficient techniques has tobeen identify used the dialects and language. Novelty: So far, standard speech database for Assamese dialects that can be used for speech processing research has not been made. In this paper, we not only describe a method to identify dialects of the Assamese language, but have also developed the speech corpora needed for the dialect identification purpose. Keywords: Assamese, Gaussian Mixture Model, Gaussian Mixture Model with Universal Background Model, Goalparia, Kamrupi, Mel-Frequency Cepstral Coefficient 1. Introduction • Improving certain applications and services such as Speech Recognition Systems which exist in Dialect is defined as a variety of a language spoken in a most of the today’s electronic devices, particular geographical area that is distinguished from • Enhancing the human - computer interaction other varieties of the same language by features of pro- applications and nunciation, grammar and vocabulary. It is also defined as • Securing the remote access communication.2 a variety of speech that differs from the standard language normally designated as the official language. Moreover, accurate dialect detection technique is Humans are born with the ability to discriminate expected to help in providing new services in the field of between spoken languages as part of human intelligence e-health and telemedicine which is especially important and the quest to automate such ability has never stopped.1 for older and homebound people.2 Just like any other artificial intelligence technologies, auto- Although much research work has been done in matic dialect identification aims to replicate such human language identification, the problem of dialect identifi- ability through computational means.1 Dialect identifica- cation, which is very similar to the problem of language tion is the task of recognizing a speaker’s regional dialect identification, has not received the same level of research within a predetermined language. interest.3 Research in the field of dialect is still limited Developing a good method to detect dialect accu- due to the dearth of databases and the time consuming rately helps in analysis process.4 Initial work using Gaussian Mixture *Author for correspondence Dialect Identification of Assamese Language using Spectral Features Models (GMM) was performed for the identification Split and Merge Expectation Maximization Algorithm of twelve languages and dialect identification was also was tested on four Indian languages, viz, Hindi, Telugu, performed for three out of the twelve languages, viz., Gujarati and English.10 Four Indian languages, viz, Hindi, English, Mandarin and Spanish.3 Identification capabil- Bengali, Oriya and Telugu were identified by consider- ity was improved by using Gaussian Mixture Model with ing the special CV (Consonant-Vowel) behavior of the Universal Background Model (GMM-UBM) and it was language in their syllables and were also analyzed using showed that a GMM-UBM based model provides good the SVM classifier.11 Work was also done on language results for recognition of American vs. Indian English, identification that was speaker dependent and was tested four Chinese dialects and three Arabic dialects.5 on three Indian languages, viz., Assamese, Hindi and In the field of Indian languages, Mel-Frequency Indian English, based on clustering and supervised learn- Cepstral Coefficient (MFCC) and Speech Signal based ing.12 First the feature vectors using LP coefficients were Frequency Cepstral Coefficient (SFCC) feature extrac- obtained and clusters of vectors using the K-means algo- tion techniques along with GMM and Support Vector rithm were formed. Supervised learning was then used Machine (SVM) modeling techniques was used to for recognizing the probable cluster to which the test identify the two main language families of India, viz., speech sample belongs.12 Indo-Aryan and Dravidian, which in total consisted of 22 As far as Indian dialects are concerned, both spec- languages.6 The Indo-Aryan family consisted of 18 lan- tral features and prosodic features were used to identify guages, viz., Assamese, Bengali, Bhojpuri, Chhattisgarhi, five Hindi dialects, viz., Chattisgharhi, Bengali, Marathi, Dogri, English, Gujarati, Hindi, Kashmiri, Konkani, General and Telugu using Autoassociative Neural Manipuri, Marathi, Nagamese, Odia, Punjabi, Sanskrit, Network (AANN) models.13 For each dialect, their data- Sindhi and Urdu, while Dravidian family consisted of 4 base consisted of data from 10 speakers speaking in languages, viz., Kannada, Malayalam, Tamil and Telugu.6 spontaneous speech for about 5-10 minutes resulting A language identification system with two levels that used in a total of 1-1.5 hours.13 On the other hand, speaker acoustic features, was modeled using Hidden Markov identification of specific dialect, viz., Assamese was done Model (HMM), GMM and Artificial Neural Network using features obtained from various speaker dependent (ANN) and was tested on nine Indian languages, viz., parameters of voiced speech and then ANN based classi- Tamil, Telugu, Kannada, Malayalam, Hindi, Bengali, fiers were used for identification purpose.14 Marathi, Gujarati and Punjabi.7 In this two level language Thus, with regard to Indian languages vis-à-vis Indian identification system, the family of the spoken language dialects also, the dialects have not been explored as is identified in the first level and after feeding this input much as Indian languages. Therefore, the present focus to the second level the identification of a particular lan- of our research is on Indian dialects identification and guage is made in the corresponding family.7 On the other in this paper we have chosen for our study the dialects hand, both spectral features and prosodic features were of Assamese language. Assamese has mainly three dia- used for analyzing the specific information with regards lects, viz., the standard dialect, Kamrupi and Goalparia to each language present in speech and then GMM was dialects.15 The standard dialect is the standard Assamese applied for identification purposes on the Indian language language which is an Indo-Aryan language designated as speech database (IITKGP-MLILSC) which consists of as the official language of Assam state located in the north- many as 27 Indian languages.8 Concentrating more on eastern part of India. In Assam, Assamese is the formal South Indian languages (languages of Dravidian family), written language used for education, media and other a Language Identification (LID) system was presented official purposes, while Kamrupi and Goalparia are the that worked for the four South Indian languages, viz., informal spoken dialects spoken in the Kamrup and Kannada, Malayalam, Tamil and Telugu and one North Goalparia districts of Assam, respectively. The impor- Indian language, viz., Hindi in which each language was tance of Kamrupi dialect can be understood from the modeled using an approach based on Vector Quantization, early literature, wherein Kamrupi was documented as the whereas the speech was segmented into dierrant sounds first ancient Aryan literary language spoken both in the and the performance of the system on each of the seg- Brahmaputra valley of Assam and North Bengal.16-18 On ments was studied.9 An LID system using GMM for the the other hand, the importance of Goalparia, which is also features that were extracted was further modeled using an Indo-Aryan dialect, is evident from the fact that it has 2 Vol 10 (20) | May 2017 | www.indjst.org Indian Journal of Science and Technology Tanvira Ismail and L. Joyprakash Singh a rich culture of folk music known as Goalparia Lokgeet. 2.1 Development of Speech Corpora According to the 2011 Census of India, there are about 20, While building a speech corpus, the important criteria to 6 and 1 million speakers of Assamese language, Kamrupi be kept in mind are: dialect and Goalparia dialect, respectively. • Enough speech must be recorded from enough So far, standard speech database for Assamese dia- speakers in order to validate an experiment lects that
Recommended publications
  • Numbers in Bengali Language
    NUMBERS IN BENGALI LANGUAGE A dissertation submitted to Assam University, Silchar in partial fulfilment of the requirement for the degree of Masters of Arts in Department of Linguistics. Roll - 011818 No - 2083100012 Registration No 03-120032252 DEPARTMENT OF LINGUISTICS SCHOOL OF LANGUAGE ASSAM UNIVERSITY SILCHAR 788011, INDIA YEAR OF SUBMISSION : 2020 CONTENTS Title Page no. Certificate 1 Declaration by the candidate 2 Acknowledgement 3 Chapter 1: INTRODUCTION 1.1.0 A rapid sketch on Assam 4 1.2.0 Etymology of “Assam” 4 Geographical Location 4-5 State symbols 5 Bengali language and scripts 5-6 Religion 6-9 Culture 9 Festival 9 Food havits 10 Dresses and Ornaments 10-12 Music and Instruments 12-14 Chapter 2: REVIEW OF LITERATURE 15-16 Chapter 3: OBJECTIVES AND METHODOLOGY Objectives 16 Methodology and Sources of Data 16 Chapter 4: NUMBERS 18-20 Chapter 5: CONCLUSION 21 BIBLIOGRAPHY 22 CERTIFICATE DEPARTMENT OF LINGUISTICS SCHOOL OF LANGUAGES ASSAM UNIVERSITY SILCHAR DATE: 15-05-2020 Certified that the dissertation/project entitled “Numbers in Bengali Language” submitted by Roll - 011818 No - 2083100012 Registration No 03-120032252 of 2018-2019 for Master degree in Linguistics in Assam University, Silchar. It is further certified that the candidate has complied with all the formalities as per the requirements of Assam University . I recommend that the dissertation may be placed before examiners for consideration of award of the degree of this university. 5.10.2020 (Asst. Professor Paramita Purkait) Name & Signature of the Supervisor Department of Linguistics Assam University, Silchar 1 DECLARATION I hereby Roll - 011818 No - 2083100012 Registration No – 03-120032252 hereby declare that the subject matter of the dissertation entitled ‘Numbers in Bengali language’ is the record of the work done by me.
    [Show full text]
  • The Written Examination Will Consist of the Following Papers :— Qualifying Papers
    MAIN EXAMINATION: The written examination will consist of the following papers :— Qualifying Papers : Paper-A (One of the Indian Language to be selected by the candidate from the Languages included in the Eighth Schedule to the Constitution). 300 Marks Paper-B English 300 Marks Papers to be counted for merit Paper-I Essay 250 Marks Paper-II General Studies-I 250 Marks (Indian Heritage and Culture, History and Geography of the World and Society) Paper-III General Studies -II 250 Marks (Governance, Constitution, Polity, Social Justice and International relations) Paper-IV General Studies -III 250 Marks (Technology, Economic Development, Bio-diversity, Environment, Security and Disaster Management) Paper-V General Studies -IV 250 Marks (Ethics, Integrity and Aptitude) Paper-VI Optional Subject - Paper 1 250 Marks Paper-VII Optional Subject - Paper 2 250 Marks Sub Total (Written test) 1750 Marks Personality Test 275 Marks Grand Total 2025 Marks Candidates may choose any one of the optional subjects from amongst the list of subjects Government strives to have a workforce which reflects gender balance and women candidates are encouraged to apply. given in para 2 below:— NOTE : The papers on Indian languages and English (Paper A and paper B) will be of Matriculation or equivalent standard and will be of qualifying nature. The marks obtained in these papers will not be counted for ranking. (i) Evaluation of the papers, namely, 'Essay', 'General Studies' and Optional Subject of all the candidates would be done simultaneously along with evaluation of their qualifying papers on ‘Indian Languages’ and ‘English’ but the papers on Essay, General Studies and Optional Subject of only such candidates will be taken cognizance who attain 25% marks in ‘Indian Language’ and 25% in English as minimum qualifying standards in these qualifying papers.
    [Show full text]
  • Class-8 New 2020.CDR
    Class - VIII AGRICULTURE OF ASSAM Agriculture forms the backbone of the economy of Assam. About 65 % of the total working force is engaged in agriculture and allied activities. It is observed that about half of the total income of the state of Assam comes from the agricultural sector. Fig 2.1: Pictures showing agricultural practices in Assam MAIN FEATURES OF AGRICULTURE Assam has a mere 2.4 % of the land area of India, yet supports more than 2.6 % of the population of India. The physical features including soil, rainfall and temperature in Assam in general are suitable for cultivation of paddy crops which occupies 65 % of the total cropped area. The other crops are wheat, pulses and oil seeds. Major cash crops are tea, jute, sugarcane, mesta and horticulture crops. Some of the crops like rice, wheat, oil seeds, tea , fruits etc provide raw material for some local industries such as rice milling, flour milling, oil pressing, tea manufacturing, jute industry and fruit preservation and canning industries.. Thus agriculture provides livelihood to a large population of Assam. AGRICULTURE AND LAND USE For the purpose of land utilization, the areas of Assam are divided under ten headings namely forest, land put to non-agricultural uses, barren and uncultivable land, permanent pastures and other grazing land, cultivable waste land, current fallow, other than current fallow net sown area and area sown more than once. 72 Fig 2.2: Major crops and their distribution The state is delineated into six broad agro-climatic regions namely upper north bank Brahmaputra valley, upper south bank Brahmaputra valley, Central Assam valley, Lower Assam valley, Barak plain and the hilly region.
    [Show full text]
  • AN ENGLISH to ASSAMESE, BENGALI and HINDI MULTILINGUAL E-DICTIONARY Md
    AN ENGLISH TO ASSAMESE, BENGALI AND HINDI MULTILINGUAL E-DICTIONARY Md. Saiful Islam Department of Computer Science Assam University, Silchar, Assam, India E-mail:[email protected] Abstract alphabetically with their meaning, synonyms, Dictionary is a very demandable components phonetics, POS, and examples [5][6]. It is one of of Natural Language Processing system the important tools to assist students in nowadays. A dictionary is one of the understanding as well as enlightening the skill of important tools that can be used for learning reading. There are two types of dictionary, new languages. A word is basically an namely Paper dictionary which is also known as association of linguistic sound and meaning. hard or printed dictionary and Electronic The spelling does not always easily correlate dictionary which is also known as digital or with the sound of a word. A dictionary helps Internet dictionary. us both with the spelling and pronunciation of such words. Electronic dictionaries are very Electronic Dictionary (E-Dictionary) is one kind popular nowadays. It can be accessed by many of dictionary whose data exists in digital form users simultaneously on online. The main and can be accessed through a number of objective of this paper is to develop an English different media. The E-Dictionary is a very to Assamese, Bengali and Hindi (E-ABH) important and powerful tool for any person who multilingual electronic dictionary in such a is learning a new language using computer on way that it is user friendly dictionary and user both online and offline. It has the advantage of can easily look up the meaning of word and providing the user to access much larger database other related information of the word like than a single book.
    [Show full text]
  • Reception of Words Through Translation in Boro Language
    January 2018, Volume 5, Issue 1 JETIR (ISSN-2349-5162) RECEPTION OF WORDS THROUGH TRANSLATION IN BORO LANGUAGE Apurba Kumar Baro Asstt. Professor Department of Bodo, B.H. College, Howly, Assam, India Abstract: This paper aims at investigating the reception of words through translation process available in Boro language. Boro language has enormous number of basic words. Besides these basic words in this language words have been coined through various sources. It has been observed that Boro has adapted a lot of constructed words via translation process based on morphological and semantic point of view. Fulfillment of the needs of language is one of the reasons that have been constructed of words through translation process from different sources. Keywords: Adaptation, translation, hybridization, clipping, discourse 1. INTRODUCTION Linguistically Boro language is belonging to the Tibeto-Burman branch of the greater Sino-Tibetan family of languages. At present majority of Boro linguistic speakers are mainly found in B.T.A.D. (Bodoland Territorial Area District). Boro language is a scheduled language of Indian constitution and is a language of both used in spoken and written form. The Boros have a lot of inherited words from Tibeto-Burman origin. But these basic words of TB origin are not enough to fulfill the need of academic and literary purposes of the language. As a result a lot of words have been constructed and coined based on the Boro structure. It is observed that such coining words have been constructed based on contend, semantic function, and morphological nature of the words. In this regard constructed of the words through translation process in Boro language is remarkable.
    [Show full text]
  • Neo-Vernacularization of South Asian Languages
    LLanguageanguage EEndangermentndangerment andand PPreservationreservation inin SSouthouth AAsiasia ed. by Hugo C. Cardoso Language Documentation & Conservation Special Publication No. 7 Language Endangerment and Preservation in South Asia ed. by Hugo C. Cardoso Language Documentation & Conservation Special Publication No. 7 PUBLISHED AS A SPECIAL PUBLICATION OF LANGUAGE DOCUMENTATION & CONSERVATION LANGUAGE ENDANGERMENT AND PRESERVATION IN SOUTH ASIA Special Publication No. 7 (January 2014) ed. by Hugo C. Cardoso LANGUAGE DOCUMENTATION & CONSERVATION Department of Linguistics, UHM Moore Hall 569 1890 East-West Road Honolulu, Hawai’i 96822 USA http:/nflrc.hawaii.edu/ldc UNIVERSITY OF HAWAI’I PRESS 2840 Kolowalu Street Honolulu, Hawai’i 96822-1888 USA © All text and images are copyright to the authors, 2014 Licensed under Creative Commons Attribution Non-Commercial No Derivatives License ISBN 978-0-9856211-4-8 http://hdl.handle.net/10125/4607 Contents Contributors iii Foreword 1 Hugo C. Cardoso 1 Death by other means: Neo-vernacularization of South Asian 3 languages E. Annamalai 2 Majority language death 19 Liudmila V. Khokhlova 3 Ahom and Tangsa: Case studies of language maintenance and 46 loss in North East India Stephen Morey 4 Script as a potential demarcator and stabilizer of languages in 78 South Asia Carmen Brandt 5 The lifecycle of Sri Lanka Malay 100 Umberto Ansaldo & Lisa Lim LANGUAGE ENDANGERMENT AND PRESERVATION IN SOUTH ASIA iii CONTRIBUTORS E. ANNAMALAI ([email protected]) is director emeritus of the Central Institute of Indian Languages, Mysore (India). He was chair of Terralingua, a non-profit organization to promote bi-cultural diversity and a panel member of the Endangered Languages Documentation Project, London.
    [Show full text]
  • Languages of New York State Is Designed As a Resource for All Education Professionals, but with Particular Consideration to Those Who Work with Bilingual1 Students
    TTHE LLANGUAGES OF NNEW YYORK SSTATE:: A CUNY-NYSIEB GUIDE FOR EDUCATORS LUISANGELYN MOLINA, GRADE 9 ALEXANDER FFUNK This guide was developed by CUNY-NYSIEB, a collaborative project of the Research Institute for the Study of Language in Urban Society (RISLUS) and the Ph.D. Program in Urban Education at the Graduate Center, The City University of New York, and funded by the New York State Education Department. The guide was written under the direction of CUNY-NYSIEB's Project Director, Nelson Flores, and the Principal Investigators of the project: Ricardo Otheguy, Ofelia García and Kate Menken. For more information about CUNY-NYSIEB, visit www.cuny-nysieb.org. Published in 2012 by CUNY-NYSIEB, The Graduate Center, The City University of New York, 365 Fifth Avenue, NY, NY 10016. [email protected]. ABOUT THE AUTHOR Alexander Funk has a Bachelor of Arts in music and English from Yale University, and is a doctoral student in linguistics at the CUNY Graduate Center, where his theoretical research focuses on the semantics and syntax of a phenomenon known as ‘non-intersective modification.’ He has taught for several years in the Department of English at Hunter College and the Department of Linguistics and Communications Disorders at Queens College, and has served on the research staff for the Long-Term English Language Learner Project headed by Kate Menken, as well as on the development team for CUNY’s nascent Institute for Language Education in Transcultural Context. Prior to his graduate studies, Mr. Funk worked for nearly a decade in education: as an ESL instructor and teacher trainer in New York City, and as a gym, math and English teacher in Barcelona.
    [Show full text]
  • A Comparative Study of Classifier Expression in Assamese and Bengali Languages
    IOSR Journal Of Humanities And Social Science (IOSR-JHSS) Volume 23, Issue 3, Ver. 8 (March. 2018) PP 01-10 e-ISSN: 2279-0837, p-ISSN: 2279-0845. www.iosrjournals.org A Comparative Study of Classifier Expression in Assamese and Bengali Languages Dr. Paramita Purkait Department of Linguistics Assam University Silchar Abstract: Classifiers are affixes that are used in various languages to indicate the grammatical or semantic classification of words. "Classifiers are generally defined as morphemes that classify and quantify nouns according to semantic criteria. Classifiers classify a noun inherently. They designate and specify semantic features inherent to the nominal denotatum and divide the set of nouns of a certain language into disjunct classes "(Senft 2000: 21). Classifiers system are found in the languages of Asia,Oceania,Australia,Africa and America.Among the New Indo Aryan languages Bengali, Assamese ,Oriya, Maithili and Marathi and among Munda family Santhali, Kurka and Malto made limited use of classifiers.Assamese although an Indo Aryan, like other Tibeto Burman languages - Boro, Garo,Rabha,Dimasa,Kokborok makes use of large number of classifiers for almost everything or for every shape. The present paper makes an attempt to compare and contrast the occurances and the use of classifiers among the sister language-Assamese and Bengali.Although, both Assamese and Bengali have SOV word order but they share some common and uncommon facts about classifiers.The study is confined to standard Bengali and standard Assamese. Keywords: Numeral
    [Show full text]
  • Socio-Political Movements in North Bengal (A Sub-Himalayan Tract) Edited by Publish by Global Vision Publishing House Sukhbilas Barma
    Socio-Political Movements in North Bengal (A Sub-Himalayan Tract) Edited by Publish by Global Vision Publishing House Sukhbilas Barma Kamata Language— A Brilliant Past and Tragic End Dharma Narayan Barma Long before the epic ages, North-east India was infested with different tribes namely Austric, Dravidians and Mongoloid people. The Austric people entered this region from Australia through south-eastern direction of India, Dravidians from west and Mongolians from China through North-eastern passes. And lately came the Aryans from Mid- India. In the Ramayana, we find that Naraka the foster son of king Janaka of Mithila entered Pragjotishpur and dethroned Ghataka, the Kirat king. According to Kalikapurana, Naraka on his coronation brought in many Aryans from Mithila and made them settled there permanently. Naraka was a unique warrior who demolished the neighbouring Kirat kingdoms and established an empire in Pragjyotishpur. Because of his heroic nature and might, he could easily sustain the wrath of the neighbouring tribal kings for which he was known as Asura. The appellation, ‘Asura’ does not mean demon; Rig Veda has clearly said that the term ‘Asura’ means warrior, and great hero. Dharma Narayan Barma: A retired teacher of Tufanganj High School. 212 Socio-Political Movements in North Bengal Kamrupa, an Ancient Settlement of the Aryans After Naraka, his son the famous king Vagadatta of Mahabharata, sided with the Kauravas in the Kurukshetra war and gave his daughter Bhanumati to marriage with Duryodhana, the Kaurava king. In these ways, Aryans spread towards the east (Pragjyotispura) in those days. In the 4th century B.C.
    [Show full text]
  • Empire's Garden: Assam and the Making of India
    A book in the series Radical Perspectives a radical history review book series Series editors: Daniel J. Walkowitz, New York University Barbara Weinstein, New York University History, as radical historians have long observed, cannot be severed from authorial subjectivity, indeed from politics. Political concerns animate the questions we ask, the subjects on which we write. For over thirty years the Radical History Review has led in nurturing and advancing politically engaged historical research. Radical Perspec- tives seeks to further the journal’s mission: any author wishing to be in the series makes a self-conscious decision to associate her or his work with a radical perspective. To be sure, many of us are currently struggling with the issue of what it means to be a radical historian in the early twenty-first century, and this series is intended to provide some signposts for what we would judge to be radical history. It will o√er innovative ways of telling stories from multiple perspectives; comparative, transnational, and global histories that transcend con- ventional boundaries of region and nation; works that elaborate on the implications of the postcolonial move to ‘‘provincialize Eu- rope’’; studies of the public in and of the past, including those that consider the commodification of the past; histories that explore the intersection of identities such as gender, race, class and sexuality with an eye to their political implications and complications. Above all, this book series seeks to create an important intellectual space and discursive community to explore the very issue of what con- stitutes radical history. Within this context, some of the books pub- lished in the series may privilege alternative and oppositional politi- cal cultures, but all will be concerned with the way power is con- stituted, contested, used, and abused.
    [Show full text]
  • Automatic Syllabification Rules for ASSAMESE Language
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Directory of Open Access Journals Laba Kr. Thakuria et al Int. Journal of Engineering Research and Applications www.ijera.com ISSN : 2248-9622, Vol. 4, Issue 2( Version 1), February 2014, pp.446-450 RESEARCH ARTICLE OPEN ACCESS Automatic Syllabification Rules for ASSAMESE Language Laba Kr. Thakuria1, Prof. P.H. Talukdar2 Department of Instrumentation & USIC, Gauhati University Guwahati, India Department of Instrumentation & USIC, Gauhati University Guwahati, India Abstract For unit selection based text-to-speech system, syllabification acts as a backbone. Based on different structures of different languages syllabification rules are also varies. The purpose of this study is to examine and analyse the syllabification rules for Assamese language. Imparting education and training, preferably, in the local/regional language is urgently necessary in today’s context in order to maintain social harmony and homogeneity. Language heterogeneity is a global problem in bringing all the benefits of Information Technology (IT) to our doorsteps. Syllabification rules are implemented into an algorithm which later can be integrated into a text-to-speech system. The analysis of these rules has been taken using 10000 phonetically rich words which reports to produce a comparable result of 99% accuracy as compared to manual syllabification. Keywords: Assamese language, diphthongs, phonemes, syllable, text-to-speech. I. Introduction II. Assamese Language And Its Syllable is a unit of sound which is larger Phonological Structure than phoneme and smaller than a word [5]. Assamese is an Indo-Aryan language spoken Syllabification algorithms are mainly used in text-to- by the Assamese people in general.
    [Show full text]
  • A Review on Electronic Dictionary and Machine Translation System Developed in North-East India
    ORIENTAL JOURNAL OF ISSN: 0974-6471 COMPUTER SCIENCE & TECHNOLOGY June 2017, An International Open Free Access, Peer Reviewed Research Journal Vol. 10, No. (2): Published By: Oriental Scientific Publishing Co., India. Pgs. 429-437 www.computerscijournal.org A Review on Electronic Dictionary and Machine Translation System Developed in North-East India SAIFUL ISLAM* and BIPUL SYAM PURKAYASTHA Department of Computer Science, Assam University, Silchar, PIN-788011, Assam, India Corresponding author e-mail: [email protected] http://dx.doi.org/10.13005/ojcst/10.02.25 (Received: May 04, 2017; Accepted: May 12, 2017) ABSTRACT Electronic Dictionary and Machine Translation system are both the most important language learning tools to achieve the knowledge about the known and unknown natural languages. The natural languages are the most important aspect in human life for communication. Therefore, these two tools are very important and frequently used in human daily life. The Electronic Dictionary (E-dictionary) and Machine Translation (MT) systems are specially very helpful for students, research scholars, teachers, travellers and businessman. The E-dictionary and MT are very important applications and research tasks in Natural Language Processing (NLP). The demand of research task in E-dictionary and MT system are growing in the world as well as in India. North-East (NE) is a very popular and multilingual region of India. Even then, a small number of E-dictionary and MT system have been developed for NE languages. Through this paper, we want to elaborate about the importance, approaches and features of E-dictionary and MT system. This paper also tries to review about the existing E-dictionary and MT system which are developed for NE languages in NE India.
    [Show full text]