AN ENGLISH TO ASSAMESE, BENGALI AND HINDI MULTILINGUAL E-DICTIONARY

Md. Saiful Islam
Department of Computer Science, Assam University, Silchar, Assam, India

INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR), ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-3, ISSUE-9, 2016

Abstract

A dictionary is a much-needed component of Natural Language Processing systems today, and one of the important tools for learning new languages. A word is basically an association of linguistic sound and meaning, but the spelling of a word does not always correlate easily with its sound. A dictionary helps us with both the spelling and the pronunciation of such words. Electronic dictionaries are very popular nowadays because many users can access them online simultaneously. The main objective of this paper is to develop an English to Assamese, Bengali and Hindi (E-ABH) multilingual electronic dictionary that is user-friendly. With it, users can easily look up the meaning of an English word in Assamese, Bengali and Hindi, together with related information such as the word Id, POS, synonyms and examples. This dictionary will be beneficial and should help improve knowledge of the Assamese, Bengali, English and Hindi languages, particularly among people of North-East India.

Keywords: Electronic Dictionary, Languages, Natural Language Processing, Sequential Search Technique

I. INTRODUCTION

A. Electronic Dictionary

A dictionary is a book of words in one or more specific languages. The words are listed alphabetically with their meanings, synonyms, phonetics, parts of speech (POS) and examples [5][6]. It is one of the important tools for helping students understand words and improve their reading skills. There are two types of dictionary: the paper dictionary, also known as a hard or printed dictionary, and the electronic dictionary, also known as a digital or Internet dictionary.

An Electronic Dictionary (E-Dictionary) is a dictionary whose data exists in digital form and can be accessed through a number of different media. The E-Dictionary is an important and powerful tool for anyone learning a new language on a computer, both online and offline. It gives the user access to a much larger database than a single book. Its most important advantage is that it is very convenient to use. In modern electronic form, dictionaries have tremendous potential. According to the languages involved, dictionaries fall into three categories:

1. Monolingual Dictionary: The user searches for the meaning of a word and other related information in the same language. English-English and Bengali-Bengali dictionaries are examples of monolingual dictionaries.
2. Bilingual Dictionary: The user searches for the meaning of a word and other related information from one language to another. Assamese-English and English-Bengali dictionaries are examples of bilingual dictionaries.
3. Multilingual Dictionary: The user searches for the meanings of words and other related information from one language to several languages. An English to Assamese, Bengali and Hindi dictionary is an example of a multilingual dictionary.

According to Al-Rabi’i, E-Dictionaries can be divided into two types [5]:

1. Online E-Dictionary: This dictionary is used directly in digital form over the Internet through a web browser, from anywhere in the world. It is also known as an Internet dictionary. Many users can access it online simultaneously.
2. Offline E-Dictionary: This dictionary can be used on a digital computer, a PDA (Personal Digital Assistant) or a mobile phone. It is also known as a portable digital dictionary. An offline E-Dictionary can be carried and backed up on a CD, DVD, hard disk (HD) or pen drive. This type of dictionary can also be downloaded from the Internet and installed on one's own computer or other devices.

B. Natural Language Processing

Natural languages are the languages humans use naturally to communicate. Natural Language Processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and natural languages [4]. NLP deals with computer programs that understand human languages in both written and spoken form. The major goal of NLP is to design and build software that can analyze, understand and generate the languages that humans use naturally. NLP is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. Some of the most common research tasks in NLP are Machine Translation, Electronic Dictionary, Morphological Segmentation, Natural Language Generation, Optical Character Recognition, Part of Speech (POS) Tagging, Question Answering, Speech Recognition, Information Retrieval (IR) and Speech Segmentation [6].

C. Languages

In this section, we briefly discuss the Assamese, Bengali, English and Hindi languages.

1. Assamese Language: Assamese is an Eastern Indo-Aryan language used mainly in the state of Assam, where it is both the state language and the official language. Assamese is also known as Asamiya (Axomiya) and is the mother tongue of the Assamese people. It is spoken mainly by the people of Assam and by some people in other North-Eastern states. Nearly 15 to 20 million people speak Assamese. Assamese is one of the recognized languages of India [6][7]. It evolved in the 7th century AD, with its roots in Sanskrit. However, its vocabulary, phonology and grammar have been substantially influenced by the original inhabitants of Assam, such as the Boros and the Kacharis. The Assamese script is derived from the Brahmi script. It developed from the Gupta alphabets around 1200 AD and closely resembles the Mithilakshar and Bengali alphabets.
2. Bengali Language: Bengali is an Indo-Aryan language spoken mostly in the eastern Indian subcontinent. It is also known as Bangla. It evolved from Magadhi Prakrit and Sanskrit. Bengali is one of the recognised languages of India. It is the official language of West Bengal and Tripura and a major language in the Union Territory of Andaman and Nicobar Islands. In India, Bengali is spoken mainly by people in the states of West Bengal, Tripura and Assam. It is the seventh most spoken language in the world and the second most spoken language in India. Bengali is written in the Bengali script, the 6th most widely used writing system in the world. With minor variations, the script is shared by Assamese and is the basis for other languages such as Manipuri and Bishnupriya Manipuri [6].
3. English Language: English is a West Germanic language that was first spoken in early medieval England. It is spoken mainly by the people of Canada, Australia, the United Kingdom, the United States, Ireland and New Zealand. It is an official language of almost sixty sovereign states and the third most common native language in the world. It has become the leading language of international discourse [6]. English was introduced in India in 1830 during the rule of the East India Company. At the time of India's independence in 1947, English was the only functional lingua franca in the country. The Constitution of India (1951) declared English an associate official language of India. English in India has various dialects due to the influence of local languages.
4. Hindi Language: Hindi is the fourth most widely spoken language in the world. It is spoken widely in Indian states and union territories such as Delhi, Madhya Pradesh, Bihar, Uttar Pradesh, Chhattisgarh, Haryana, Himachal Pradesh, Chandigarh and Rajasthan. It is the primary spoken language of Madhya Pradesh and Uttar Pradesh [6]. In the 2001 census of India, 258 million people reported Hindi as their native language. Hindi is also spoken in neighbouring countries of India such as Bangladesh, Bhutan and Nepal. Hindi derives its vocabulary from several major sources, including Sanskrit, Persian and Arabic.

II. REVIEW OF RELATED LITERATURE

Many English paper dictionaries have been compiled by lexicographers at different times. The first English dictionary was compiled by Robert Cawdrey in 1604 [17]. It contains about 2,543 words. The first electronic version of the Oxford English Dictionary (OED) was made available in 1988 [14].

c. The Compact Oxford English Dictionary, edited by J. A. Simpson and E. S. C. Weiner in 1991 [15].
d. The Oxford Dictionary of Current English, compiled by Catherine Soanes in 2006.
e. The Concise Oxford English Dictionary, edited by Angus Stevenson and Maurice Waite in 2011 [16].

III. DATAFLOW DIAGRAM OF E-ABH DICTIONARY

A Data Flow Diagram (DFD) is a pictorial representation of the flow of information in a system. A DFD is often used as a preliminary step to create an overview of the system [12]. It is an attractive technique because it depicts what users do rather than what computers do. It is popular because it is simple to understand and use. We have used two types of DFD in developing the E-ABH dictionary, as described below:

A. Level 0 DFD

The Level 0 DFD is also known as the Context Diagram (CD). A CD is the most basic form of DFD. It aims to show at a glance how the entire system works by showing the interactions between the process and the external entities. The CD of the E-ABH dictionary is shown in Figure 1.
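In a context diagram, the whole E-ABH dictionary is a single process. This sketch assumes the user is the external entity: the user sends an English word in, and the process sends back the related information. The Python below is a minimal sketch of that one process, assuming an in-memory word list and the sequential search technique named in the keywords. The `Entry` record, its field names, `lookup`, `DICTIONARY` and the two sample entries are hypothetical illustrations, not the author's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """One headword record. Field names are hypothetical; the fields mirror the
    information the paper says the dictionary returns (word Id, POS, synonyms,
    examples, and Assamese/Bengali/Hindi meanings)."""
    word_id: int
    english: str
    pos: str
    assamese: str
    bengali: str
    hindi: str
    synonyms: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)


def lookup(entries: list[Entry], query: str) -> Optional[Entry]:
    """Sequential search: compare the normalised query with each headword in
    turn and return the first match, or None if the word is absent."""
    target = query.strip().lower()
    for entry in entries:  # worst case: n comparisons for n entries
        if entry.english.lower() == target:
            return entry
    return None


# Illustrative sample data only (hypothetical, not the paper's database).
DICTIONARY = [
    Entry(1, "water", "noun", "পানী", "জল", "पानी",
          examples=["Plants need water."]),
    Entry(2, "book", "noun", "কিতাপ", "বই", "किताब",
          synonyms=["volume"], examples=["She is reading a book."]),
]

if __name__ == "__main__":
    # Data flow in: the user sends an English word to the process.
    hit = lookup(DICTIONARY, "  Book ")
    # Data flow out: the process returns the multilingual record.
    if hit is not None:
        print(hit.word_id, hit.pos, hit.assamese, hit.bengali, hit.hindi)
        # prints: 2 noun কিতাপ বই किताब
```

Sequential search needs no index, but it costs up to n comparisons per query. For a large word list, a Python `dict` keyed on the lower-cased headword would give average constant-time lookups, at the cost of building that index once.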