Conversion of Text Image Document in Southern Indian Languages Into Braille for Visually Challenged People
International Journal of Computer Engineering & Technology (IJCET)
Volume 9, Issue 4, July-Aug 2018, pp. 73–84, Article IJCET_09_04_008
Available online at http://iaeme.com/Home/issue/IJCET?Volume=9&Issue=4
Journal Impact Factor (2016): 9.3590 (Calculated by GISI) www.jifactor.com
ISSN Print: 0976-6367 and ISSN Online: 0976–6375
© IAEME Publication

Dr. G.Gayathri Devi
Department of Computer Science, SDNB Vaishnav College for Women, Chennai, TamilNadu, India

G.Sathyanarayanan
Senior Professional Project Management, DXC Technology

ABSTRACT
Speech and text are the principal media of human communication. The Braille encoding system represents textual documents in a format readable by visually challenged persons. This paper proposes a model to assist visually impaired or blind persons in reading the text present in Southern Indian language text images by converting the text into the corresponding Braille script. The algorithms were tested on a dataset of Southern Indian language text images. Experimental results show that the proposed method performs well in converting the text regions extracted from an image into the corresponding Braille script.

Key words: Dravidian Language, Southern Indian Language, Braille conversion, Braille System, Visually challenged people, Image Processing.

Cite this Article: Dr. G.Gayathri Devi and G.Sathyanarayanan, Conversion of Text Image Document In Southern Indian Languages Into Braille For Visually Challenged People. International Journal of Computer Engineering & Technology, 9(4), 2018, pp. 73–84. http://iaeme.com/Home/issue/IJCET?Volume=9&Issue=4

1. INTRODUCTION
The Braille encoding system represents textual documents in a format readable by visually challenged persons.
As Braille-compatible reading material is in short supply, visually challenged people face difficulties in necessities such as education and employment. Reading text documents is difficult for visually challenged people in many circumstances. Visually impaired persons can read only by using Braille script, yet the majority of printed works do not include Braille or speech versions. There is a need for a system that automatically converts text documents to Braille and speech, to reduce the communication gap between the written text systems used by sighted persons and the access mechanisms through which visually impaired people can communicate.

2. BRAILLE SYSTEM
Braille [1,2] is a tactile writing system used by visually challenged people. It is a system of raised dots that can be read with the fingers by people who are blind or have low vision. Braille is named after Louis Braille, the Frenchman who designed it. Braille symbols are formed within units of space known as Braille cells. A complete Braille cell comprises six raised dots arranged in two parallel columns of three dots each. The dot positions are identified by the numbers one through six, as shown in Figure 1. A six-dot cell allows sixty-four patterns (2^6), sixty-three of which use one or more dots. A single cell can denote an alphabet letter, a number, a punctuation mark, or even an entire word.

Fig 1 Braille Cell

2.1. Southern Indian Language
The Dravidian languages are a language family spoken mainly in Tamil Nadu, Kerala, Telangana, Andhra and Karnataka, as well as in Bangladesh, Bhutan, Mauritius, Sri Lanka, some parts of Pakistan, Burma, southern Afghanistan, Nepal, Malaysia, Africa, Indonesia and Singapore. The Dravidian languages with the most speakers are Tamil, Kannada, Telugu and Malayalam.

Vowels and Consonants of Indian Languages
Indian languages are phonetic in nature.
All the languages have their own sets of vowels and consonants, and these map well from one language to another with respect to their features. The fundamental vowels and consonants are the same for all four languages, apart from some consonants particular to each language and the absence of some consonants in Tamil, the typical Dravidian language. Consonants and vowels combine to form composite characters.

The Tamil script has 12 vowels, 18 consonants and one special character, the ayudha ezhuthu. The complete script therefore consists of these 31 letters in their independent form and an additional 216 composite letters (18 × 12), for a total of 247, each being a combination of a consonant and a vowel, a mute consonant, or a vowel alone. The Telugu script consists of 60 symbols: 16 vowels, 3 vowel modifiers, and 41 consonants. The Malayalam alphabet has 13 vowel letters, 36 consonant letters, and a few other symbols. The Kannada language uses 49 phonemic letters, divided into three groups: 15 vowels, 34 consonants, and modifier glyphs (half-letters). There are 510 (34 × 15) composite characters.

2.2. Southern Indian Language Braille System
Braille is used by many people all over the world in their mother tongue, and gives everyone a means of access to knowledge. Bharati Braille [4], or Indian Braille, is a largely unified Braille script for writing the Indian languages. Initially, 11 Braille scripts were in use in various parts of India and for different languages. Bharati Braille has since become a nationwide standard Braille script and has also been adopted by Bangladesh, Sri Lanka and Nepal. Across the different Indian languages there are up to sixteen vowels and about forty consonants.
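Each letter in these scripts is ultimately written as a six-dot cell of the kind described in Section 2. The following Python sketch illustrates the dot numbering under one assumption that the paper does not state explicitly: that cells are encoded with the Unicode Braille Patterns block, where U+2800 is the blank cell and dot n is stored in bit n-1. The function names are illustrative only.

```python
BRAILLE_BASE = 0x2800  # code point of the blank Braille pattern (assumed encoding)


def cell_from_dots(dots):
    """Return the Unicode Braille character with the given dots (1-6) raised."""
    mask = 0
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError(f"dot number out of range: {dot}")
        mask |= 1 << (dot - 1)  # dot n is stored in bit n-1
    return chr(BRAILLE_BASE + mask)


def dots_from_cell(cell):
    """Return the tuple of raised dot numbers for a six-dot Braille character."""
    mask = ord(cell) - BRAILLE_BASE
    if not 0 <= mask < 64:
        raise ValueError("not a six-dot Braille pattern")
    return tuple(dot for dot in range(1, 7) if mask & (1 << (dot - 1)))


# Every six-dot pattern, including the blank cell: 2**6 = 64 in total.
ALL_CELLS = [chr(BRAILLE_BASE + mask) for mask in range(64)]
```

Enumerating the masks 0 to 63 gives all 64 patterns of a six-dot cell; leaving out the blank cell leaves the 63 patterns that raise at least one dot.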
Figure 2, Figure 3, Figure 4 and Figure 5 show the Tamil, Telugu, Malayalam and Kannada Braille alphabet sheets respectively. The alphabets of Indian languages are divided into vowels and consonants. Across the languages, fifteen vowels and thirty-three consonants are common (with the exception of Tamil), and hence the basic assignment in Bharati Braille is the same. Bharati Braille assigns an individual cell to each vowel and each consonant.

Fig 2 Tamil Braille Alphabets Sheet
Fig 3 Telugu Braille Alphabets Sheet
Fig 4 Malayalam Braille Alphabets Sheet
Fig 5 Kannada Braille Alphabets Sheet

3. PROPOSED METHODOLOGY
Gayathri et al. [4] discussed earlier research on recognizing Southern Indian Braille script from a Braille document. This research proposes a conversion system from Southern Indian languages to Braille consisting of i) image pre-processing, ii) OCR, iii) Unicode conversion, and iv) Grade 1 Braille script conversion. The flow of the proposed system is shown in Fig 5.

Fig 5 Flow of the Work

3.1. Work Flow
Mapping Table Creation - A database is created that stores the alphabets of the Southern Indian languages, their corresponding Braille Unicode, and the equivalent Braille symbols.
Input - The system accepts a scanned document image and its language as input.
Stage 1: Preprocessing - Preprocessing is necessary for efficient recovery of the text information from the scanned image. A text extraction algorithm [5,6] is used to preserve the text area and remove the non-text area. The method is based on gamma correction and a positional connected component labeling algorithm [7,8].
Stage 2: OCR Conversion - The language of the scanned text document image is selected by the user. An OCR system automates the process of acquiring images of a text document and converting them into an editable text file.
The output of Stage 1 is given to the corresponding free, open OCR engine for text conversion.
Stage 3: Unicode Conversion System - The editable text is converted into its expanded form, and the expanded text is converted into its corresponding Unicode equivalent.
Stage 4: Braille Conversion System - The Unicode of the expanded text is searched in the look-up table, its corresponding Braille Unicode is identified, and it is converted to its Braille symbol.

3.2. Algorithm
The steps of the algorithm are as follows:

Algorithm - Text Image to Braille Conversion
Input: Scanned text image and the language of the scanned image
Output: Braille equivalent
1. The scanned image is preprocessed and converted into an editable string SLangChar[] using the appropriate OCR engine.
2. Create a mapping look-up database (Slang2SlangUni) that stores each Southern Indian language alphabet and its corresponding Unicode.
3. Create a mapping look-up database (Charuni2BrailleUni) that stores each Southern Indian language Unicode and its corresponding Braille Unicode.
4. Create a mapping look-up database (BrailleUni2BrailleSym) that stores each Braille Unicode and its corresponding symbol.
5. Buni[] = BSym[] = SLanguni[] = SLangCharExtend[] = NULL
6. Read the text string SLangChar[]
7. For i = 1 to len of SLangChar
       If SLangChar[i] is a Vowel or Consonant
           SLangCharExtend[] += SLangChar[i]
       Else if SLangChar[i] is a Syllable
           Break the syllable into Consonant(s) + Vowel and store in temp[]
           SLangCharExtend[] += temp
       End
   End for
   For i = 1 to len of SLangCharExtend
       Find the match for SLangCharExtend[i] in the mapping look-up database (Slang2SlangUni) and store in Tlang
       SLanguni[] += Tlang
   End for
8. For i = 1 to len of SLanguni
       Find the match for SLanguni[i] in the mapping look-up database (Charuni2BrailleUni) and store in Tuni
       Buni[] += Tuni
   End for
9. For i = 1 to len of Buni
       Find the match for Buni[i] in the mapping look-up database (BrailleUni2BrailleSym) and store in Tsym
       BSym[] += Tsym
   End for
10. Return BSym[].

Indian language Braille follows the expanded format for representing a combined consonant-vowel character. For example, the Braille notations for த and இ/◌ி are written together to represent the combined character தி. The normal text is converted into expanded form using a look-up table, and the expanded form is then converted into Unicode form.
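The expansion and look-up steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes OCR output that is already Unicode text, it collapses the three look-up databases into two small dictionaries, and it covers only a handful of Tamil letters. The dictionary and function names are hypothetical, the Braille values are given as Unicode Braille Patterns code points and should be checked against the Bharati Braille chart before use, and handling of mute consonants (pulli) is omitted.

```python
# Dependent vowel signs mapped to their independent vowels (partial, Tamil only).
VOWEL_SIGN_TO_VOWEL = {
    "\u0bbe": "\u0b86",  # vowel sign aa -> letter aa
    "\u0bbf": "\u0b87",  # vowel sign i  -> letter i
}

# Expanded letter -> Braille Unicode (partial, illustrative table).
CHAR_TO_BRAILLE = {
    "\u0b86": "\u281c",  # aa, dots 3-4-5
    "\u0b87": "\u280a",  # i,  dots 2-4
    "\u0b95": "\u2805",  # ka, dots 1-3
    "\u0ba4": "\u281e",  # ta, dots 2-3-4-5
}


def expand(text):
    """Rewrite each vowel sign as its independent vowel (the expanded form)."""
    return [VOWEL_SIGN_TO_VOWEL.get(ch, ch) for ch in text]


def to_braille(text):
    """Map expanded letters to Braille cells; unknown characters pass through."""
    return "".join(CHAR_TO_BRAILLE.get(ch, ch) for ch in expand(text))
```

With these tables, `to_braille("தி")` first expands the syllable to த followed by இ and then returns the two cells ⠞⠊, matching the example in the text. Passing unknown characters through unchanged is a design choice of this sketch; a full system would need complete tables for each of the four languages.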