Nafath
Issue no. 15

Arabic Optical Character Recognition (OCR)

In this issue:
Arabic Optical Character Recognition and Assistive Technology
State of the Art in Arabic OCR: Qatari Research Efforts
Smart Apps for PWDs using OCR
Machine Learning, Deep Learning and OCR Revitalizing Technology
Arabic Optical Character Recognition (OCR) Technology at Qatar National Library
Overview of Arabic OCR and Related Applications

www.mada.org.qa

About Mada

Mada Center is a private institution for public benefit, founded in 2010 as an initiative that aims at promoting digital inclusion and building a technology-based community that meets the needs of persons with functional limitations (PFLs), that is, persons with disabilities (PWDs) and the elderly, in Qatar. Mada today is the world’s Center of Excellence in digital access in Arabic.

Through strategic partnerships, the center works to enable the education, culture and community sectors through ICT to achieve an inclusive community and educational system. The Center achieves its goals by building partners’ capabilities and supporting the development and accreditation of digital platforms in accordance with international standards of digital access. Mada raises awareness, provides consulting services and increases the number of assistive technology solutions in Arabic through the Mada Innovation Program to enable equal opportunities for PWDs and the elderly in the digital community.

At the national level, Mada Center has achieved a digital accessibility rate of 94% amongst government websites, while Qatar ranks fifth globally on the Digital Accessibility Rights Evaluation Index (DARE).

Our Vision
Enhancing ICT accessibility in Qatar and beyond.

Our Mission
Unlock the potential of persons with functional limitations (PFLs), that is, persons with disabilities (PWDs) and the elderly, through enabling ICT accessible capabilities and platforms.

About Nafath

Nafath aims to be a key information resource for disseminating the facts about the latest trends and innovation in the field of ICT Accessibility. It is published in English and Arabic on a quarterly basis and intends to be a window of information to the world, highlighting the pioneering work done in our field to meet the growing demands of ICT Accessibility and Assistive Technology products and services in Qatar and the Arab region.

Content

Arabic Optical Character Recognition and Assistive Technology
State of the Art in Arabic OCR: Qatari Research Efforts
Overview of Arabic OCR and Related Applications
Examples of Optical Character Recognition Tools
Arabic Optical Character Recognition (OCR) Technology at Qatar National Library
The optical recognition technology in assistive technology devices
Machine Learning, Deep Learning and OCR Revitalizing Technology
Smart Apps for PWDs using OCR
Making Social Media Accessible for All: Twitter

Arabic Optical Character Recognition and Assistive Technology

As Optical Character Recognition (OCR) technology has gone through substantial improvements over the past decades, the Assistive Technology (AT) industry has utilized it as a crucial tool to serve as a foundation for developing breakthroughs in innovative AT solutions. AT for individuals with various kinds of disabilities has been developed through the use of OCR. These solutions have enabled Persons with Disabilities (PWDs) to be active members of society in several domains such as education, employment, and community.

The advent of OCR-based AT has had a transforming effect on PWDs in areas like increased educational productivity and enhanced independence in performing daily tasks, leading to an improved quality of life. The ability of OCR technology to offer efficient and accurate conversion of paper and image documents to editable digitized formats has greatly influenced the potential of providing information in accessible formats suitable for use by PWDs. Several types of innovative AT solutions based on OCR technologies are available in the market today, some of which are as follows:

Technologies that help to overcome Learning Difficulties

Individuals with learning difficulties like dyslexia and Attention Deficit Hyperactivity Disorder (ADHD) often find it challenging to read printed materials, as it becomes problematic to distinguish between characters and keep track of the flow of reading. Various AT solutions support the creation of digital documents from scanned documents and images through the use of OCR technology. These digitized documents can be processed automatically and converted into accessible materials tailored to the needs of the student with learning disabilities. Once the material is converted to digital format, specialized AT (e.g., TextHelp, Clicker, Kurzweil, etc.) can support features like text-to-speech and text-tracking tools that help users visually track the words being read out to them by the software. The software features can also include additional useful options like creating a user-configurable library, dictionary, highlighting, adjustable word and sentence tracking colors, and customizable backgrounds.

Technologies that enable Individuals with Visual Impairment to read independently

Individuals with visual impairments like low vision or blindness often encounter challenges in reading printed materials. AT solutions can automate the process of converting printed materials to accessible format using specialized software or hardware solutions. The first step in the conversion process relies on using OCR technologies to identify the characters in the reading materials and convert the content into a digital format. Once converted, the digitized document can be used by AT solutions (e.g., Duxbury, EZ converter, etc.) to create reading materials in an accessible format like large print, text-to-speech, and braille. Without accurate OCR technology, the translation of scanned documents into digital format would be unreliable and would thus hinder the ability of those with visual impairments to read independently.

Wearable technologies that enable Individuals with Visual Impairment to identify key objects and improve their ability for independent living

In recent years, the concept of integrating accessibility features into smart glasses to improve the lives of individuals with visual impairment has been explored. The inclusion of a camera and computing chip in smart glasses makes them an ideal platform for OCR-based AT solutions. Smart glass technologies like Envision, NuEyes, and OrCam have integrated OCR-based features that enable the identification of print-based information like product expiry dates, restaurant menus, and printed bills. These glasses are also equipped with audio output, which allows text-to-speech feedback of OCR-identified information.

Technologies that enable Individuals with Physical Disabilities to access print materials

Accessing conventional reading materials like books and newspapers can be challenging for Individuals with Physical Disabilities, as it requires sufficient dexterity to perform tasks like flipping the document pages. In such cases, the preferred access method is to have the reading materials available in an electronic format (e.g., HTML, PDF, etc.). OCR-based AT solutions like OpenBook allow the creation of documents in an electronic format from scanned printed documents and graphics-image based text.

The development of OCR technologies in other languages like Arabic has progressed considerably over the past decade. This has allowed the AT industry to create innovative OCR-based solutions localized in the Arabic language, as the improved OCR accuracy meant a more reliable outcome from the AT solution. Currently, there are a handful of OCR-based solutions commercially available in the Arabic language, some of which have around 96% or more accuracy. Examples of OCR solutions that support the Arabic language include:

Sakhr OCR

The Sakhr OCR solution is capable of identifying complex fonts (including cursive writing), diacritics, position-dependent character shapes, overlapping, and non-standard fonts in the Arabic language. Sakhr OCR converts scans of

Mada Center has continuously invested in supporting the development of AT based on Arabic OCR technologies. This continues to be achieved primarily by supporting innovators and entrepreneurs through the Mada Innovation Program. A major contribution of Mada Center towards its commitment to supporting the evolution of Arabic OCR is the development of the “Arabic Money Reader App”, which recognizes Qatari Riyal currency notes using the mobile phone camera. Mada Center has also been committed to supporting the digitization of Arabic language reading materials in accessible format available worldwide. This objective is achieved by collaborating with international partners like Bookshare, which hosts one of the largest platforms of accessible reading materials for individuals with print disabilities.

State of the Art in Arabic OCR: Qatari Research Efforts

Since the mid-1940s, there has been extensive research and publication on character recognition, with most of the published work being on Latin characters and work on Japanese and Chinese characters emerging in the mid-1960s. Despite almost a billion people worldwide using Arabic characters for writing (Arabic, Persian, and Urdu), Arabic character recognition research, starting in