MADA NAFATH MAGAZINE LAYOUT ISSUE 15.1.Indd

Total pages: 16

File type: PDF, size: 1020 KB

Nafath
Issue no. 15
www.mada.org.qa

Arabic Optical Character Recognition and Assistive Technology (Page 04)
State of the Art in Arabic OCR: Qatari Research Efforts (Page 07)
Smart Apps for PWDs using OCR (Page 27)
Machine Learning, Deep Learning and OCR Revitalizing Technology
Arabic Optical Character Recognition (OCR) Technology at Qatar National Library
Overview of Arabic OCR and Related Applications

About Nafath

Nafath aims to be a key information resource for disseminating the facts about the latest trends and innovation in the field of ICT Accessibility. It is published in English and Arabic on a quarterly basis and intends to be a window of information to the world, highlighting the pioneering work done in our field to meet the growing demands of ICT Accessibility and Assistive Technology products and services in Qatar and the Arab region.

About Mada

Mada Center is a private institution for public benefit, which was founded in 2010 as an initiative that aims at promoting digital inclusion and building a technology-based community that meets the needs of persons with functional limitations (PFLs) – persons with disabilities (PWDs) and the elderly in Qatar. Mada today is the world’s Center of Excellence in digital access in Arabic.

Through strategic partnerships, the center works to enable the education, culture and community sectors through ICT to achieve an inclusive community and educational system. The Center achieves its goals by building partners’ capabilities and supporting the development and accreditation of digital platforms in accordance with international standards of digital access. Mada raises awareness, provides consulting services and increases the number of assistive technology solutions in Arabic through the Mada Innovation Program to enable equal opportunities for PWDs and the elderly in the digital community.

At the national level, Mada Center has achieved a digital accessibility rate of 94% amongst government websites, while Qatar ranks fifth globally on the Digital Accessibility Rights Evaluation Index (DARE).

Our Vision
Enhancing ICT accessibility in Qatar and beyond.

Our Mission
Unlock the potential of persons with functional limitations (PFLs) – persons with disabilities (PWDs) and the elderly – through enabling ICT accessible capabilities and platforms.

Content

Page
04  Arabic Optical Character Recognition and Assistive Technology
07  State of the Art in Arabic OCR: Qatari Research Efforts
11  Overview of Arabic OCR and Related Applications
14  Examples of Optical Character Recognition Tools
16  Arabic Optical Character Recognition (OCR) Technology at Qatar National Library
22  Optical Character Recognition (OCR) in Assistive Technologies
24  Machine Learning, Deep Learning and OCR Revitalizing Technology
27  Smart Apps for PWDs using OCR
30  Making Social Media Accessible for All: Twitter

Arabic Optical Character Recognition and Assistive Technology

As Optical Character Recognition (OCR) technology has gone through substantial improvements over the past decades, the Assistive Technology (AT) industry has utilized it as a crucial tool to serve as a foundation for developing breakthroughs in innovative AT solutions. AT for individuals with various kinds of disabilities has been developed through the use of OCR. These solutions have enabled Persons with Disabilities (PWDs) to be active members of society in several domains such as education, employment, and community.

The advent of OCR-based AT has had a transforming effect on PWDs in areas like increased educational productivity and enhanced independence in performing daily tasks, leading to an improved quality of life. The ability of OCR technology to offer efficient and accurate conversion of paper and image documents to editable digitized formats has greatly influenced the potential of providing information in accessible formats suitable for use by PWDs. Several types of innovative AT solutions based on OCR technologies are available in the market today, some of which are as follows:

Technologies that help to overcome Learning Difficulties

Individuals with learning difficulties like dyslexia and Attention Deficit Hyperactivity Disorder (ADHD) often find it challenging to read printed materials, as it becomes problematic to distinguish between characters and keep track of the flow of reading. Various AT solutions support the creation of digital documents from scanned documents and images through the use of OCR technology. These digitized documents can be processed automatically and converted into accessible materials tailored to the needs of the student with learning disabilities. Once converted to digital format, specialized AT (e.g., TextHelp, Clicker, Kurzweil, etc.) can support features like text-to-speech and text-tracking tools that help users visually track the words being read out to them by the software. The software features can also include additional useful options like creating a user-configurable library, dictionary, highlighting, adjustable word and sentence tracking colors, and customizable backgrounds.

Technologies that enable Individuals with Visual Impairment to read independently

Individuals with visual impairments like low vision or blindness often encounter challenges reading printed materials. AT solutions can automate the process of converting printed materials to accessible format using specialized software or hardware solutions. The first step in the conversion process relies on using OCR technologies to identify the characters in the reading materials and convert the content into a digital format. Once converted, the digitized document can be used by AT solutions (e.g., Duxbury, EZ converter, etc.) to create reading materials in an accessible format like large print, text-to-speech, and braille. Without accurate OCR technology, the translation of scanned documents into digital format would be unreliable and would thus hinder the ability of those with visual impairments to read independently.

Wearable technologies that enable Individuals with Visual Impairment to identify key objects and improve their ability for independent living

In recent years, the concept of integrating accessibility features into smart glasses to improve the lives of individuals with visual impairment has been explored. The inclusion of a camera and computing chip in smart glasses makes them an ideal platform for OCR-based AT solutions. Smart glass technologies like Envision, NuEyes, and OrCam have integrated OCR-based features that enable the identification of print-based information like product expiry dates, restaurant menus, and printed bills. These glasses are also equipped with audio output, which allows text-to-speech feedback of OCR-identified information.

Technologies that enable Individuals with Physical Disabilities to access print materials

Accessing conventional reading materials like books and newspapers can be challenging for Individuals with Physical Disabilities, as it requires sufficient dexterity to perform tasks like flipping the document pages. In such cases, the preferred access method is to have the reading materials available in an electronic format (e.g., HTML, PDF, etc.). OCR-based AT solutions like OpenBook allow the creation of documents in an electronic format from scanned printed documents and graphics-image based text.

The development of OCR technologies in other languages like Arabic has progressed considerably over the past decade. This has allowed the AT industry to create innovative OCR-based solutions localized in the Arabic language, as the improved OCR accuracy meant a more reliable outcome from the AT solution. Currently, there are a handful of OCR-based solutions commercially available in the Arabic language, some of which have around 96% or more accuracy. Examples of OCR solutions that support the Arabic language include:

Sakhr OCR

The Sakhr OCR solution is capable of identifying complex fonts (including cursive writing), diacritics, position-dependent character shapes, overlapping, and non-standard fonts in the Arabic language. Sakhr OCR converts scans of ...

Mada Center has continuously invested in supporting the development of AT based on Arabic OCR technologies. This continues to be achieved primarily by supporting innovators and entrepreneurs through the Mada Innovation Program. A major contribution of Mada Center towards its commitment to supporting the evolution of Arabic OCR is the development of the “Arabic Money Reader App”, which recognizes Qatari Riyal currency notes using the mobile phone camera. Mada Center has also been committed to supporting the digitization of Arabic language reading materials in accessible format available worldwide. This objective is achieved by collaborating with international partners like Bookshare, which hosts one of the largest platforms of accessible reading materials for individuals with print disabilities.

State of the Art in Arabic OCR: Qatari Research Efforts

Since the mid-1940s, there has been extensive research and publication on character recognition. Most of the published work has been on Latin characters, with work on Japanese and Chinese characters emerging in the mid-1960s. Despite almost a billion people worldwide using Arabic characters for writing (Arabic, Persian, and Urdu), Arabic character recognition research, starting in the ...