Generic Text Recognition Using Long Short-Term Memory Networks

by Adnan Ul-Hasan

Thesis approved by the Department of Computer Science, University of Kaiserslautern, for the award of the Doctoral Degree Doctor of Engineering (Dr.-Ing.)

Date of PhD Defense: 11.01.2016
Dean of the Department: Prof. Dr. Klaus Schneider
Chairperson of the PhD Committee: Prof. Dr. Paul Lukowicz
Thesis Reviewers:
Prof. Dr. Andreas Dengel, DFKI Kaiserslautern
Associate Prof. Dr. Faisal Shafait, SEECS, NUST Pakistan
apl. Prof. Dr. Marcus Liwicki, University of Kaiserslautern

D 386

It always seems impossible until it is done.
Nelson Mandela

Abstract

The task of printed Optical Character Recognition (OCR) is considered a "solved" issue by many Pattern Recognition (PR) researchers. This notion, though partially true, does not represent the whole picture. Although it is true that state-of-the-art OCR systems exist for many scripts, for example, Latin, Greek, Han (Chinese), and Kana (Japanese), there is still a need for exhaustive research on many other challenging modern scripts. Examples of such scripts are the cursive Nabataean scripts, which include Arabic, Persian, and Urdu, and the Brahmic family of scripts, which contains Devanagari, Sanskrit, and its derivatives. These scripts present many challenging issues for OCR, for example, changes in the shape of a character within a word depending on its location, kerning, and a huge number of ligatures. Moreover, OCR research for historical documents still requires much probing; therefore, efforts are required to develop robust OCR systems to preserve the literary heritage.

Likewise, there is a need to address the issue of OCR of multilingual documents. Plenty of multilingual documents exist in the current time of globalization, which has increased the influence of different languages on each other.
There is an increase in the usage of foreign words and phrases in articles, newspapers, and books, which is generating a large body of multilingual literature every day. Another effect is seen in the products we use in our daily lives. From packaging of imported food items to sophisticated electronics, the demand of international customers to access information about these products in their native language is ever increasing. The use of multilingual operational manuals, books, and dictionaries motivates the need for multilingual OCR systems for their digitization.

The aim of this thesis is to find answers to some of these challenges using contemporary machine learning methodologies, especially Recurrent Neural Networks (RNNs). Specifically, a recent architecture of these networks, referred to as Long Short-Term Memory (LSTM) networks, has been employed to OCR modern as well as historical documents. The excellent OCR results obtained on these documents encourage us to extend their application to the field of multilingual OCR.

The LSTM networks are first evaluated on standard English datasets to benchmark their performance. They yield better recognition results than other contemporary OCR techniques without using sophisticated features or language modeling. Therefore, their application is further extended to more complex scripts, including Urdu Nastaleeq and Devanagari. For the Urdu Nastaleeq script, LSTM networks achieve the best reported OCR results (2.55% Character Error Rate (CER)) on a publicly available dataset, while for the Devanagari script, a new freely available database has been introduced, on which a CER of 9% is achieved.

The LSTM-based methodology is further extended to the OCR of historical documents. In this regard, this thesis focuses on the Old German Fraktur script, the medieval Latin script of the 15th century, and the Polytonic Greek script. LSTM-based systems outperform the contemporary OCR systems on all of these scripts.
For old documents, it is usually very hard to prepare a transcribed dataset for training a neural network in the supervised learning paradigm. A novel methodology that combines segmentation-based and segmentation-free approaches has been proposed to OCR scripts for which no transcribed training data is available. For German Fraktur and Polytonic Greek scripts, artificially generated data from existing text corpora yield highly promising results (CER of <1% and 5% for Fraktur and Polytonic Greek scripts, respectively).

Another major contribution of this thesis is an efficient Multilingual OCR (MOCR) system, which has been achieved in two ways. Firstly, a sequence-learning based script identification system has been developed that works at the text-line level, thereby eliminating the need to segment individual characters or words prior to actual script identification. Secondly, a unified approach for dealing with different types of multilingual documents has been proposed. The core motivation behind this generalized framework is the reading ability of the human mind to process multilingual documents, where no script identification takes place. In this design, the LSTM networks recognize multiple scripts simultaneously without the need to identify different scripts. The first step in building this framework is the realization of a language-independent OCR system that recognizes multilingual text in a single step, using a single LSTM model. The language-independent approach is then extended to a script-independent OCR framework that can recognize multiscript documents using a single OCR model. The proposed generalized approach yields a low error rate (1.2%) on a database of English-Greek bilingual documents.

In summary, this thesis aims to extend OCR research from modern Latin scripts to old Latin, to Greek, and to other "underprivileged" scripts such as Devanagari and Urdu Nastaleeq.
It also provides a generalized OCR framework for dealing with multilingual documents.

Contents

Abstract
List of Figures
List of Tables
Abbreviations

1 Introduction
  1.1 Motivation
  1.2 Research Hypothesis
  1.3 Contributions of the Thesis
  1.4 Thesis Structure

2 Research Methodologies for Printed OCR
  2.1 Segmentation-Based OCR
    2.1.1 Template Matching for OCR
    2.1.2 Over Segmentation Methods
  2.2 Segmentation-Free OCR
    2.2.1 HMM-Based OCR Approach
    2.2.2 Sequence Learning Approach

Part I: Challenges for Printed OCR

3 Challenges for Robust OCR
  3.1 Modern English
    3.1.1 Challenges for OCR in Printed English
    3.1.2 OCR Databases for modern English
  3.2 Latin Script of Late Medieval Ages
    3.2.1 Challenges for Old Latin OCR
    3.2.2 OCR Databases for Old Latin
  3.3 German Fraktur Script
    3.3.1 Challenges for Fraktur OCR
    3.3.2 OCR Databases for Fraktur
  3.4 Polytonic Greek Script
    3.4.1 OCR Challenges for Polytonic Greek
    3.4.2 OCR Databases for Polytonic Greek
  3.5 Devanagari Script
    3.5.1 Challenges for Devanagari OCR
    3.5.2 OCR Databases for Devanagari
  3.6 Urdu Nastaleeq Script
    3.6.1 Challenges for Nastaleeq OCR
    3.6.2 Databases for Urdu Nastaleeq OCR
  3.7 Printed Documents with Multiple Scripts and Languages
    3.7.1 Categorization of Multilingual Documents
    3.7.2 OCR Datasets for Multilingual Documents
  3.8 Chapter Summary

4 Benchmark Datasets for OCR
  4.1 Related Work
  4.2 Semi-automated Database Generation
    4.2.1 Preprocessing
    4.2.2 Ligature Extraction
    4.2.3 Clustering
    4.2.4 Ligature Labeling
    4.2.5 Experiments and Results
    4.2.6 Conclusion
  4.3 Synthetic Text-Line Generation
  4.4 OCR Databases
    4.4.1 Deva-DB – OCR Database for Devanagari Script
    4.4.2 Polyton-DB – Greek Polytonic Database
    4.4.3 Databases for Multilingual OCR (MOCR)
  4.5 Chapter Summary

Part II: OCR for Monolingual Documents

5 OCR for Printed Modern Scripts
  5.1 Design of LSTM-Based OCR System
    5.1.1 Network Parameters Selection
    5.1.2 Features
    5.1.3 Performance Metric
  5.2 Printed English OCR
    5.2.1 Related Work
    5.2.2 Database
    5.2.3 Experimental Evaluation and Results
    5.2.4 Error Analysis
    5.2.5 Conclusions
  5.3 Printed Devanagari OCR
    5.3.1 Related Work
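
The abstract reports all recognition results as Character Error Rate (CER), and the outline lists a "Performance Metric" section, but this excerpt does not show the thesis's own definition. The sketch below assumes the commonly used one: the Levenshtein (edit) distance between recognized text and ground truth, summed over all text lines and divided by the total number of ground-truth characters. The function names `edit_distance` and `character_error_rate` are illustrative and do not come from the thesis.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(references: list[str], hypotheses: list[str]) -> float:
    """CER in percent over a set of text lines (assumed definition, see lead-in)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("ground truth contains no characters")
    return 100.0 * total_edits / total_chars
```

For instance, a single four-character line recognized with one wrong character gives `character_error_rate(["abcd"], ["abxd"])`, that is, one edit over four characters. Normalizing by the total character count rather than averaging per-line rates keeps short lines from dominating the score; whether the thesis makes the same choice is not stated in this excerpt.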